GenDB Installation Instructions
Below you can find basic instructions for installing GenDB from a tarball. There's also a [/GenDBInstallationFAQ FAQ page] describing some of the problems that may occur during installation.
/!\ Be aware that the current version of GenDB 2.2 is a release candidate. /!\
Although we test our software on a number of systems prior to release, it may contain errors and may not work the way you expect. GenDB 2.2 is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE!
- 1: Checking system requirements
As a Perl application, GenDB 2.2 requires a recent version of Perl and a number of additional tools that are not related to bioinformatics:
- Perl (version 5.8 or higher)
- MySQL (version 4.0 or higher)
- Sun Grid Engine (version 6.0 or higher) or another DRMAA compatible scheduling system
- gnuplot
- NetPBM
- ImageMagick
- GraphViz ([http://www.graphviz.org])
- wget
- dialog
- Apache ([1])
- mod_perl (optional, but "highly recommended" [2])
You also need an administrative account on the MySQL server, because the installation creates several databases.
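Before starting, it can help to confirm that the tools listed above are on the PATH. The following is a minimal sketch and not part of the GenDB distribution: the function name `missing_tools` is made up for illustration, and the binary names in the example (for instance `convert` for ImageMagick and `dot` for GraphViz) are assumptions that may differ on your system. It does not check version numbers.

```shell
#!/bin/sh
# Print every command from the argument list that is not found on PATH.
# Prints nothing if all commands are available.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Example call (binary names are assumptions, adjust them as needed):
#   missing_tools perl mysql gnuplot convert dot wget dialog
# The Perl version requirement can be checked separately with:
#   perl -e 'use 5.008; print "Perl is recent enough\n"'
```

Any name printed by the function points to a tool that still needs to be installed or added to the PATH.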
- 2: Installing the required Perl modules
GenDB 2.2 relies on a number of freely available Perl modules. These modules are available at the CPAN archive ([3]). You need to install these modules prior to installing GenDB 2.2.
Depending on your operating system, these modules may also be available as packages provided by your operating system vendor; using these packages is recommended.
List of Perl modules (in no particular order):
- Graph
- GD
- Config::IniFiles
- LWP::UserAgent
- HTTP::Request
- URI
- Chart::Graph::Gnuplot
- Crypt::GeneratePassword
- Mail::Mailer
- DBI
- DBD::mysql
- Term::ReadKey
- DB_File::Lock
- HTML::Template
- CGI
- Digest::MD5
- UI::Dialog
- BioPerl (also available at [4])
Please keep in mind that these modules may have requirements of their own and may depend on further modules or libraries. If you install the modules yourself instead of using ready-made packages, the so-called "CPAN Shell" will help you resolve dependencies and install modules. See the section about installing Perl modules at the CPAN website ([5]) or the manpage of the CPAN module (if installed). Newer versions of Perl (> 5.8) also provide an improved version of the CPAN Shell, called "CPANPLUS".
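To see which of the listed modules are still missing, one option is to ask the Perl interpreter to load each of them. This is a minimal sketch under a few assumptions: the function name `missing_perl_modules` and the `PERL` variable are introduced here for illustration only, and BioPerl is a distribution rather than a single module, so it has to be tested through one of its modules (`Bio::Seq` is used as an example).

```shell
#!/bin/sh
# PERL may point at a specific interpreter; it defaults to "perl" on PATH.
PERL=${PERL:-perl}

# Print each module from the argument list that the interpreter cannot load.
# Prints nothing if every module loads successfully.
missing_perl_modules() {
    for mod in "$@"; do
        "$PERL" "-M$mod" -e 1 >/dev/null 2>&1 || printf '%s\n' "$mod"
    done
}

# Example call with some of the modules listed above:
#   missing_perl_modules Graph GD Config::IniFiles DBI DBD::mysql \
#       HTML::Template UI::Dialog Bio::Seq
```

Modules reported as missing can then be installed from vendor packages or from within the CPAN Shell, which is started with `perl -MCPAN -e shell`.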
- 3: Installing bioinformatics software
GenDB 2.2 does not contain methods for gene calling or predicting gene functions; it uses a number of mostly freely available, well-known tools. Most of these tools are required for running GenDB 2.2.
Gene calling:
- Critica
- Glimmer 2
- tRNAScan-SE
- SearchForRNAs (available at our ftp server)
- QRna (optional)
- rbsfinder (optional)
Function prediction:
- NCBI Blast ([6])
- HMMER ([7])
- InterProScan ([http://www.ebi.ac.uk])
- EMBOSS ([8])
- SAPS (optional)
- TMHMM (optional)
- SignalP (optional)
At least the mandatory tools need to be set up and configured properly.
- 4: Getting the required sequence databases
The tools described above operate on various databases containing biological sequences, patterns, HMMs etc. You need at least the following databases for GenDB 2.2:
- a non-redundant nucleotide database (usually called "nt", available by ftp at the NCBI and the EBI)
- a non-redundant protein database ("nr", also available at the NCBI/EBI)
- a blastable SwissProt database
You may either use a standalone database or a subset of the "nr" database provided by the NCBI or the EBI. Setting up this subset is beyond the scope of this documentation.
- a database containing the protein sequences of all genomes available in the KeGG database
You can build this database by downloading the sequences from the KeGG ftp server ([9]) and concatenating the necessary files.
You have to ensure that the FASTA header lines follow the defline format required by the NCBI. See the documentation for "formatdb" and "fastacmd" and [10]. If you want to use custom databases within GenDB, their header lines also have to follow the defline standard.
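Concatenating the downloaded sequence files and checking their header lines could look roughly like the sketch below. Everything named here is an assumption made for illustration: the function names, the file names in the example, and the list of identifier tags used by the header check, which is only a rough heuristic and no substitute for the NCBI documentation.

```shell
#!/bin/sh
# Concatenate FASTA files into a single database file.
# Usage: build_fasta_db OUTPUT INPUT...
build_fasta_db() {
    out=$1
    shift
    cat "$@" > "$out"
}

# Print the number of header lines that do not start with one of the
# identifier tags listed in the pattern (heuristic check only).
count_odd_deflines() {
    grep '^>' "$1" | grep -Evc '^>(gi|gb|emb|dbj|sp|tr|ref|pir|prf|pdb|lcl|gnl)\|' || true
}

# Example (file names are hypothetical):
#   build_fasta_db kegg_genes.fasta downloads/*.pep
#   count_odd_deflines kegg_genes.fasta
# Once the headers are in order, the file can be made blastable with the
# legacy NCBI tool, for example:
#   formatdb -i kegg_genes.fasta -p T -o T
```

A non-zero count means some headers probably need to be rewritten before running "formatdb" with sequence ID parsing enabled.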
- 5: Getting the GenDB 2.2 tarball ([11])
- 6: Unpacking the tarball in a temporary directory
- 7: Running the installation script ("sh install_gendb.sh <target directory>")
The installation script will check for the Perl modules and the paths to binaries and databases, install the components and set up the system.
Note:
- The GOPArc component downloads several databases from external servers, such as the COG and KOG databases and KeGG databases (other than the database mentioned above). You need an internet connection during the installation. The installation script will ask for a proxy server to be used for downloading. If you do not need a proxy, leave the field empty. Otherwise enter the complete proxy URL, e.g. http://proxy.my-institute.org:1234.
- During the download the script also tries to fetch the KeGG pathway reference maps. These are about 2,300 individual files, which unfortunately have to be fetched one by one.
- Some of the installation scripts produce a lot of output with warnings and error messages. We are aware of this and will fix it in the next release.
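Steps 6 and 7 can be sketched as a pair of small shell helpers. This is only an illustration: the helper names, the tarball name and the target directory in the example are hypothetical, and the internal layout of the tarball is not assumed, which is why the installer is located with "find".

```shell
#!/bin/sh
# Unpack a gzip-compressed tarball into a fresh temporary directory
# and print the name of that directory.
unpack_to_tmp() {
    tmpdir=$(mktemp -d) || return 1
    tar -xzf "$1" -C "$tmpdir" || return 1
    printf '%s\n' "$tmpdir"
}

# Print the path of the first install_gendb.sh found below a directory.
find_installer() {
    find "$1" -type f -name install_gendb.sh | head -n 1
}

# Example (tarball name and target directory are hypothetical):
#   workdir=$(unpack_to_tmp gendb.tar.gz)
#   installer=$(find_installer "$workdir")
#   cd "$(dirname "$installer")" && sh install_gendb.sh /opt/gendb
```

Saving the installer output to a log file, for example by appending `2>&1 | tee install.log` to the last command, makes it easier to go through the warnings afterwards.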
- 8: Setting up the web server
The Apache web server is usually available as a package provided by the operating system vendor; however, the way to configure the server differs between operating systems and distributions. We provide a script called "setup_web_interface" that tries to detect the Apache version and can create a configuration fragment you can use to set up the web server. The fragment contains the necessary configuration directives, but may need to be adapted to your local installation.
- 9: Starting a dispatcher
The "dispatcher" is the part of GenDB 2.2 responsible for managing external tools, running them on the cluster, parsing their output and so on. It needs to be started before tools are submitted from the GenDB 2.2 web interface or by a command line script. You may start the dispatcher at system boot time, using a so-called "init script". Again, these scripts differ between operating systems and distributions, so we are not able to provide a ready-made one. Ask your local system administrator for more information about how to write an init script.
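As a starting point for such an init script, the generic start/stop/status pattern based on a PID file can be sketched as follows. This is not a ready-made GenDB script: the dispatcher command line is not given here, so `DISPATCHER_CMD` is a placeholder path, and the PID file location and the function name `dispatcher_ctl` are assumptions as well.

```shell
#!/bin/sh
# Placeholders: replace with the real dispatcher command and a suitable PID file.
DISPATCHER_CMD=${DISPATCHER_CMD:-/path/to/gendb/dispatcher}
PIDFILE=${PIDFILE:-/var/run/gendb-dispatcher.pid}

# Succeeds if the PID file exists and the recorded process is alive.
is_running() {
    [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null
}

# Handle one init-script action: start, stop, restart or status.
dispatcher_ctl() {
    case "$1" in
        start)
            if is_running; then
                echo "already running"
            else
                # Run in the background and remember the process ID.
                $DISPATCHER_CMD >/dev/null 2>&1 &
                echo $! > "$PIDFILE"
                echo "started"
            fi
            ;;
        stop)
            if is_running; then
                kill "$(cat "$PIDFILE")"
            fi
            rm -f "$PIDFILE"
            echo "stopped"
            ;;
        restart)
            dispatcher_ctl stop
            dispatcher_ctl start
            ;;
        status)
            if is_running; then echo "running"; else echo "not running"; fi
            ;;
        *)
            echo "Usage: {start|stop|restart|status}" >&2
            return 1
            ;;
    esac
}

# A real init script would end with:
#   dispatcher_ctl "$1"
```

How the script is registered for execution at boot time depends on the operating system and distribution.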
The installation is now finished, and you can start by adding a new GenDB project and processing the first data. See the core scripts page for more information about this.