Latest revision as of 15:40, 31 October 2011

GenDB Installation Instructions

Below you can find some basic installation instructions for installing GenDB from a tarball. There's also a FAQ page describing some of the problems that may occur during installation.

Although we test our software on a number of systems prior to release, it may contain errors and, in the worst case, may not work the way you expect. GenDB 2.2 is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE!

Checking system requirements

As a Perl application, GenDB 2.2 requires a recent version of Perl and a number of additional tools that are not specific to bioinformatics:

Perl (version 5.8 or higher)
MySQL (version 4.0 or higher)
Sun Grid Engine (version 6.0 or higher) or another DRMAA compatible scheduling system
gnuplot
NetPBM
ImageMagick
GraphViz (http://www.graphviz.org)
wget
dialog
Apache (http://httpd.apache.org)
mod_perl (optional, but "highly recommended"; http://perl.apache.org)

You also need an administrative account at the MySQL server for creating several databases during the installation.
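
A quick way to see which of the tools listed above are already on your PATH is sketched below. The binary names are assumptions (`convert` for ImageMagick, `dot` for GraphViz, `mysql` for the MySQL client); NetPBM, Apache and the scheduling system are left out because their binary names differ between systems.

```shell
#!/bin/sh
# Print every command from the argument list that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Assumed binary names for the tools listed above; adjust to your system.
missing=$(missing_tools perl mysql gnuplot convert dot wget dialog)
if [ -n "$missing" ]; then
    echo "Missing tools:" $missing
else
    echo "All checked tools were found."
fi
```

This only checks that a binary exists; it does not verify the minimum versions given in the list.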

Installing the required Perl modules

GenDB 2.2 relies on a number of freely available Perl modules. These modules are available at the CPAN archive (http://www.cpan.org). You need to install these modules prior to installing GenDB 2.2. Depending on your operating system, these modules may also be available as packages from your operating system vendor; using these packages is recommended. List of required Perl modules (in no particular order):

Graph
GD
Config::IniFiles
LWP::UserAgent
HTTP::Request
URI
Chart::Graph::Gnuplot
Crypt::GeneratePassword
Mail::Mailer
DBI
DBD::mysql
Term::ReadKey
DB_File::Lock
HTML::Template
CGI
Digest::MD5
UI::Dialog
BioPerl (also available at http://www.bioperl.org)

Please keep in mind that these modules may have further requirements and may depend on further modules or libraries. If you are going to install the modules yourself instead of using ready-made packages, the so-called "CPAN Shell" will help you resolve dependencies and install modules. See the section about installing Perl modules at the CPAN website or the manpage of the CPAN module (if installed). Newer versions of Perl (> 5.8) also provide an improved version of the CPAN Shell, called "CPANPlus".
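
Before running the GenDB installer you may want to check which of the modules above Perl can already load. The following is a minimal sketch, not part of GenDB. It assumes that `Bio::Seq` is a reasonable stand-in for testing whether BioPerl is installed, since BioPerl is a collection of modules rather than a single one.

```shell
#!/bin/sh
# Return success if Perl can load the named module.
have_perl_module() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

# Module names taken from the list above; Bio::Seq is an assumed
# stand-in for "BioPerl".
for module in Graph GD Config::IniFiles LWP::UserAgent HTTP::Request URI \
    Chart::Graph::Gnuplot Crypt::GeneratePassword Mail::Mailer DBI DBD::mysql \
    Term::ReadKey DB_File::Lock HTML::Template CGI Digest::MD5 UI::Dialog Bio::Seq
do
    have_perl_module "$module" || echo "missing: $module"
done
```

Modules reported as missing can then be installed from your distribution's packages or from within the CPAN Shell, which is started with `perl -MCPAN -e shell`.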

Installing bioinformatics software

GenDB 2.2 does not contain methods for gene calling or predicting gene functions; it uses a number of mostly freely available, well-known tools. Most of these tools are required for running GenDB 2.2.

Gene calling:
- Critica
- Glimmer 2
- tRNAScan-SE
- SearchForRNAs (available at our ftp server)
- QRNA (optional)
- rbsfinder (optional)
Function prediction:
- NCBI Blast (http://www.ncbi.nlm.nih.gov/)
- HMMER (http://hmmer.wustl.edu/)
- InterProScan (http://www.ebi.ac.uk)
- EMBOSS (http://www.emboss.org)
- SAPS (optional)
- TMHMM (optional)
- SignalP (optional)

At least the mandatory tools need to be set up and configured properly.

Getting the required sequence databases

The tools described above operate on various databases containing biological sequences, patterns, HMMs etc. You need at least the following databases for GenDB 2.2:

a non-redundant nucleotide database (usually called "nt", available by ftp at the NCBI and the EBI)
a non-redundant protein database ("nr", also available at the NCBI/EBI)
a blastable SwissProt database
You may either use a standalone database or a subset of the "nr" database provided by the NCBI or the EBI. Setting up this subset is beyond the scope of this documentation.
a database containing the protein sequences of all genomes available in the KeGG database
You can build this database by downloading the sequences from the KeGG ftp server and concatenating the necessary files.

You have to ensure that the FASTA header line format follows the defline format required by the NCBI. See the documentation for "formatdb" and "fastacmd" and ftp://ftp.ncbi.nih.gov/blast/db/README. If you want to use custom databases within GenDB, the header lines also have to follow the defline standard.
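
Concatenating downloaded sequence files into a single database file, as described for the KeGG sequences above, could look like the sketch below. The `.fasta` extension and all paths are assumptions for illustration; the actual file names on the ftp server may differ. The output file should be written outside the input directory.

```shell
#!/bin/sh
# Concatenate all FASTA files of a directory into one output file.
# Usage: concat_fasta <input directory> <output file>
concat_fasta() {
    indir=$1
    outfile=$2
    : > "$outfile"                      # start with an empty output file
    for f in "$indir"/*.fasta; do
        if [ -f "$f" ]; then
            cat "$f" >> "$outfile"
        fi
    done
}

# Count the sequences in a FASTA file by counting its header lines.
count_sequences() {
    grep -c '^>' "$1"
}

# Example with hypothetical paths; formatdb is the legacy NCBI tool
# mentioned above (-p T: protein input, -o T: parse the deflines):
#   concat_fasta /data/kegg/sequences /data/blastdb/kegg.fa
#   count_sequences /data/blastdb/kegg.fa
#   formatdb -i /data/blastdb/kegg.fa -p T -o T
```

Because `-o T` makes formatdb parse the header lines, it is also a practical check that the deflines follow the required format.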

Installing GenDB

Get the most recent GenDB tarball from SourceForge (http://sourceforge.net/projects/gendb)
Unpack the tarball in a temporary directory
Run the installation script ("sh install_gendb.sh <target directory>")

The installation script will check for the Perl modules and the paths to binaries and databases, install the components, and set up the system.
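
The three steps above can be wrapped in a small helper, sketched here under assumptions: the function name `install_gendb` is hypothetical, the tarball is assumed to be gzip-compressed, and `install_gendb.sh` is assumed to sit somewhere inside the unpacked tree. The installer itself is interactive, so this only automates unpacking and starting it.

```shell
#!/bin/sh
# Unpack a GenDB tarball in a temporary directory and run the installer.
# Usage: install_gendb <tarball> <absolute target directory>
install_gendb() {
    tarball=$1
    target=$2
    workdir=$(mktemp -d) || return 1
    # Unpack in a temporary directory.
    tar -xzf "$tarball" -C "$workdir" || return 1
    # Assumption: the installer is located somewhere in the unpacked tree.
    script=$(find "$workdir" -name install_gendb.sh | head -n 1)
    if [ -z "$script" ]; then
        echo "install_gendb.sh not found in $tarball" >&2
        return 1
    fi
    # Run the installer from its own directory, passing the target directory.
    (cd "$(dirname "$script")" && sh ./install_gendb.sh "$target")
}

# Example with hypothetical paths:
#   install_gendb /tmp/gendb.tar.gz /opt/gendb
```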

Note:

The GOPArc component downloads several databases from external servers, like the COG and KOG databases and the KeGG databases (other than the database mentioned above). You need an internet connection during the installation. The installation script will ask for a proxy server to be used for downloading. If you do not need a proxy, leave the field empty. Otherwise enter the complete proxy URL, e.g. `http://proxy.my-institute.org:1234`.
During download the script also tries to fetch the KeGG pathway reference maps. These are about 2,300 single files, which unfortunately have to be fetched one by one.
Some of the installation scripts produce a lot of output with warnings and error messages. We are aware of this and will fix it in the next release.
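
If you are unsure whether a proxy setting counts as a "complete" URL, a simple shape check like the one below may help. It is only a sketch: the function name is hypothetical, and it assumes that "complete" means scheme, host and port, as in the example URL above.

```shell
#!/bin/sh
# Return success if the argument looks like scheme://host:port.
is_complete_proxy_url() {
    case $1 in
        http://*:*[0-9] | https://*:*[0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

# Example:
#   is_complete_proxy_url "http://proxy.my-institute.org:1234" && echo "looks complete"
```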

Setting up the web server

The Apache web server is usually available as a package provided by the operating system vendors; however, the way to configure the server differs between operating systems and distributions. We have provided a script called "setup_web_interface" that tries to detect the Apache version and is able to create a configuration fragment you can use to set up the web server. The fragment contains the necessary configuration directives, but may need to be adapted to your local installation.

Starting the dispatcher

The "dispatcher" is the part of GenDB 2.2 responsible for managing external tools, running them on the cluster, parsing output and so on. It needs to be started prior to submitting tools within the GenDB 2.2 web interface or by a command line script. Starting a dispatcher is simple:

`cd` into the `/bin` directory of your GenDB installation
execute `./dispatcher -l /tmp/your_dispatcher.log`
for advanced options just execute `./dispatcher` without any arguments

You may start the dispatcher at system boot time using a so-called "init script". Again, these scripts differ between operating systems and distributions, so we are not able to provide a ready-made one. Ask your local system administrator for more information about how to write an init script.
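
As a starting point for such a script, a minimal start/stop/status wrapper might look like the sketch below. It is not a ready-made init script: `GENDB_HOME`, the PID file and the log file are placeholder assumptions, and it assumes the dispatcher keeps running in the foreground rather than detaching itself. Only the `-l` option shown above is used.

```shell
#!/bin/sh
# Minimal start/stop/status wrapper for the GenDB dispatcher (sketch).
# All three paths are placeholders; adjust them to your installation.
GENDB_HOME=${GENDB_HOME:-/opt/gendb}
PIDFILE=${PIDFILE:-/var/run/gendb_dispatcher.pid}
LOGFILE=${LOGFILE:-/tmp/your_dispatcher.log}

dispatcher_start() {
    # Run from the bin directory in a background subshell; exec keeps
    # the PID recorded in $! identical to the dispatcher's PID.
    (cd "$GENDB_HOME/bin" && exec nohup ./dispatcher -l "$LOGFILE" >/dev/null 2>&1) &
    echo $! > "$PIDFILE"
}

dispatcher_stop() {
    [ -f "$PIDFILE" ] || return 0
    kill "$(cat "$PIDFILE")" 2>/dev/null
    rm -f "$PIDFILE"
}

dispatcher_status() {
    # Succeeds only if a PID file exists and that process is alive.
    [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null
}

case "${1:-}" in
    start)  dispatcher_start ;;
    stop)   dispatcher_stop ;;
    status) dispatcher_status && echo "running" || echo "stopped" ;;
esac
```

How such a script is registered for boot time (runlevel links, service managers and so on) depends on your operating system.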

The installation is finished now and you can start with adding a new GenDB project and processing the first data. See the core scripts page for more information about this.

@@ Line 1: / Line 1: @@
-__NOTOC__
 = GenDB Installation Instructions =
-Below you can find some basic installation instructions for installing GenDB from a tarball.
+Below you can find some basic installation instructions for installing GenDB from a tarball. There's also a [[GenDBWiki/AdministratorDocumentation/GenDBInstallationFAQ|FAQ page]] describing some of the problems that may occur during installation.
-* 1: Checking system requirements
+Although we are testing our software on a number of systems prior to release, it may contain errors and in the worst case it does not work in the way you expect it.
-  As a Perl application, GenDB 2.2 requires a recent version of Perl and a number of additional (non bioinfomatics related) tools:
+GenDB 2.2 is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE!
+== Checking system requirements ==
+As a Perl application, GenDB 2.2 requires a recent version of Perl and a number of additional (non bioinformatics related) tools:
 * Perl (version 5.8 or higher)
 * MySQL (version 4.0 or higher)
@@ Line 17: / Line 19: @@
 * Apache ([http://httpd.apache.org])
 * mod_perl (optional, but "highly recommended" [http://perl.apache.org])
+You also need an administrative account at the MySQL server for creating several databases during the installation.
-  You also need an administrativ account at the MySQL server for creating several databases during the installation.
+== Installing the required Perl modules ==
+GenDB 2.2 relies on a number of freely available Perl modules. These modules are available at the CPAN archive ([http://www.cpan.org]). You need to install these modules prior to installing GenDB 2.2. Depending on the kind of operating system these modules may also be available as packages provided by your operating system vendors; using these packages is recommended.
-* 2: Installing the required Perl modules
+List of required Perl modules (in no particular order):
-  GenDB 2.2 relies on a number of freely available Perl modules. These modules are available at the CPAN archive ([http://www.cpan.org]). Your need to install these modules prior to installing GenDB 2.2.
-  Depending on the kind of operating system these modules may also be available as package provided by your operating system vendors; using these packages is recommended.
-  List of Perl module (in no particular order):
 * Graph
 * GD
@@ Line 45: / Line 43: @@
 * BioPerl (also available at [http://www.bioperl.org])
-  Please keep in mind that these modules may have further requirements and may depend on further modules or libraries. If you are about to install the modules by yourself instead of using ready-made packages, using the so called "CPAN Shell" will help you resolving dependencies and installing modules. See the section about installing perl modules at the CPAN website ([http://www.cpan.org/misc/cpan-faq.html#How_install_Perl_modules]) or the manpage of the CPAN module (if installed). Newer versions of Perl (> 5.8) also provide an improved version of the CPAN Shell, called "CPANPlus".
+Please keep in mind that these modules may have further requirements and may depend on further modules or libraries. If you are going to install the modules by yourself instead of using ready-made packages, using the so called "CPAN Shell" will help you resolving dependencies and installing modules. See the section about [http://www.cpan.org/misc/cpan-faq.html#How_install_Perl_modules installing Perl modules] at the CPAN website or the manpage of the CPAN module (if installed). Newer versions of Perl (> 5.8) also provide an improved version of the CPAN Shell, called "CPANPlus".
-* 3: Installing bioinformatics software
-  GenDB 2.2 does not contain methods for gene calling or predicting gene functions; it uses a number of mostly freely available, well-known tools. Most of these tools are required for running GenDB 2.2.
+== Installing bioinformatics software ==
+GenDB 2.2 does not contain methods for gene calling or predicting gene functions; it uses a number of mostly freely available, well-known tools. Most of these tools are required for running GenDB 2.2.
+* Gene calling:
+** Critica
+** Glimmer 2
+** tRNAScan-SE
+** SearchForRNAs (available at our ftp server)
+** QRNA (optional)
+** rbsfinder (optional)
+* Function prediction:
+** NCBI Blast ([http://www.ncbi.nlm.nih.gov/])
+** HMMER ([http://hmmer.wustl.edu/])
+** InterProScan ([http://www.ebi.ac.uk])
+** EMBOSS ([http://www.emboss.org])
+** SAPS (optional)
+** TMHMM (optional)
+** SignalP (optional)
+At least the mandatory tools need to be setup and configured properly.
-  Gene calling:
+== Getting the required sequence databases ==
+The tools described above operate on various databases containing biological sequences, pattern, HMMs etc. You need at least the following databases for GenDB 2.2:
+* a non redundant nucleotide database (usually called "nt", available by ftp at the NCBI and the EBI)
+* a non redundant protein database ("nr", also available at the NCBI/EBI)
+* a blastable SwissProt database<br>You may either use a standalone database or a subset of the "nr" database provided by the NCBI or the EBI. Setting up this subset is beyond the scope of this documentation.
+* a database containing the protein sequences of all genomes available in the KeGG database<br>You can build this database by downloading the sequences from the KeGG [ftp://ftp.genome.jp/pub/kegg/genomes/sequences ftp server] and concatenating the necessary files.
-* Critica
+You have to ensure that the FASTA header line format follows the defline format required by the NCBI. See the documentation to "formatdb" and "fastacmd" and [ftp://ftp.ncbi.nih.gov/blast/db/README]. If you want to use custom databases within GenDB, the header lines also have to follow the defline standard.
-* Glimmer 2
-* tRNAScan-SE
-* SearchForRNAs (available at our ftp server)
-* QRna (optional)
-* rbsfinder (optional)
-  Function prediction:
+== Installing GenDB ==
+* Get the most recent GenDB tarball from sourceforge([http://sourceforge.net/projects/gendb])
+* Unpack the tarball in a temporary directory
+* Run the installation script ("sh install_gendb.sh <target directory>")
-* NCBI Blast ([http://www.ncbi.nlm.nih.gov/])
+The installation script will check for the Perl modules, paths to binaries and databases, install the components and setup the system.
-* HMMER ([http://hmmer.wustl.edu/])
-* InterProScan ([httt://www.ebi.ac.uk])
-* EMBOSS ([http://www.emboss.org])
-* SAPS (optional)
-* TMHMM (optional)
-* SignalP (optional)
-  At least the mandatory tools need to be setup and configured properly.
+Note:
+* The GOPArc component downloads several databases from external servers, like the COG and KOG databases and the KeGG databases (others than the database mentioned above). You need an internet connection during the installation. The installation script will ask for a proxy server to be used for downloading. If you do not need a proxy leave the field empty. Otherwise enter the complete proxy URL, e.g. `http://proxy.my-institute.org:1234`.
+* During download the script also tries to fetch the KeGG pathway reference maps. These are about 2.300 single files, which unfortunatly have to be fetched one by one.
+* Some of the installation scripts produce a lot of output with warnings and error messages. We are aware of this and will fix it in the next release.
-* 4: Getting the required sequence databases
+== Setting up the web server ==
-  The tools described above operate on various databases containing biological sequences, pattern, HMMs etc. You need at least the following databases for GenDB 2,2:
-* a non redundant nucleotide database (usually called "nt", available by ftp at the NCBI and the EBI)
+The apache web server is usually available as a package provided by the operating system vendors; however the way to configure the server differs between operating systems and distributions. We have provided a script called "setup_web_interface" that tries to detect the apache version and is able to create a configuration fragment you can use to setup the web server. The fragment contains the necessary configuration directives, but may need to be adopted to your local installation.
-* a non redundant protein database ("nr", also available at the NCBI/EBI)
-* a blastable SwissProt database
-     You may either use a standalone database or a subset of the "nr" database provided by the NCBI or the EBI. Setting up this subset is beyond the scope of this documentation.
-* a database containing the protein sequences of all genomes available in the KeGG database
-     You can build this database by downloading the sequences from the KeGG ftp server ([ftp://ftp.genome.jp/pub/kegg/genomes/sequences]) and concatenating the necessary files.
-  You have to ensure that the fasta header line format follows the defline format required by the NCBI. See the documentation to "formatdb" and "fastacmd" and [ftp://ftp.ncbi.nih.gov/blast/db/README]. If you want to use custom databases within GenDB, the header lines also have to follow the defline standard.
+== Starting the dispatcher ==
-* 5: Getting the GenDB 2.2 tarball ([ftp://ftp.cebitec.uni-bielefeld.de/pub/software/gendb])
+The "dispatcher" is the part of GenDB 2.2 responsible for managing external tools, running them on the cluster, parsing output and so on. It needs to be started prior to submitting tools within the GenDB 2.2 web interface or by a command line script. Starting a dispatcher is simple:
-* 6: Unpacking the tarball in a temporary directory
-* 7: Run the installation script ("sh install_gendb.sh <target directory>")
-    The installation script will check for the Perl modules, pathes to binaries and databases, install the components and setup the system.
-    Note:
+* `cd` into the `/bin` directory of your GenDB installation
-* The GOPArc components downloads several databases from external servers, like the COG and KOG databases, KeGG databases (others than the database mentioned above). You need a internet connection during the installation. The installation script will ask for a proxy server to be used for downloading. If you do not need a proxy leave the field empty. Otherwise enter the complete proxy URL, e.g. http://proxy.my-institute.org:1234.
+* execute `./dispatcher -l /tmp/your_dispatcher.log`
-* During the download the script also tries to fetch the KeGG pathway reference maps. These are about 2.300 single files, which unfortunatly have to be fetched one by one.
+* for advanced options just execute `./dispatcher` without any arguments
-* Some of the installation scripts produce a lot of output with warnings and error messages. We are aware of this and will fix it in the next release.
-* 8: Setting up the web server
-    The apache web server is usually available as package provided by the operating system vendors; however the way to configure the server differs between operating systems and distributions. We have provided a script called "setup_web_interface" that tries to detect the apache version and is able to create a configuration fragment you can use to setup the web server. The fragment contains the necessary configuration directives, but may need to be adopted to your local installation.
-* 9: Start a dispatcher
+You may start the dispatcher at system boot time, using a so called "init script". Again these scripts differ between operating systems and distributions, so we are not able to provide a ready-made one. Ask your local system administrator for more information about how to write an init script.
-    The "dispatcher" is the part of GenDB 2.2 responsible for managing external tools, running them on the cluster, parsing output and so on. It need to be started prior to submitting tools within the GenDB 2.2 web interface or by a command line script. You may start the dispatcher at system boot time, using a so called "init script". Again these scripts differ between operating systems and distributions, so we are not able to provide a ready-made one. Ask your local system administrator for more information about how to write an init script.
 The installation is finished now and you can start with adding a new GenDB project and processing the first data. See [[GenDBWiki/CoreDocumentation/CoreScripts|the core scripts page]] for more information about this.

GenDBWiki/AdministratorDocumentation/GenDBInstallation: Difference between revisions