GenDBWiki/AdministratorDocumentation/GenDBInstallationFAQ

From BRF-Software

Frequently asked questions (FAQ) about installing and setting up GenDB 2.2

Database setup and requirements

  • The installation fails due to insufficient permissions on the database server for "gendb@localhost".
    The installation system creates several user accounts on the database server. It does not specify a host name for these accounts, so that the web server and the command line scripts can be used across a network (e.g. with a dedicated database server and a dedicated web server). The default MySQL setup often already contains an account for "%@localhost". This account takes precedence over the ones generated by the installation system (see also the MySQL privilege system documentation). You have to remove this account before installing GenDB 2.2.

Using GenDB with a different queueing system

GenDB was developed using an SGE (Sun Grid Engine) based compute cluster. The release version of GenDB does not rely on SGE, but the installation system still expects to find an SGE directory structure. To use GenDB with a different queueing system, two steps are necessary:

  • faking the directory structure expected by the installation system
  • providing a DRMAA compatible interface to the existing queueing system

The installation system asks for the directory containing the SGE installation and verifies the entered value by checking for the file <dir>/util/arch (an SGE utility that determines the current architecture). It also asks for the SGE cell to be used, with "default" as the default name. A file <dir>/<cell name>/common/settings.sh has to exist. The two values are used to set the environment variables $SGE_ROOT and $SGE_CELL. If you want to override these settings or prevent the variables from being set, you need to modify the file <gendb directory>/share/common/exec/build_environment.fragment.
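The directory structure described above can be faked with a few lines of Python. This is only a minimal sketch: the function name make_fake_sge_root, the architecture string "lx-amd64" and the contents written to both files are assumptions for illustration. The text above only states that the two files have to exist, so whether the installer also runs or sources them is not known here.

```python
import stat
import tempfile
from pathlib import Path


def make_fake_sge_root(sge_root, cell="default", arch="lx-amd64"):
    """Create the two files the installer looks for below sge_root.

    The arch string and the file contents are placeholders (assumptions);
    only the file locations are taken from the FAQ text.
    """
    root = Path(sge_root)

    # <dir>/util/arch: stub standing in for SGE's architecture utility.
    arch_script = root / "util" / "arch"
    arch_script.parent.mkdir(parents=True, exist_ok=True)
    arch_script.write_text(f"#!/bin/sh\necho {arch}\n")
    # Mark the stub as executable in case the installer runs it.
    mode = arch_script.stat().st_mode
    arch_script.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)

    # <dir>/<cell name>/common/settings.sh: assumed to export both variables.
    settings = root / cell / "common" / "settings.sh"
    settings.parent.mkdir(parents=True, exist_ok=True)
    settings.write_text(
        f'SGE_ROOT="{root}"; export SGE_ROOT\n'
        f'SGE_CELL="{cell}"; export SGE_CELL\n'
    )
    return arch_script, settings


if __name__ == "__main__":
    # Demo in a throwaway directory; use a permanent location for a real setup.
    demo_root = Path(tempfile.mkdtemp()) / "fake_sge"
    for created in make_fake_sge_root(demo_root):
        print(created)
```

Enter the directory passed as sge_root when the installation system asks for the SGE installation directory, and the cell name when it asks for the SGE cell.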

Getting a DRMAA interface for a different queueing system is a tougher task. Although GenDB restricts its DRMAA usage to mandatory features, some implementations do not even provide these (e.g. the (Open)PBS/Torque implementation). We have written a drop-in replacement for the DRMAA module within GenDB that emulates some of the missing functionality. It is currently in use with an OpenPBS-driven cluster and should work fine (thanks to Ben Vanhaeren for his help). The module is available on our FTP server (DRMAA_Torque.pm). You need to back up the file <gendb directory>/share/common/perl/Scheduler/DRMAA.pm and then replace it with the new module. The new module may or may not work with your cluster; we cannot support different cluster systems at the moment, so you are on your own.
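The backup and replace step can be sketched as follows. The function name replace_drmaa_module and the ".orig" backup suffix are assumptions for illustration; only the path of DRMAA.pm below the GenDB directory is taken from the text above. The downloaded DRMAA_Torque.pm is passed in as new_module.

```python
import shutil
import tempfile
from pathlib import Path


def replace_drmaa_module(gendb_dir, new_module):
    """Back up Scheduler/DRMAA.pm, then overwrite it with new_module."""
    target = (Path(gendb_dir) / "share" / "common" / "perl"
              / "Scheduler" / "DRMAA.pm")
    backup = target.with_name(target.name + ".orig")  # hypothetical backup name

    if not target.is_file():
        raise FileNotFoundError(f"no DRMAA.pm found at {target}")
    if backup.exists():
        # Refuse to overwrite an earlier backup of the original module.
        raise FileExistsError(f"backup already exists: {backup}")

    shutil.copy2(target, backup)      # keep the original module
    shutil.copy2(new_module, target)  # install the replacement under the old name
    return backup


if __name__ == "__main__":
    # Demo on a throwaway directory tree with dummy file contents.
    work = Path(tempfile.mkdtemp())
    scheduler = work / "gendb" / "share" / "common" / "perl" / "Scheduler"
    scheduler.mkdir(parents=True)
    (scheduler / "DRMAA.pm").write_text("# original module\n")
    replacement = work / "DRMAA_Torque.pm"
    replacement.write_text("# replacement module\n")
    print(replace_drmaa_module(work / "gendb", replacement))
```

Copying the backup file over DRMAA.pm again restores the original module if the replacement does not work with your cluster.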

Downloads during the installation

  • During the installation a number of files are downloaded. What are these files?

The GOPArc component stores a number of classification schemes and information about metabolic pathways from the KEGG database. The license for this information does not allow us to bundle the files with GenDB 2.2, so the installation system has to download them to build the GOPArc database. The files are:

  • COG and KOG from NCBI ([1])
  • the compound, enzyme and reaction files from the KEGG ligand database ([2])
  • the reference maps from the KEGG pathway database ([3])

  • I already have these files. How can I skip the download?

The current release does not support skipping the download if the files are already available. Some of the files have to be postprocessed during the installation (e.g. to correct the headers for the COG blastable database). The GOPArc component also expects the files to be available in certain locations within the installation directory. We may change the download and conversion scripts in a later release to support locally available files.

Changing configuration after installation

  • I need to change some settings after the installation.

The installation system provides a script called "reconfigure_module". It enables administrators to change the configuration settings made at installation time. See also the next two FAQs.

  • I want to move the databases to another server.

Moving the database to another server requires several configuration changes:

  • changing the database server name for the project management component with the "reconfigure_module" script
  • manually changing the server name that is also stored in the project management database itself

The project management component allows the various project databases to be stored on several database servers. Each project database is associated with a database host in the project management database. If the databases are moved to another server, this information has to be updated, too. We recommend using DNS aliases ("CNAME") for the database server(s). Moving the database to another machine then only requires changing the alias in the DNS configuration.

  • I have moved the sequence databases ("nr", "nt", etc.) to another directory. How can I change the paths within GenDB 2.2?

Unfortunately, changing the paths in one single location is not possible. For every BLAST tool that is used in a GenDB project, the location of the blastable database file is stored in the project database itself. If you have to move the database files to another location, you have to change the paths in all GenDB project databases. We may provide a script for this later. You also have to change the configuration of the GenDB component with the "reconfigure_module" script, since it is used for setting up new projects and the default tools, including the paths to the blastable databases.

Modifying the MySQL setup for GenDB

If you see an error like DBD::mysql::db do failed: Got a packet bigger than 'max_allowed_packet' ... you may need to modify your MySQL server settings. This is likely to happen if you use a preconfigured MySQL installation. Data transfer between GenDB and MySQL is done using packets of data. The default size of these packets is limited to 1 MB for small MySQL setups. Hence trying to import contigs larger than 1 MB will fail. Your administrator has to change the MySQL configuration file (usually /etc/my.cnf):

  • Locate the line
 set-variable = max_allowed_packet=1M
  • and change it, e.g. to
 set-variable = max_allowed_packet=16M

(If you are planning to use genomic sequences with more than 16 MB, you should use a higher setting.)
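To judge whether a given setting is large enough for your sequences, a small check like the following may help. It is only a sketch: the helper names parse_size and fits_in_packet are made up for illustration, and the 1024 byte overhead allowance for the surrounding SQL statement is an assumption, not a value taken from GenDB or MySQL. MySQL reads the K, M and G suffixes as multiples of 1024.

```python
import re

# MySQL option suffixes are multiples of 1024.
_UNITS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(value):
    """Convert a my.cnf size such as '16M' into a number of bytes."""
    match = re.fullmatch(r"(\d+)([KMG]?)", value.strip(), flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"not a valid size: {value!r}")
    number, suffix = match.groups()
    return int(number) * _UNITS[suffix.upper()]


def fits_in_packet(sequence_length, max_allowed_packet, overhead=1024):
    """Return True if a sequence of sequence_length bytes fits in one packet.

    overhead is a hypothetical allowance for the rest of the SQL statement.
    """
    return sequence_length + overhead <= parse_size(max_allowed_packet)


if __name__ == "__main__":
    contig_length = 2_000_000  # example contig of about 2 MB
    for setting in ("1M", "16M"):
        print(setting, fits_in_packet(contig_length, setting))
```

With these assumptions, the 2 MB example contig does not fit into the 1M setting but fits into 16M.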