GenDBWiki/CoreDocumentation/CoreScripts: Difference between revisions

From BRF-Software
imported>BurkhardLinke
Latest revision as of 20:18, 29 October 2011

GenDB Core Scripts

After installing and setting up the GenDB genome annotation system, you can use the GenDB core scripts listed below to annotate a genome without setting up a graphical user interface (neither the web frontend nor the Gtk GUI). After importing an EMBL or FASTA file, you can create a set of tools for region and function prediction. Once all bioinformatics tools have been computed, all genes can be annotated automatically and the results can be exported in different output formats. Further scripts create plots and graphics. Finally, imported contigs can be deleted to clean up the GenDB project database.

Warning: All scripts are used at your own risk, so have a backup ready for the worst case!

The scripts are part of the GenDB distribution and are located in the `gendb/share/exec/` directory of your current GenDB environment. Execute all scripts from the `gendb/bin/` directory using the `gendb_start` wrapper script. Executing a script without any arguments prints a detailed usage message that explains all available options. See the documentation of each script for alternative options and advanced features.
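
As a minimal sketch of the calling convention described above, the following Python helper assembles the command line for a core script. It assumes (this page does not confirm it) that `gendb_start` takes the script name as its first argument; the helper name `build_command` and the `gendb_root` parameter are hypothetical. No script options are shown, because running a script without arguments is the documented way to list them.

```python
from pathlib import Path


def build_command(gendb_root, script, args=()):
    """Return (working_directory, argv) for running a core script.

    Assumption: the gendb_start wrapper accepts the script name as its
    first argument, followed by the script's own arguments.
    """
    bin_dir = Path(gendb_root) / "bin"
    argv = ["./gendb_start", script, *args]
    return bin_dir, argv


if __name__ == "__main__":
    cwd, argv = build_command("gendb", "import_FASTA.pl")
    # Dry run: print the command instead of executing it.
    print(f"cd {cwd.as_posix()} && {' '.join(argv)}")
```

To actually run the command, the pair could be passed to `subprocess.run(argv, cwd=cwd)`; with an empty `args` the script should only print its usage message.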

1) First of all, we need to create a "project" for storing the data, restricting access to certain users etc.

  • `add_gendb_project.pl`: Before any data can be imported, this script creates a new GenDB project by setting up a new database and registering the project with the project management system (GPMS).

2) Afterwards, we can import a contig into GenDB in either FASTA or EMBL/GenBank format.

  • `import_FASTA.pl`: Imports a FASTA file containing one or more contig sequences.
  • `import_EMBL_GBK.pl`: Imports an EMBL or GenBank file. If you have a graphical user interface running, you should see the imported data when you open the project.

3) In the next steps we can compute the automatic annotation, including gene prediction and function assignment.

  • `run_gendb_pipeline.pl`: Creates a set of default tools for the coordinated prediction of regions and functions within a given project. You can either specify a single contig of the project or import a new contig sequence into the project. In both cases a set of default tools (e.g. Critica, Glimmer, various BLAST tools, Reganor) is automatically created, set up for this contig, and finally run. If results from a former run of this script already exist for a contig, you can have them updated. This script uses the `ToolCreator.pm` wizard module. Use it to run a standard genome annotation with a single command line.
  • `run_comparative_pipeline.pl`: Creates a set of default tools for the coordinated prediction of regions and functions within a given project. You have to specify a single contig that serves as the reference annotation. Gismo gene calling is performed on all other contigs, and the predicted CDS are compared by BLAST against the CDS of the reference contig. The necessary tools are created automatically. This script uses the `ToolCreator.pm` and `JobSubmitter.pm` wizard modules. Use it to run a comparative genome annotation with a single command line.
  • `tool_creator.pl`: Creates a default set of tools for a GenDB project without starting any computations. You can restrict the set to region tools (for gene prediction) or to function tools (for automatic function assignment), or create both types. Additionally, you can create a Metanor tool. This script also uses the `ToolCreator.pm` wizard module. Use it if you do not want to run the complete pipeline of all region and function tools at once.
  • `submit_job.pl`: After creating tools for a GenDB project, you can start their computation with the GenDB job submitter. The computation requires a running dispatcher and all tools installed locally.

4) After finishing our genome annotation we can export the results and create some plots.

  • `export_EMBL_GBK.pl`: Exports an EMBL or GenBank file for a single contig stored within a GenDB project.
  • `wholegenome_plot.pl`: Visualizes an overview of a contig in a chessboard style. User-defined color schemes may be used to highlight groups of genes.
  • `genome_plot.pl`: Visualizes an overview of a contig as a circular genome map.

5) Finally, we can clean up our project by deleting a contig.

  • `delete_contig.pl`: Deletes a contig from a GenDB project.
    Warning: Running this script might take some time for large contigs with many subregions and computed observations.
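
The order of the steps above can be sketched as a small planning function. This is only an illustration: the function name `plan_workflow` and its parameters are hypothetical, and only the script names come from this page. Script options are deliberately left out because they are not listed here, and the optional plotting and cleanup scripts are omitted.

```python
def plan_workflow(import_format="fasta", full_pipeline=True):
    """Return the core scripts in the order described on this page.

    import_format: "fasta", "embl" or "genbank" (hypothetical labels).
    full_pipeline: True runs run_gendb_pipeline.pl in one go; False creates
    the tools first and submits the jobs as a separate step.
    """
    importers = {
        "fasta": "import_FASTA.pl",
        "embl": "import_EMBL_GBK.pl",
        "genbank": "import_EMBL_GBK.pl",
    }
    if import_format not in importers:
        raise ValueError(f"unsupported import format: {import_format}")

    # Steps 1 and 2: create the project, then import a contig.
    steps = ["add_gendb_project.pl", importers[import_format]]

    # Step 3: either the complete pipeline or tool creation plus job submission.
    if full_pipeline:
        steps.append("run_gendb_pipeline.pl")
    else:
        steps.extend(["tool_creator.pl", "submit_job.pl"])

    # Step 4: export the annotated contig.
    steps.append("export_EMBL_GBK.pl")
    return steps


if __name__ == "__main__":
    for number, script in enumerate(plan_workflow("embl", full_pipeline=False), 1):
        print(number, script)
```

Splitting step 3 into `tool_creator.pl` and `submit_job.pl` mirrors the case where the complete pipeline should not be run at once.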

All functions listed above are also available via the GenDB web interface, so you are not limited to this script interface. Nevertheless, the scripts for creating tools and submitting jobs in particular offer many options and are the most flexible way to run a genome annotation.


TODO:

  • add more essential scripts