GenDBWiki/CoreDocumentation/RegionPrediction: Difference between revisions
Revision as of 12:32, 23 November 2004
Documentation for the GenDB Gene Finding Components
Currently integrated programs
| Program | Task | Reference | 
| REGANOR | Prediction of CDSs | (http://bioinformatics.oupjournals.org/cgi/reprint/20/10/1622) | 
| Glimmer-2.1 | Prediction of CDSs | (http://nar.oupjournals.org/cgi/content/full/27/23/4636) | 
| Critica | Prediction of CDSs, RBSs, Frameshifts | (http://mbe.oupjournals.org/cgi/reprint/16/4/512) | 
| RBSFinder | Relocation of CDS starts, Prediction of RBSs | (http://bioinformatics.oupjournals.org/cgi/reprint/17/12/1123) | 
| SearchforRNAs | Prediction of t- and rRNAs | (Niels Larsen, University of Aarhus, Denmark) | 
| tRNAscan-SE | Prediction of tRNAs | (http://nar.oupjournals.org/cgi/content/full/25/5/955) | 
| Getorf | Lists all ORFs | (Contained in EMBOSS package, http://www.hgmp.mrc.ac.uk/Software/EMBOSS/) | 
Documentation for individual programs
In the following sections you can find detailed descriptions for all programs that can be used to predict regions within GenDB.
REGANOR
Reganor is a pipeline which automates the complete gene finding procedure for a sequence within GenDB. For CDS prediction, a combined strategy based on the gene finders Glimmer and Critica is applied: Glimmer is run using the Critica predictions on the sequence as a training set of CDSs. In an extensive evaluation on 113 microbial genome sequences, this strategy was shown to have a significantly improved overall performance compared to the Glimmer standard application, especially for GC-rich genomes. For the prediction of tRNA genes, tRNAscan-SE is run. For the prediction of RNA genes, SearchForRNAs is run, which uses tRNAscan-SE. Based on the observations from the different tools, CDS and RNA genes are automatically annotated. Besides the CDSs predicted by Critica, additional CDSs predicted by Glimmer(ct) which do not overlap by more than 50 bp with these or with RNA genes are annotated. This way, sensitivity is increased compared to Critica without a significant loss of specificity.
The reliability of the different predictions is reflected by the 'status region' of these regions. CDSs predicted by Critica are assigned 'status 2', and additionally annotated Glimmer(ct) predictions with high 'Vote Scores' are assigned 'status 1'. Low-scoring additional Glimmer(ct)-based annotations are assigned 'attention needed', because besides some true positive predictions this group is also likely to contain some false positives.
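The annotation rules described above can be sketched as follows. This is only an illustration and not the Reganor implementation: the `Region` class, the function names and in particular the `high_vote` cut-off are hypothetical, since the text does not state which 'Vote Score' counts as high. Only the 50 bp overlap limit and the three status labels are taken from the description.

```python
from dataclasses import dataclass


@dataclass
class Region:
    start: int               # 1-based, inclusive, start <= end
    end: int
    source: str              # "critica", "glimmer" or "rna"
    vote_score: float = 0.0  # only meaningful for Glimmer predictions


def overlap(a, b):
    """Number of positions shared by two regions (0 if they are disjoint)."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def annotate(critica, glimmer, rnas, max_overlap=50, high_vote=10.0):
    """Return (region, status) pairs for the CDSs that would be annotated.

    high_vote is a hypothetical threshold chosen for this sketch.
    """
    annotated = [(cds, "status 2") for cds in critica]
    blockers = list(critica) + list(rnas)
    for cds in glimmer:
        # Skip Glimmer CDSs overlapping a Critica CDS or RNA gene too much.
        if any(overlap(cds, other) > max_overlap for other in blockers):
            continue
        status = "status 1" if cds.vote_score >= high_vote else "attention needed"
        annotated.append((cds, status))
    return annotated
```

RNA genes themselves are also annotated by the pipeline; they are left out of the returned list here to keep the sketch focused on the CDS statuses.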
Usage: reganor.pl -p <project name> (-r <region> | -a) [-t -f -n]
where
        -p project name - the GenDB project to be run on
        -r region       - contig to be analyzed   - or -
        -a                run on all contigs
        -n                do not run autoannotation
        -f                restart failed, submitted or finished jobs
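As a small illustration of the usage line above, the following sketch assembles a reganor.pl argument list and enforces that exactly one of `-r` and `-a` is given. The helper name `build_reganor_command` and its parameter names are made up for this example; only the flags `-p`, `-r`, `-a`, `-n` and `-f` come from the usage text. The `-t` flag is not described there and is therefore not used.

```python
def build_reganor_command(project, region=None, all_contigs=False,
                          no_autoannotation=False, restart_jobs=False):
    """Build the argument list for a reganor.pl call (hypothetical helper)."""
    # The usage line requires either -r <region> or -a, but not both.
    if (region is not None) == all_contigs:
        raise ValueError("specify exactly one of region or all_contigs")
    cmd = ["reganor.pl", "-p", project]
    if all_contigs:
        cmd.append("-a")
    else:
        cmd += ["-r", region]
    if no_autoannotation:
        cmd.append("-n")  # do not run autoannotation
    if restart_jobs:
        cmd.append("-f")  # restart failed, submitted or finished jobs
    return cmd
```

The resulting list can be passed to `subprocess.run` on a system where reganor.pl is installed and on the search path, which is an assumption about the local setup.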
GLIMMER-2.1
The complete Glimmer package is conveniently integrated into GenDB. A Glimmer tool optimally configured according to the evaluation given in McHardy et al. (Bioinformatics, 2004) can be created with the default_tool_creation script or simply by using reganor.pl, in which case this tool will be created and run along with the other gene finding programs of the default GenDB gene finding pipeline. Individually designed Glimmer tools can be configured via the ToolConfigurationWizard in the graphical user interface. All options currently configurable are explicitly specified in this interface. Note that there is an additional wrapper around the Glimmer functionality which does not recognize the options that can be given directly to the glimmer2 program, so do not specify such options as 'other command line options'. What follows are extracts from the Glimmer documentation and comments on how the programs are integrated within the GenDB system:
Glimmer 1.0 had 4 read me files, and Glimmer 2.0 maintains that structure. The four main programs are:
- long-orfs
- extract
- build-icm
- glimmer2
1. Program long-orfs takes a sequence file (in FASTA format) and outputs a list of all long "potential genes" in it that do not overlap by too much. By "potential gene" I mean the portion of an orf from the first start codon to the stop codon at the end.
2. Program extract takes a FASTA format sequence file and a file with a list of start/stop positions in that file (e.g., as produced by the long-orfs program) and extracts and outputs the specified sequences.
3. Program build-icm.c creates and outputs an interpolated Markov model (IMM) as described in the paper
A.L. Delcher, D. Harmon, S. Kasif, O. White, and S.L. Salzberg. Improved Microbial Gene Identification with Glimmer. Nucleic Acids Research, 1999, in press.
4. Program glimmer takes two inputs: a sequence file (in FASTA format) and a collection of Markov models for genes as produced by the program build-icm . It outputs a list of all open reading frames (orfs) together with scores for each as a gene.
Comment: Programs 1-3 are used to create a model of CDS properties based on putative CDSs derived from an input set of sequences. In GenDB, these are called the 'training sequences' and can be specified during configuration (e.g. if multiple contigs belong to one genome). The 'training tool' defines how the putative CDSs are extracted from these sequences. If no 'training tool' is chosen, the Glimmer default 'long-orfs' is applied. The REGANOR wizard uses the gene finder Critica as the default 'training tool'. During the Glimmer tool run, glimmer2 is then used with the ICM model created in the training phase on the sequence specified to be analyzed.
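The data flow described in this comment can be summarized in a short sketch. It lists the steps only and does not reproduce the actual command lines of the Glimmer programs or the GenDB wrapper; the function name and the textual descriptions of inputs and outputs are chosen for illustration.

```python
def glimmer_workflow(training_sequence, target_sequence, training_tool=None):
    """Return the steps of a Glimmer tool run as (program, input, output) tuples.

    If no training tool is given, the Glimmer default 'long-orfs' is used,
    as described in the comment above.
    """
    tool = training_tool or "long-orfs"
    return [
        (tool, training_sequence, "putative CDS coordinates"),
        ("extract", f"{training_sequence} + putative CDS coordinates",
         "training CDS sequences"),
        ("build-icm", "training CDS sequences", "ICM model"),
        ("glimmer2", f"{target_sequence} + ICM model", "scored ORF predictions"),
    ]


# Print the default workflow and the Critica-trained one used by the REGANOR wizard.
for trainer in (None, "Critica"):
    for program, source, product in glimmer_workflow("all contigs", "contig_1", trainer):
        print(f"{program}: {source} -> {product}")
```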
Output: The extended output resulting from a Glimmer run contains three kinds of scores for every prediction: a probability, a 'Vote Score' (stored as the 'result') and a 'Raw Score' (stored in GenDB along with other comments from the Glimmer output as 'comment').
Configurable options:
| Training tool | Any CDS prediction method for which observations on the specified training sequence exist. | 
| Training sequence | GenDB regions or an external sequence file in FASTA format. For external sequences, long-orfs will be used as the training tool. | 
| Look for RBS in start prediction | Yes/No, along with the pattern to search for. | 
| Minimum gene length | Minimum length of genes to be predicted. | 
| Linear region | Linear vs. circular; for circular sequences, genes are also predicted across the end of the sequence. | 
| Threshold | Probability score cut-off for prediction. Default is 90 (out of 99). | 
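To make the effect of the 'Linear region', 'Minimum gene length' and 'Threshold' options more concrete, here is a minimal sketch. It is not the Glimmer or GenDB code. The function names are hypothetical, coordinates are assumed to be 1-based and inclusive on the forward strand, and `min_length` has no default because the table does not give one. The default threshold of 90 is taken from the table.

```python
def gene_length(start, end, seq_length, circular=False):
    """Length in bp of a forward-strand gene with 1-based inclusive coordinates."""
    if start <= end:
        return end - start + 1
    if not circular:
        raise ValueError("start > end is only possible on a circular sequence")
    # The gene runs over the end of the sequence and continues at position 1.
    return (seq_length - start + 1) + end


def keep_prediction(probability, start, end, seq_length, min_length,
                    threshold=90, circular=False):
    """Apply the probability cut-off and the minimum gene length."""
    length = gene_length(start, end, seq_length, circular)
    return probability >= threshold and length >= min_length
```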
CRITICA
Critica is part of the GenDB default gene finding pipeline usable with the Reganor Wizard. Options other than the defaults can be configured via the ToolCreationWizard. Critica as integrated into GenDB does not allow separation of the training and prediction steps. If you want to run it for a set of smaller contigs from one genome, the best way to do this is by 'chaining' the contigs together with linkers (containing stops in every frame) during the import into the GenDB system.
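The 'chaining' of contigs can be sketched as below. The linker `TTAATTAATTAA` is only an example chosen for this sketch, not a sequence prescribed by GenDB or Critica; it is its own reverse complement and contains a stop codon in each of the three reading frames, so it interrupts all six frames. The helper names are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}
LINKER = "TTAATTAATTAA"  # example linker, an assumption of this sketch


def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def has_stop_in_every_frame(seq):
    """True if all three frames on both strands contain a stop codon."""
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            codons = (strand[i:i + 3] for i in range(frame, len(strand) - 2, 3))
            if not any(codon in STOPS for codon in codons):
                return False
    return True


def chain_contigs(contigs, linker=LINKER):
    """Join contig sequences with a linker that stops every reading frame."""
    if not has_stop_in_every_frame(linker):
        raise ValueError("linker does not contain a stop in every frame")
    return linker.join(contigs)
```

Because the linker stops every frame, no predicted CDS can extend from one contig into the next.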
Output: Observations resulting from a Critica run include predicted CDSs, predicted ribosome binding sites and predictions of possible frameshifts. The value stored under 'results' for the CDS predictions in GenDB is the P-value given in the Critica output. The P-value is the amount of statistical support for the coding region; as with BLAST, low scores are better.
Configurable options:
| Look for RBS in start detection | Use ribosome binding pattern to choose start | 
Note: If Critica does not detect any CDSs, it dies with an error message, although this is not really an error. As a result, the corresponding GenDB job is assigned the state 'failed'. This is most likely to happen if you try to analyze very short sequences (<= 10000 bp) with Critica, which is not what the program is intended for.
RBSFinder
From the documentation: The program "rbs_finder.pl" implements an algorithm to find ribosome binding sites (RBS) in the upstream regions of the genes annotated by Glimmer2, GeneMark, or other prokaryotic gene finders. If there is no RBS-like patterns in this region, program searches for a start codon having a RBS-like pattern, in the same reading frame upstream or downstream and relocates start codon accordingly.
Explanations for some directly configurable options in the ToolConfigurationWizard:
| Window Size | This parameter determines how far the program should look for an RBS-like pattern in the upstream region of each of the genes. The best results were obtained using a window size of 50 bp. | 
| Iterations | RBSFinder achieves better results if run iteratively. | 
RBSFinder options configurable as 'other command-line options':
The default consensus sequence is "aggag". However, a computed sequence can be used to get better results. The method to compute the consensus sequence is as follows:
- Take the complement of the last 30 bp of the 16S rRNA.
- Find the 5 bp subsequence of this complement that occurs most often in the 30 bp upstream regions of the start codons annotated by Glimmer2.
- Use this sequence as the consensus sequence.
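The three steps above can be sketched in a few lines. This is an illustration under stated assumptions and not the RBSFinder code: "complement" is read here as the reverse complement, so that the result is written 5' to 3' like the default "aggag", and the function names are hypothetical. The upstream regions are expected as strings that end directly before the start codon.

```python
from collections import Counter


def reverse_complement(seq):
    seq = seq.upper().replace("U", "T")
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def consensus_rbs(rrna_16s, upstream_regions, tail=30, k=5):
    """Most frequent k-mer of the 16S tail complement in the upstream regions."""
    # Step 1: complement of the last `tail` bases of the 16S rRNA.
    anti = reverse_complement(rrna_16s[-tail:])
    candidates = {anti[i:i + k] for i in range(len(anti) - k + 1)}
    # Step 2: count candidate k-mers in the last `tail` bases before each start.
    counts = Counter()
    for region in upstream_regions:
        region = region.upper()[-tail:]
        for i in range(len(region) - k + 1):
            kmer = region[i:i + k]
            if kmer in candidates:
                counts[kmer] += 1
    # Step 3: the most abundant k-mer is used as the consensus sequence.
    if not counts:
        return None
    return counts.most_common(1)[0][0]
```

If none of the candidate subsequences occurs upstream of any start codon, the sketch returns `None`, in which case falling back to the default sequence would be the obvious choice.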
| <Partial_Coord_File> | The coordinates that the user wants to relocate or check for an RBS site, which can be a subset of the coordinates annotated by Glimmer2. This file should be in the following format: | 
| <Gene id> | <Start Codon Coord> | <Stop Codon Coord> | 
| 1 | 1030 | 1140 | 
| 2 | 1214 | 3010 | 
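A coordinate file of this shape could be produced with a few lines of code. The sketch assumes that the three columns are separated by whitespace (a tab is used here); the exact separator expected by rbs_finder.pl is not stated above, and the function name is hypothetical.

```python
def format_partial_coords(genes):
    """Format (gene id, start codon coord, stop codon coord) tuples, one per line."""
    lines = [f"{gene_id}\t{start}\t{stop}" for gene_id, start, stop in genes]
    return "\n".join(lines) + "\n"


# The two example genes from the table above.
print(format_partial_coords([(1, 1030, 1140), (2, 1214, 3010)]), end="")
```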
SearchforRNAs
From the documentation: Searches a fasta formatted contig file for RNA's and prints tbl style output. It works by using the RNA's from a set of closest (or explicitly named) organisms as search probes and then tries to guess the ends if the matches are not perfect. search_for_rnas includes a run of tRNAscan-SE.
Configurable options:
| Type of RNA | D = all; a string like "tRNA, 16S, 23S, 5S" | 
| Organism ID | D = none; Give your organism an ID like "EC", "BS" .. | 
| Domain | D = all; phylogenetic domain of organism, A,B,E | 
| Organism Genus | D = all; genus name of organism | 
| Organism Species | D = all; species name of organism | 
Other command line options:
| --probes | D = none; an organism name, or part of | 
| --complete | D = 95; min. pct completeness of probe sequence(s) | 
tRNAscan-SE
From the documentation:
tRNAscan-SE combines the specificity of the Cove probabilistic RNA prediction package (1) with the speed and sensitivity of tRNAscan 1.3 (2) plus an implementation of an algorithm described by Pavesi and colleagues (3), which searches for eukaryotic pol III tRNA promoters (our implementation referred to as EufindtRNA). tRNAscan and EufindtRNA are used as first-pass prefilters to identify "candidate" tRNA regions of the sequence. These subsequences are then passed to Cove for further analysis, and output if Cove confirms the initial tRNA prediction. In this way, tRNAscan-SE attains the best of both worlds: (1) a false positive rate equally low to using Cove analysis, (2) the combined sensitivities of tRNAscan and EufindtRNA (detection of 98-99% of true tRNAs), and (3) search speed 1,000 to 3,000 times faster than Cove analysis and 30 to 90 times faster than the original tRNAscan 1.3 (tRNAscan-SE uses both a code-optimized version of tRNAscan 1.3 which gives a 300-fold increase in speed, and a fast C implementation of the Pavesi et al. algorithm).
Note: The current version of tRNAscan-SE (v1.21) fails on sequences containing non-ATGC characters. Within the Reganor pipeline, tRNAscan-SE is only included if the sequences do not contain such characters.
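A check of this kind might look as follows. The sketch is not taken from the Reganor code: the function names and the tool list are illustrative, and treating lower-case letters like upper-case ones is an assumption.

```python
def only_atgc(sequence):
    """True if the sequence consists solely of A, T, G and C (case-insensitive)."""
    return set(sequence.upper()) <= set("ATGC")


def rna_tools_for(sequence):
    """Select the RNA gene finders to run, skipping tRNAscan-SE if necessary."""
    tools = ["SearchForRNAs"]
    if only_atgc(sequence):
        tools.append("tRNAscan-SE")
    return tools
```

A contig containing ambiguity codes such as N would then be analyzed without tRNAscan-SE.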
Configurable as additional options:
| -B or -P | search for bacterial tRNAs (use bacterial tRNA model) | 
| -A | search for archaeal tRNAs (use archaeal tRNA model) | 
| -O | search for organellar (mitochondrial/chloroplast) tRNAs | 
| -G | use general tRNA model (cytoplasmic tRNAs from all 3 domains included) | 
| -C | search using covariance model analysis only (max sensitivity, slow) | 
| -H | show both primary and secondary structure components to covariance model bit scores | 
| -D | disable pseudogene checking | 
Specify Alternate Cutoffs / Data Files:
| -X <score> | set cutoff score (in bits) for reporting tRNAs (default=20) | 
| -L <length> | set max length of tRNA intron+variable region (default=116bp) | 
| -I <score> | manually set "intermediate" cutoff score for EufindtRNA | 
| -z <number> | use <number> nucleotides padding when passing first-pass tRNA bounds predictions to CM analysis (default=7) | 
| -g <file> | use alternate genetic codes specified in <file> for determining tRNA type | 
| -c <file> | use an alternate covariance model in <file> | 
Misc Options:
| -h | print this help message | 
| -Q | do not prompt user before overwriting pre-existing result files (for batch processing) | 
| -n <EXPR> | search only sequences with names matching <EXPR> string (<EXPR> may contain * or ? wildcard chars) | 
| -s <EXPR> | start search at sequence with name matching <EXPR> string and continue to end of input sequence file(s) | 
Special Options (for testing & special purposes)
| -T | search using tRNAscan only (defaults to strict params) | 
| -t <mode> | explicitly set tRNAscan params, where <mode>=R or S (R=relaxed, S=strict tRNAscan v1.3 params) | 
| -E | search using Eukaryotic tRNA finder (EufindtRNA) only (defaults to Normal search parameters when run alone, or to Relaxed search params when run with Cove) | 
| -e <mode> | explicitly set EufindtRNA params, where <mode>=R, N, or S (relaxed, normal, or strict) | 
| -r <file> | save first-pass scan results from EufindtRNA and/or tRNAscan in <file> in tabular results format | 
| -u <file> | search with Cove only those sequences & regions delimited in <file> (tabular results file format) | 
| -F <file> | save first-pass candidate tRNAs in <file> that were then found to be false positives by Cove analysis | 
| -M <file> | save all seqs that do NOT have at least one tRNA prediction in them (aka "missed" seqs) | 
| -v <file> | save verbose tRNAscan 1.3 output to <file> | 
| -V <vers> | run an alternate version of tRNAscan where <vers> = 1.3, 1.39, 1.4 (default), or 2.0 | 
| -K | Keep redundant tRNAscan 1.3 hits (don't filter out multiple predictions per tRNA identification) | 
Getorf
Besides the options which can be chosen directly, the following additional options for this program can be specified as 'additional command line options' during tool configuration with the ToolConfigurationWizard:
| -maxsize | integer | Maximum nucleotide size of ORF to report | 
| -find | menu | This is a small menu of possible output options. The first four options are to select either the protein translation or the original nucleic acid sequence of the open reading frame. There are two possible definitions of an open reading frame: it can either be a region that is free of STOP codons or a region that begins with a START codon and ends with a STOP codon. The last three options are probably only of interest to people who wish to investigate the statistical properties of the regions around potential START or STOP codons. The last option assumes that ORF lengths are calculated between two STOP codons. | 
| -[no]methionine | boolean | START codons at the beginning of protein products will usually code for Methionine, despite what the codon will code for when it is internal to a protein. This qualifier sets all such START codons to code for Methionine by default. | 
| -[no]reverse | boolean | Set this to be false if you do not wish to find ORFs in the reverse complement of the sequence. | 
| -flanking | integer | If you have chosen one of the options of the type of sequence to find that gives the flanking sequence around a STOP or START codon, this allows you to set the number of nucleotides either side of that codon to output. If the region of flanking nucleotides crosses the start or end of the sequence, no output is given for this codon. |
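The two ORF definitions mentioned for the -find option can be illustrated with a short sketch. It is not the getorf implementation: it looks at a single forward reading frame only, treats ATG as the only START codon, and reports ORFs without the terminating STOP codon, all of which are simplifying assumptions made for this example.

```python
STOPS = {"TAA", "TAG", "TGA"}


def codons(seq, frame=0):
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


def orfs_between_stops(seq, frame=0):
    """Definition 1: maximal regions that are free of STOP codons."""
    result, current = [], []
    for codon in codons(seq, frame):
        if codon in STOPS:
            if current:
                result.append("".join(current))
            current = []
        else:
            current.append(codon)
    if current:
        result.append("".join(current))
    return result


def orfs_start_to_stop(seq, frame=0, starts=("ATG",)):
    """Definition 2: regions beginning at a START and ending at a STOP codon."""
    result, current = [], None
    for codon in codons(seq, frame):
        if current is None:
            if codon in starts:
                current = [codon]
        elif codon in STOPS:
            result.append("".join(current))
            current = None
        else:
            current.append(codon)
    return result
```

For the sequence `CCCATGAAATAGGGG` in frame 0, the first definition yields the regions `CCCATGAAA` and `GGG`, while the second yields only `ATGAAA`.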