GenDBWiki/CoreDocumentation/RegionPrediction: Difference between revisions

From BRF-Software
imported>AliceMcHardy

Revision as of 11:32, 23 November 2004

Documentation for the GenDB Gene Finding Components

Currently integrated programs

Program | Task | Reference
REGANOR | Prediction of CDSs | http://bioinformatics.oupjournals.org/cgi/reprint/20/10/1622
Glimmer-2.1 | Prediction of CDSs | http://nar.oupjournals.org/cgi/content/full/27/23/4636
Critica | Prediction of CDSs, RBSs, Frameshifts | http://mbe.oupjournals.org/cgi/reprint/16/4/512
RBSFinder | Relocation of CDS starts, Prediction of RBSs | http://bioinformatics.oupjournals.org/cgi/reprint/17/12/1123
SearchforRNAs | Prediction of t- and rRNAs | Niels Larsen, University Aarhus, Denmark
tRNAscan-SE | Prediction of tRNAs | http://nar.oupjournals.org/cgi/content/full/25/5/955
Getorf | Lists all ORFs | Contained in EMBOSS package, http://www.hgmp.mrc.ac.uk/Software/EMBOSS/

Documentation for individual programs

In the following sections you can find detailed descriptions for all programs that can be used to predict regions within GenDB.

REGANOR

Reganor is a pipeline that automates the complete gene finding procedure for a sequence within GenDB. For CDS prediction, it applies a combined strategy based on the gene finders Glimmer and Critica: Glimmer is run using the Critica predictions on the sequence as its training set of CDSs. An extensive evaluation on 113 microbial genome sequences showed that this combination performs significantly better overall than the standard Glimmer application, especially for GC-rich genomes. For the prediction of tRNA genes, tRNAscan-SE is run. For the prediction of RNA genes, SearchForRNAs is run, which uses tRNAscan-SE. Based on the observations from the different programs, CDS and RNA genes are annotated automatically. Besides the CDSs predicted by Critica, additional CDSs predicted by Glimmer(ct) are annotated if they do not overlap these CDSs or any RNA gene by more than 50 bp. This increases sensitivity compared to Critica alone without a significant loss of specificity.
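The overlap rule for adding Glimmer(ct) CDSs to the Critica CDSs can be sketched as follows. This is a minimal illustration, not the REGANOR code. It assumes that regions are hypothetical (start, end) pairs with inclusive coordinates on a linear sequence, and that overlap is counted in base pairs regardless of strand.

```python
from typing import List, Tuple

Region = Tuple[int, int]  # hypothetical (start, end), inclusive, start <= end


def overlap_bp(a: Region, b: Region) -> int:
    """Number of base pairs shared by two inclusive intervals (0 if disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def combine_predictions(critica: List[Region], glimmer: List[Region],
                        rna_genes: List[Region], max_overlap: int = 50) -> List[Region]:
    """Keep every Critica CDS, then add each Glimmer(ct) CDS that overlaps
    no Critica CDS and no RNA gene by more than max_overlap bp."""
    blockers = critica + rna_genes
    extra = [g for g in glimmer
             if all(overlap_bp(g, b) <= max_overlap for b in blockers)]
    return sorted(critica + extra)
```

For example, with a Critica CDS at (100, 400), a Glimmer(ct) CDS at (380, 700) shares only 21 bp and is added. A Glimmer(ct) CDS at (300, 900) shares 101 bp and is rejected.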

The reliability of the different predictions is reflected in the status assigned to each region. CDSs predicted by Critica are assigned 'status 2'. Additionally annotated Glimmer(ct) predictions with high 'Vote Scores' are assigned 'status 1'. Low-scoring additional Glimmer(ct)-based annotations are assigned 'attention needed', because this last group probably contains some false positives alongside the true positive predictions.
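The status scheme can be summarized as a small mapping. The sketch below makes two assumptions. First, the page does not say what counts as a "high" Vote Score, so `vote_threshold` is a hypothetical parameter. Second, the `Cds` record and its source labels are illustrative and are not GenDB types.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cds:
    source: str                         # hypothetical label: "critica" or "glimmer"
    vote_score: Optional[float] = None  # Glimmer 'Vote Score'; None for Critica CDSs


def assign_status(cds: Cds, vote_threshold: float) -> str:
    """Map an annotated CDS to its REGANOR status (vote_threshold is assumed)."""
    if cds.source == "critica":
        return "status 2"
    if cds.source == "glimmer":
        if cds.vote_score is not None and cds.vote_score >= vote_threshold:
            return "status 1"
        return "attention needed"
    raise ValueError(f"unknown prediction source: {cds.source}")
```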

Usage: reganor.pl -p <project name> (-r <region> | -a) [-t -f -n]
where

   -p <project name>  the GenDB project to be run on
   -r <region>        contig to be analyzed, or
   -a                 run on all contigs
   -n                 do not run autoannotation
   -f                 restart failed, submitted or finished jobs

GLIMMER-2.1

The complete Glimmer package is integrated into GenDB in a convenient way. You can create a Glimmer tool configured optimally according to the evaluation in McHardy et al. (Bioinformatics, 2004) with the default_tool_creation script. Alternatively, run reganor.pl, which creates this tool and runs it together with the other gene finding programs of the default GenDB gene finding pipeline. Custom Glimmer tools can be configured with the ToolConfigurationWizard in the graphical user interface, which lists all currently configurable options explicitly. Note that GenDB calls Glimmer through an additional wrapper. This wrapper does not recognize the options that can be given directly to the glimmer2 program, so do not specify such options as 'other command line options'. What follows are extracts from the Glimmer documentation, together with comments on how the programs are integrated into the GenDB system:

Glimmer 1.0 had four readme files, and Glimmer 2.0 maintains that structure. The four main programs are:

  1. long-orfs
  2. extract
  3. build-icm
  4. glimmer2

1. Program long-orfs takes a sequence file (in FASTA format) and outputs a list of all long "potential genes" in it that do not overlap by too much. By "potential gene" I mean the portion of an orf from the first start codon to the stop codon at the end.

2. Program extract takes a FASTA format sequence file and a file with a list of start/stop positions in that file (e.g., as produced by the long-orfs program) and extracts and outputs the specified sequences.

3. Program build-icm.c creates and outputs an interpolated Markov model (IMM) as described in the paper

 A.L. Delcher, D. Harmon, S. Kasif, O. White, and S.L. Salzberg.
 Improved Microbial Gene Identification with Glimmer.  
 Nucleic Acids Research, 1999, in press.

4. Program glimmer takes two inputs: a sequence file (in FASTA format) and a collection of Markov models for genes as produced by the program build-icm . It outputs a list of all open reading frames (orfs) together with scores for each as a gene.

Comment: Programs 1-3 build a model of CDS properties from putative CDSs derived from an input set of sequences. In GenDB, these are called the 'training sequences', and they can be specified during configuration (e.g. when several contigs belong to one genome). The 'training tool' defines how the putative CDSs are extracted from these sequences. If no training tool is chosen, the Glimmer default 'long-orfs' is applied. The REGANOR wizard uses the gene finder Critica as the default training tool. During the Glimmer tool run, glimmer2 is then applied to the sequence to be analyzed, using the ICM model created in the training phase.
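To illustrate what the default training step extracts, the following sketch finds "potential genes" in the sense of program 1: the part of an ORF from its first start codon to the stop codon. It is a simplified stand-in for long-orfs, not its actual algorithm. It assumes the start codon set ATG/GTG/TTG, scans only the forward strand, and skips the long-orfs step that removes genes overlapping "by too much".

```python
from typing import FrozenSet, List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def long_potential_genes(seq: str, min_len: int = 500,
                         starts: FrozenSet[str] = frozenset({"ATG", "GTG", "TTG"})
                         ) -> List[Tuple[int, int]]:
    """Forward-strand potential genes of at least min_len bp, reported as
    1-based inclusive (first base of start codon, last base of stop codon)."""
    seq = seq.upper()
    genes = []
    for frame in range(3):
        first_start = None  # 0-based position of the first start codon in this ORF
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in STOP_CODONS:
                if first_start is not None:
                    end = i + 3  # exclusive end, includes the stop codon
                    if end - first_start >= min_len:
                        genes.append((first_start + 1, end))
                first_start = None  # a new ORF begins after the stop codon
            elif first_start is None and codon in starts:
                first_start = i
    return sorted(genes)
```

In GenDB the same role can be taken by any training tool, such as Critica in the REGANOR wizard. The sketch only shows the idea behind the default.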

Output: For every prediction, the extended output of a Glimmer run contains three kinds of scores: a probability, a 'Vote Score' (stored in GenDB as the 'result') and a 'Raw Score' (stored in GenDB as a 'comment', along with other comments from the Glimmer output).

Configurable options:

Option | Description
Training tool | Any CDS prediction method for which observations on the specified training sequence exist.
Training sequence | GenDB regions or an external sequence file in FASTA format. For external sequences, long-orfs is used as the training tool.
Look for RBS in start prediction | Yes/No, along with the pattern to search for.
Minimum gene length | Minimum length of genes to be predicted.
Linear region | Linear vs. circular sequence; for a circular sequence, genes can be predicted across the end of the sequence.
Threshold | Probability score cut-off for predictions. Default is 90 (out of 99).
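The effect of the Threshold, Minimum gene length and Linear region options on a set of predictions can be sketched as a filter. This is an illustration under stated assumptions, not the GenDB wrapper. The `Prediction` record (0-based leftmost base plus length) is hypothetical. The cut-off is assumed to be inclusive (score >= threshold). A gene that runs past the last base is kept only for circular sequences.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Prediction:
    start: int        # 0-based leftmost base (hypothetical representation)
    length: int       # gene length in bp
    probability: int  # Glimmer probability score, 0 to 99


def fits_sequence(p: Prediction, seq_len: int, circular: bool) -> bool:
    """A gene running past the last base is only allowed on circular sequences."""
    if p.start + p.length <= seq_len:
        return True
    return circular and p.length <= seq_len


def filter_predictions(preds: List[Prediction], seq_len: int, circular: bool,
                       min_len: int, threshold: int = 90) -> List[Prediction]:
    """Apply the probability cut-off, the minimum length and the topology check."""
    return [p for p in preds
            if p.probability >= threshold
            and p.length >= min_len
            and fits_sequence(p, seq_len, circular)]


def gene_sequence(seq: str, p: Prediction) -> str:
    """Bases from the leftmost to the rightmost position, wrapping over the end."""
    return "".join(seq[(p.start + i) % len(seq)] for i in range(p.length))
```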

CRITICA

Critica is part of the GenDB default gene finding pipeline, which can be run with the Reganor Wizard. Non-default options can be configured with the ToolCreationWizard. Critica as integrated into GenDB does not allow the training and prediction steps to be separated. To run it on a set of smaller contigs from one genome, 'chain' the contigs together with linkers (containing stop codons in every reading frame) during the import into the GenDB system.
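The chaining step can be sketched as joining the contigs with a linker and checking that the linker interrupts every reading frame. This is not the GenDB import code. The default linker "CTAGCTAGCTAG" is a hypothetical example: it is its own reverse complement and contains TAG in all three frames, so it breaks ORFs in all six frames.

```python
from typing import Iterable

STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def has_stop_in_every_frame(linker: str) -> bool:
    """True if all six reading frames of the linker contain a stop codon."""
    for strand in (linker.upper(), reverse_complement(linker)):
        for frame in range(3):
            codons = [strand[i:i + 3] for i in range(frame, len(strand) - 2, 3)]
            if not STOP_CODONS.intersection(codons):
                return False
    return True


def chain_contigs(contigs: Iterable[str], linker: str = "CTAGCTAGCTAG") -> str:
    """Concatenate contigs, separated by a linker that ends any ORF crossing it."""
    if not has_stop_in_every_frame(linker):
        raise ValueError("linker must contain a stop codon in every reading frame")
    return linker.upper().join(c.upper() for c in contigs)
```

Because the linker has a stop codon in each of its own three frames, every global reading frame of the chained sequence is interrupted, whatever the length of the preceding contig.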

Output: The observations from a Critica run include predicted CDSs, predicted ribosome binding sites and predictions of possible frameshifts. For CDS predictions, the value stored in GenDB under 'results' is the P-value from the Critica output, which expresses the statistical support for the coding region. As with BLAST, lower values are better.

Configurable options:

Option | Description
Look for RBS in start detection | Use the ribosome binding site pattern to choose the start codon.

Note: If Critica does not detect any CDSs, it exits with an error message, although this is not really an error. As a result, the corresponding GenDB job is assigned the state 'failed'. This is most likely to happen if you analyze very short sequences (10,000 bp or less) with Critica, which the program is not intended for.

RBSFinder

From the documentation: The program "rbs_finder.pl" implements an algorithm to find ribosome binding sites (RBS) in the upstream regions of the genes annotated by Glimmer2, GeneMark, or other prokaryotic gene finders. If there is no RBS-like pattern in this region, the program searches for a start codon with an RBS-like pattern in the same reading frame, upstream or downstream, and relocates the start codon accordingly.
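The relocation idea can be sketched as follows. This is a simplified illustration, not rbs_finder.pl. It treats an "RBS-like pattern" as an exact occurrence of the consensus (default "aggag") within the upstream window (50 bp, the value reported to work best), and it handles forward-strand genes only. `start` and `stop` are hypothetical 0-based positions of the first base of the start and stop codons. Alternative in-frame starts are tried nearest-first, and the upstream search halts at an in-frame stop codon.

```python
from typing import List

STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODONS = {"ATG", "GTG", "TTG"}


def has_rbs(seq: str, start: int, consensus: str = "aggag", window: int = 50) -> bool:
    """Exact-match stand-in for an RBS-like pattern upstream of a start codon."""
    upstream = seq[max(0, start - window):start].upper()
    return consensus.upper() in upstream


def candidate_starts(seq: str, start: int, stop: int, window: int = 50) -> List[int]:
    """In-frame alternative start codons, nearest to the original start first."""
    seq = seq.upper()
    found = []
    pos = start - 3  # upstream, until an in-frame stop or the window limit
    while pos >= 0 and start - pos <= window:
        codon = seq[pos:pos + 3]
        if codon in STOP_CODONS:
            break
        if codon in START_CODONS:
            found.append(pos)
        pos -= 3
    pos = start + 3  # downstream, up to the stop codon
    while pos < stop:
        if seq[pos:pos + 3] in START_CODONS:
            found.append(pos)
        pos += 3
    return sorted(found, key=lambda p: abs(p - start))


def relocate_start(seq: str, start: int, stop: int,
                   consensus: str = "aggag", window: int = 50) -> int:
    """Keep the start if it has an RBS; otherwise move to the nearest start that does."""
    if has_rbs(seq, start, consensus, window):
        return start
    for alt in candidate_starts(seq, start, stop, window):
        if has_rbs(seq, alt, consensus, window):
            return alt
    return start
```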

Explanations for some directly configurable options in the ToolConfigurationWizard:

Option | Description
Window Size | Determines how far upstream of each gene the program looks for an RBS-like pattern. The best results were obtained with a window size of 50 bp.
Iterations | RBSFinder achieves better results if run iteratively.

RBSFinder options configurable as 'other command-line options':

<Consensus_seq>: The default consensus sequence is "aggag". However, a computed sequence can give better results. The consensus sequence is computed as follows:
- Take the complement of the last 30 bp of the 16S rRNA.
- Find the most abundant 5 bp subsequence of this complement in the 30 bp upstream regions of the start codons annotated by Glimmer2.
- Use this sequence as the consensus sequence.
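The consensus computation above can be sketched as a k-mer count. This is a minimal illustration under one interpretation. "Complement" is taken to mean the reverse complement, so that the candidate 5-mers read 5' to 3' like the genomic upstream regions. The function names are hypothetical, and the upstream regions are assumed to be supplied as plain strings.

```python
from collections import Counter
from typing import Iterable, Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA or RNA string (U is treated as T)."""
    return seq.upper().replace("U", "T").translate(_COMPLEMENT)[::-1]


def consensus_from_16s(rrna_16s: str, upstream_regions: Iterable[str],
                       k: int = 5, tail: int = 30) -> Optional[str]:
    """Most frequent k-mer of the (reverse) complement of the 16S rRNA 3' tail
    among the given upstream regions, or None if none occurs."""
    anti_sd = reverse_complement(rrna_16s[-tail:])
    candidates = {anti_sd[i:i + k] for i in range(len(anti_sd) - k + 1)}
    counts = Counter()
    for region in upstream_regions:
        region = region.upper()
        for i in range(len(region) - k + 1):
            word = region[i:i + k]
            if word in candidates:
                counts[word] += 1
    return counts.most_common(1)[0][0] if counts else None
```

For a 16S tail ending in GATCACCTCCTTA, the reverse complement is TAAGGAGGTGATC, which contains the default motif AGGAG.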
<Partial_Coord_File>: The coordinates that the user wants to relocate or check for an RBS site. These can be a subset of the coordinates annotated by Glimmer2. The file should have the following format:

    <Gene id>  <Start Codon Coord>  <Stop Codon Coord>
    1          1030                 1140
    2          1214                 3010

SearchforRNAs

From the documentation: Searches a FASTA-formatted contig file for RNAs and prints tbl-style output. It uses the RNAs from a set of the closest (or explicitly named) organisms as search probes, and then tries to guess the ends if the matches are not perfect. search_for_rnas includes a run of tRNAscan-SE.

Configurable options:

Option | Default | Description
Type of RNA | all | a string like "tRNA, 16S, 23S, 5S"
Organism ID | none | an ID for your organism, like "EC", "BS", ...
Domain | all | phylogenetic domain of the organism: A, B or E (Archaea, Bacteria, Eukarya)
Organism Genus | all | genus name of the organism
Organism Species | all | species name of the organism

Other command line options:

Option | Default | Description
--probes | none | an organism name, or part of one
--complete | 95 | minimum percent completeness of the probe sequence(s)

tRNAscan-SE

From the documentation:

tRNAscan-SE combines the specificity of the Cove probabilistic RNA prediction package (1) with the speed and sensitivity of tRNAscan 1.3 (2). It also implements an algorithm described by Pavesi and colleagues (3), which searches for eukaryotic pol III tRNA promoters (our implementation is referred to as EufindtRNA). tRNAscan and EufindtRNA are used as first-pass prefilters to identify "candidate" tRNA regions of the sequence. These subsequences are then passed to Cove for further analysis, and they are output only if Cove confirms the initial tRNA prediction. In this way, tRNAscan-SE attains the best of both worlds: (1) a false positive rate as low as that of Cove analysis alone, (2) the combined sensitivities of tRNAscan and EufindtRNA (detection of 98-99% of true tRNAs), and (3) a search speed 1,000 to 3,000 times faster than Cove analysis and 30 to 90 times faster than the original tRNAscan 1.3. tRNAscan-SE uses both a code-optimized version of tRNAscan 1.3, which gives a 300-fold increase in speed, and a fast C implementation of the Pavesi et al. algorithm.

Note: The current version of tRNAscan-SE (v1.21) fails on sequences containing characters other than A, T, G and C. Within the reganor pipeline, tRNAscan-SE is therefore only run if the sequences contain no such characters.
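A minimal sketch of the character check described in the note, in Python. It assumes the sequences are plain strings without line breaks and treats lower-case a, c, g and t as acceptable; the function names and the case handling are assumptions, not the reganor implementation.

```python
import re
from typing import Iterable

# Any character other than A, C, G or T (accepting lower case is an assumption).
_NON_ACGT = re.compile(r"[^ACGTacgt]")


def is_plain_acgt(seq: str) -> bool:
    """True if seq contains no characters besides A, C, G and T."""
    return _NON_ACGT.search(seq) is None


def should_run_trnascan(sequences: Iterable[str]) -> bool:
    """Include tRNAscan-SE only if every sequence passes the check."""
    return all(is_plain_acgt(s) for s in sequences)


print(should_run_trnascan(["ATGCGTTAGC", "ggcatt"]))      # True
print(should_run_trnascan(["ATGCGTTAGC", "ATGNNNTAGC"]))  # False
```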

Configurable as additional options:

-B or -P search for bacterial tRNAs (use bacterial tRNA model)
-A search for archaeal tRNAs (use archaeal tRNA model)
-O search for organellar (mitochondrial/chloroplast) tRNAs
-G use general tRNA model (cytoplasmic tRNAs from all 3 domains included)
-C search using covariance model analysis only (max sensitivity, slow)
-H show both primary and secondary structure components to covariance model bit scores
-D disable pseudogene checking

Specify Alternate Cutoffs / Data Files:

-X <score> set cutoff score (in bits) for reporting tRNAs (default=20)
-L <length> set max length of tRNA intron+variable region (default=116bp)
-I <score> manually set "intermediate" cutoff score for EufindtRNA
-z <number> use <number> nucleotides padding when passing first-pass tRNA bounds predictions to CM analysis (default=7)
-g <file> use alternate genetic codes specified in <file> for determining tRNA type
-c <file> use an alternate covariance model in <file>
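The two-pass design described in the documentation excerpt, combined with the `-z` padding and the `-X` reporting cutoff, can be sketched in Python as below. The `first_pass` and `cove_score` callables are hypothetical stand-ins for tRNAscan/EufindtRNA and Cove, and reporting scores at or above the cutoff is an assumption; only the default values (7 nt padding, 20 bits) are taken from the option list.

```python
from typing import Callable, Iterable, List, Tuple

Region = Tuple[int, int]  # 0-based, half-open (start, end) interval


def pad_region(region: Region, padding: int, seq_len: int) -> Region:
    """Widen a first-pass region by `padding` nt per side, clipped to the sequence (cf. -z)."""
    start, end = region
    return max(0, start - padding), min(seq_len, end + padding)


def two_pass_search(seq: str,
                    first_pass: Callable[[str], Iterable[Region]],
                    cove_score: Callable[[str], float],
                    padding: int = 7,      # -z default
                    cutoff: float = 20.0,  # -X default, in bits
                    ) -> List[Tuple[Region, float]]:
    """Cheap prefilter first; keep only candidates the expensive model confirms."""
    confirmed = []
    for region in first_pass(seq):
        start, end = pad_region(region, padding, len(seq))
        score = cove_score(seq[start:end])
        if score >= cutoff:  # assumed: scores equal to the cutoff are reported
            confirmed.append(((start, end), score))
    return confirmed


# Toy stand-ins: the "prefilter" flags every GG, the "model" scores by length.
def toy_first_pass(s: str) -> List[Region]:
    return [(i, i + 2) for i in range(len(s) - 1) if s[i:i + 2] == "GG"]


def toy_score(sub: str) -> float:
    return float(len(sub))


seq = "A" * 10 + "GG" + "A" * 10 + "GG"
print(two_pass_search(seq, toy_first_pass, toy_score, cutoff=16))  # [((3, 19), 16.0)]
```

In the toy run, the first candidate is widened from (10, 12) to (3, 19) and passes the lowered cutoff, while the second is clipped at the sequence end to (15, 24) and rejected.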

Misc Options:

-h print this help message
-Q do not prompt user before overwriting pre-existing result files (for batch processing)
-n <EXPR> search only sequences with names matching <EXPR> string (<EXPR> may contain * or ? wildcard chars)
-s <EXPR> start search at sequence with name matching <EXPR> string and continue to end of input sequence file(s)

Special Options (for testing & special purposes):

-T search using tRNAscan only (defaults to strict params)
-t <mode> explicitly set tRNAscan params, where <mode>=R or S (R=relaxed, S=strict tRNAscan v1.3 params)
-E search using Eukaryotic tRNA finder (EufindtRNA) only (defaults to Normal search parameters when run alone, or to Relaxed search params when run with Cove)
-e <mode> explicitly set EufindtRNA params, where <mode>=R, N, or S (relaxed, normal, or strict)
-r <file> save first-pass scan results from EufindtRNA and/or tRNAscan in <file> in tabular results format
-u <file> search with Cove only those sequences & regions delimited in <file> (tabular results file format)
-F <file> save first-pass candidate tRNAs in <file> that were then found to be false positives by Cove analysis
-M <file> save all seqs that do NOT have at least one tRNA prediction in them (aka "missed" seqs)
-v <file> save verbose tRNAscan 1.3 output to <file>
-V <vers> run an alternate version of tRNAscan where <vers> = 1.3, 1.39, 1.4 (default), or 2.0
-K Keep redundant tRNAscan 1.3 hits (don't filter out multiple predictions per tRNA identification)
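When passing additional options, it can help to assemble the tRNAscan-SE argument list programmatically. This Python sketch only uses flags from the lists above (as documented for tRNAscan-SE 1.21; other versions may differ); the function name and its parameters are hypothetical and not part of the reganor configuration.

```python
import shlex
from typing import List, Optional

# Model flags as listed above for tRNAscan-SE 1.21.
MODEL_FLAGS = {"bacterial": "-B", "archaeal": "-A", "organellar": "-O", "general": "-G"}


def trnascan_args(fasta_path: str, model: Optional[str] = "bacterial",
                  cutoff: Optional[float] = None, max_intron: Optional[int] = None,
                  batch: bool = True, program: str = "tRNAscan-SE") -> List[str]:
    """Build an argument list for tRNAscan-SE (hypothetical helper)."""
    args = [program]
    if model is not None:       # None keeps the program's default model
        args.append(MODEL_FLAGS[model])
    if cutoff is not None:      # -X: reporting cutoff in bits (default 20)
        args += ["-X", f"{cutoff:g}"]
    if max_intron is not None:  # -L: max length of intron+variable region
        args += ["-L", str(max_intron)]
    if batch:                   # -Q: overwrite result files without prompting
        args.append("-Q")
    args.append(fasta_path)
    return args


args = trnascan_args("contig.fna", cutoff=25)
print(shlex.join(args))  # tRNAscan-SE -B -X 25 -Q contig.fna
```

The resulting list could be passed to `subprocess.run(args, check=True)`; it is not executed here.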

Getorf

Besides the options that can be chosen directly, the following additional options for this program can be specified as 'additional command line options' during tool configuration with the ToolConfigurationWizard:

-maxsize integer Maximum nucleotide size of ORF to report
-find menu This is a small menu of possible output options. The first four options are to select either the protein translation or the original nucleic acid sequence of the open reading frame. There are two possible definitions of an open reading frame: it can either be a region that is free of STOP codons or a region that begins with a START codon and ends with a STOP codon. The last three options are probably only of interest to people who wish to investigate the statistical properties of the regions around potential START or STOP codons. The last option assumes that ORF lengths are calculated between two STOP codons.
-[no]methionine boolean START codons at the beginning of protein products will usually code for Methionine, despite what the codon will code for when it is internal to a protein. This qualifier sets all such START codons to code for Methionine by default.
-[no]reverse boolean Set this to be false if you do not wish to find ORFs in the reverse complement of the sequence.
-flanking integer If you have chosen one of the options of the type of sequence to find that gives the flanking sequence around a STOP or START codon, this allows you to set the number of nucleotides either side of that codon to output. If the region of flanking nucleotides crosses the start or end of the sequence, no output is given for this codon.
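The two ORF definitions used by `-find` and the `-flanking` rule can be illustrated with a short Python sketch. It is not the EMBOSS implementation: it assumes the standard genetic code with ATG as the only START codon, scans only the forward strand (`-reverse` would also scan the reverse complement), starts the first stretch in each frame at the sequence start, ignores the unterminated stretch after the last STOP, and uses a hypothetical `min_size` threshold alongside `max_size` (cf. `-maxsize`).

```python
from typing import List, Optional, Tuple

STOPS = {"TAA", "TAG", "TGA"}  # STOP codons of the standard genetic code
START = "ATG"                  # simplification: ATG is the only START codon


def find_orfs(seq: str, start_to_stop: bool = False, min_size: int = 30,
              max_size: Optional[int] = None) -> List[Tuple[int, int]]:
    """Forward-strand ORFs as 0-based, half-open (begin, end) intervals.

    The terminating STOP codon starts at `end` and is not included.
    start_to_stop=False: stretches free of STOP codons (STOP to STOP).
    start_to_stop=True:  from the first START in such a stretch to the STOP.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        region_start = frame  # first base after the previous STOP in this frame
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] not in STOPS:
                continue
            begin: Optional[int] = region_start
            if start_to_stop:
                # first in-frame START codon inside the STOP-free stretch, if any
                begin = next((j for j in range(region_start, i, 3)
                              if seq[j:j + 3] == START), None)
            if begin is not None:
                size = i - begin
                if size > 0 and size >= min_size and (max_size is None or size <= max_size):
                    orfs.append((begin, i))
            region_start = i + 3
    return sorted(orfs)


def flanking(seq: str, codon_pos: int, n: int) -> Optional[Tuple[str, str]]:
    """n nt on either side of the codon at codon_pos, or None if the window
    crosses the start or end of the sequence (cf. -flanking)."""
    if codon_pos - n < 0 or codon_pos + 3 + n > len(seq):
        return None
    return seq[codon_pos - n:codon_pos], seq[codon_pos + 3:codon_pos + 3 + n]


seq = "ATGAAACCCTAAGGGATGTTTTGA"
print(find_orfs(seq, min_size=6))                      # [(0, 9), (12, 21)]
print(find_orfs(seq, start_to_stop=True, min_size=6))  # [(0, 9), (15, 21)]
print(flanking(seq, 15, 3))                            # ('GGG', 'TTT')
print(flanking(seq, 0, 3))                             # None
```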