GenDBWiki/TermsAndConcepts/AnnotationConcepts: Difference between revisions

imported>AlexanderGoesmann

Revision as of 11:44, 1 December 2004

Details about specific GenDB Concepts

The following sections illustrate some specific concepts that are implemented within the GenDB genome annotation system. Before you start annotating a genome you should read these sections in order to understand how the system works with your data and how some information is stored systematically to facilitate further analyses.

Annotation Concepts

In general, annotations are used within the GenDB framework to store information about a region. They are created either by an automatic annotation step or by a human annotator. In contrast to observations, which can be recomputed on demand, an annotation is never deleted. Instead, all annotations are stored with a timestamp in chronological order. This history lets you reconstruct how the collected information about a region has evolved over time.

The GenDB system distinguishes two types of annotations. Region annotations describe why, how, and when a region (e.g. a CDS) was created or modified. Function annotations explain the functional role of a region and the (potential) tasks it is involved in within an organism. For coding sequences, the function annotation is used to characterize a gene: a gene name, gene product, description, functional category, and other information can be assigned by creating a new annotation.

In addition to the history of all annotations, each region can refer to a single region annotation and a single function annotation as its latest annotation. These references, latest_annotation_region and latest_annotation_function, hold the currently valid annotation, which is presented to the user when a region is selected. This is usually, but not necessarily, the most recent annotation by date: recomputing an automatic annotation may or may not set the latest annotation. Whenever a region is exported (e.g. EMBL or GenBank export), only the latest annotation is used by default for data generation. Note also that the status of a region is usually changed only by creating a new annotation, so that the history records how the status evolved.
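The append-only history and the separate "latest" references described above can be illustrated with a minimal Python sketch. This is not GenDB's actual data model or API: the class names `Annotation` and `Region`, the `annotate` method, and its `set_latest` flag are hypothetical and chosen only to mirror the concepts in the text.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class Annotation:
    """One immutable history entry (hypothetical structure)."""
    kind: str            # "region" or "function"
    annotator: str       # automatic tool or human annotator
    description: str
    timestamp: datetime


@dataclass
class Region:
    """A region with an append-only annotation history."""
    name: str
    history: List[Annotation] = field(default_factory=list)
    latest_annotation_region: Optional[Annotation] = None
    latest_annotation_function: Optional[Annotation] = None

    def annotate(self, annotation: Annotation, set_latest: bool = True) -> None:
        """Store an annotation; nothing is ever removed from the history."""
        if annotation.kind not in ("region", "function"):
            raise ValueError(f"unknown annotation kind: {annotation.kind!r}")
        self.history.append(annotation)
        # Keep the history in chronological order.
        self.history.sort(key=lambda a: a.timestamp)
        if set_latest:
            if annotation.kind == "region":
                self.latest_annotation_region = annotation
            else:
                self.latest_annotation_function = annotation


# Usage: a recomputed automatic annotation is recorded in the history
# but does not replace the manually curated "latest" function annotation.
cds = Region("CDS_0001")
auto = Annotation("function", "auto-pipeline", "hypothetical protein",
                  datetime(2004, 11, 1))
manual = Annotation("function", "annotator", "DNA polymerase III subunit",
                    datetime(2004, 11, 15))
rerun = Annotation("function", "auto-pipeline", "hypothetical protein",
                   datetime(2004, 12, 1))
cds.annotate(auto)
cds.annotate(manual)
cds.annotate(rerun, set_latest=False)
```

In this sketch the newest entry by date is `rerun`, while `latest_annotation_function` still points at `manual`, which is the situation the text describes as "usually, but not necessarily" the most recent annotation.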

Observation Levels

Within the GenDB annotation system, all results from bioinformatics tools (e.g. Glimmer gene predictions or alignments from BLAST runs) are called observations. To allow results computed by different tools (e.g. BLAST and Pfam) to be compared, all observations can be grouped by quality into levels ranging from 1 (high) to 5 (low). For example, level 1 could be assigned to all BLAST results with an e-value lower than 1E-50, while the same level is assigned to Pfam observations only if their e-value is lower than 1E-80. Since GenDB version 2.2, the ranges used to assign a level to an observation can be set individually for each tool instance (e.g. you can assign different levels for a BLAST vs. SwissProt search and a BLAST vs. EMBL search).
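As a rough illustration of per-tool level assignment, the following Python sketch maps an e-value to a level using a threshold table. The table name `LEVEL_THRESHOLDS`, the function `observation_level`, and the tool keys are hypothetical. Only the level 1 bounds (1E-50 for BLAST, 1E-80 for Pfam) are taken from the example above; all other bounds are invented placeholders, and GenDB's real configuration format is not shown here.

```python
from typing import Dict, List

# Hypothetical per-tool-instance thresholds: upper e-value bounds for
# levels 1 to 4. Anything weaker falls into level 5. Only the first
# bound of each list comes from the text; the rest are placeholders.
LEVEL_THRESHOLDS: Dict[str, List[float]] = {
    "blast_vs_swissprot": [1e-50, 1e-30, 1e-10, 1e-3],
    "pfam": [1e-80, 1e-50, 1e-20, 1e-5],
}


def observation_level(tool: str, e_value: float) -> int:
    """Return the level (1 = high, 5 = low) for an observation's e-value."""
    for level, bound in enumerate(LEVEL_THRESHOLDS[tool], start=1):
        if e_value < bound:  # "lower than", so the bound itself is excluded
            return level
    return 5
```

With these placeholder values, an e-value of 1E-60 is level 1 for the BLAST instance but only level 2 for Pfam, which shows why levels make results from different tools comparable.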

Status Region vs. Status Function

Set regional status to: after checking the region, assign one of the following statuses to it:

  • ignored: ignore this region.
  • putative: additional status; do not use it.
  • attention needed: the status of this region is still uncertain and needs to be revised.
  • status 1: the meaning of this status can be defined independently in each project.
  • status 2: the meaning of this status can be defined independently in each project.
  • status 3: the meaning of this status can be defined independently in each project.
  • finished: this region is confirmed and does not need to be reprocessed.

Set functional status to: after checking the function, assign one of the following statuses to the region:

  • putative: initial status. There has been neither automatic nor manual annotation.
  • attention needed: the functional status of this region is still uncertain and needs to be revised.
  • automatically annotated: the function of this region has been annotated automatically, e.g. by Metanor. At the beginning of a manual annotation, most regions will have this function status.
  • status 1: this status can be used to assign an individual property defined specifically for each project (sometimes unused).
  • status 2: this status can be used to assign an individual property defined specifically for each project (sometimes unused).
  • status 3: this status can be used to assign an individual property defined specifically for each project (sometimes unused).
  • annotated: the function of this region has been verified by a human annotator. This implies that the annotation of the region is done, so the region status will usually be set to finished.
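The two status sets listed above can be modelled as separate enumerations, as in the Python sketch below. The enum names, the `PROJECT_LABELS` mapping with its example label, and the helper functions are hypothetical and not part of GenDB. The rule in `suggested_region_status` encodes only the "usually set to finished" convention from the text, as a suggestion rather than an enforced constraint.

```python
from enum import Enum
from typing import Dict, Optional


class RegionStatus(Enum):
    IGNORED = "ignored"
    PUTATIVE = "putative"
    ATTENTION_NEEDED = "attention needed"
    STATUS_1 = "status 1"
    STATUS_2 = "status 2"
    STATUS_3 = "status 3"
    FINISHED = "finished"


class FunctionStatus(Enum):
    PUTATIVE = "putative"
    ATTENTION_NEEDED = "attention needed"
    AUTOMATICALLY_ANNOTATED = "automatically annotated"
    STATUS_1 = "status 1"
    STATUS_2 = "status 2"
    STATUS_3 = "status 3"
    ANNOTATED = "annotated"


# Hypothetical project-specific meaning for one of the free statuses.
PROJECT_LABELS: Dict[FunctionStatus, str] = {
    FunctionStatus.STATUS_1: "needs experimental confirmation",
}


def display_label(status: FunctionStatus) -> str:
    """Use the project-defined label if there is one, else the default name."""
    return PROJECT_LABELS.get(status, status.value)


def suggested_region_status(status: FunctionStatus) -> Optional[RegionStatus]:
    """Suggest 'finished' once a human annotator has verified the function."""
    if status is FunctionStatus.ANNOTATED:
        return RegionStatus.FINISHED
    return None  # no suggestion; the annotator decides
```

Keeping the two enumerations separate reflects that a region carries a region status and a function status independently, even though some names (putative, attention needed, status 1 to 3) occur in both lists.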