GenDBWiki/TermsAndConcepts/AnnotationConcepts

Details about specific GenDB Concepts

The following sections illustrate some specific concepts that are implemented within the GenDB genome annotation system. Before you start annotating a genome, you should read these sections to understand how the system works with your data and how some information is stored systematically to facilitate further analyses.

Annotation Concepts

In general, annotations are used within the GenDB framework to store information about a region. They are created either by an automatic annotation step or by a human annotator. In contrast to observations, which can be recomputed on demand, an annotation is never deleted. Instead, all annotations are stored with a timestamp in chronological order. Storing such a history of annotations allows you to reproduce how the collected information about a region has evolved over time.

Furthermore, the GenDB system distinguishes two different types of annotations:

  • Region annotations describe why, how, and when a region (e.g. a CDS) was created or modified.
  • Function annotations explain the functional role of a region and the (potential) tasks that the region is involved in within an organism.

For the annotation of coding sequences, the function annotation is used to characterize a gene: a gene name, a gene product, a description, a functional category, and other information can be assigned by creating a new annotation.

In addition to the history of all annotations, each region can refer to a single region annotation and a single function annotation as its latest_annotation. This latest_annotation_region / latest_annotation_function holds the currently valid annotation, which is presented to the user when a region is selected. This is usually, but not necessarily, the most recent annotation by date: recomputing an automatic annotation may or may not set the latest annotation. Whenever a region is exported (e.g. EMBL or GenBank export), only the latest annotation is used by default for data generation. It is also important to note that the status of a region is usually changed only by creating a new annotation, so that the history records how the status evolved.
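The relationship between the annotation history and the latest_annotation references can be illustrated with a minimal Python sketch. It is not GenDB code: the class names `Annotation` and `Region`, the `annotate` method, and the `set_latest` flag are assumptions made for illustration. Only the append-only history and the two latest_annotation references come from the description above.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class Annotation:
    """One immutable history entry (hypothetical structure, not the GenDB schema)."""
    kind: str            # "region" or "function"
    annotator: str       # a tool name or a human annotator
    description: str
    timestamp: datetime


@dataclass
class Region:
    name: str
    history: List[Annotation] = field(default_factory=list)
    latest_annotation_region: Optional[Annotation] = None
    latest_annotation_function: Optional[Annotation] = None

    def annotate(self, annotation: Annotation, set_latest: bool = True) -> None:
        """Append to the history; entries are never removed or overwritten."""
        if annotation.kind not in ("region", "function"):
            raise ValueError(f"unknown annotation kind: {annotation.kind!r}")
        self.history.append(annotation)
        # Keep the history in chronological order.
        self.history.sort(key=lambda a: a.timestamp)
        # The latest reference is set explicitly, so it does not have to
        # point at the newest entry by date.
        if set_latest:
            if annotation.kind == "region":
                self.latest_annotation_region = annotation
            else:
                self.latest_annotation_function = annotation


cds = Region("CDS_0001")
manual = Annotation("function", "annotator_a", "DNA polymerase III subunit",
                    datetime(2005, 2, 1))
recomputed = Annotation("function", "auto", "hypothetical protein",
                        datetime(2005, 3, 1))
cds.annotate(manual)
# A recomputed automatic annotation is stored but does not replace the latest one here.
cds.annotate(recomputed, set_latest=False)
```

In this sketch the manual annotation stays the currently valid function annotation even though a newer automatic entry exists in the history, which mirrors the "usually, but not necessarily, the most recent" behaviour.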

Observation Levels

Within the GenDB annotation system, all results from bioinformatics tools (e.g. Glimmer gene predictions or alignments from BLAST runs) are called observations. To allow a comparison of results computed by different tools (e.g. BLAST and Pfam), all observations can be grouped into levels ranging from 1 (high) to 5 (low) according to their quality. For example, level 1 could be assigned to all BLAST results with an e-value lower than 1E-50, while the same level is assigned to Pfam observations only if their e-value is lower than 1E-80. Since GenDB version 2.2, the ranges for assigning a level to an observation can be set individually for each tool instance (e.g. you can define different levels for a BLAST search against SwissProt and a BLAST search against EMBL).
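As a rough illustration of per-tool level assignment, the following Python sketch maps an e-value to a level using a threshold list for each tool instance. The names `LEVEL_THRESHOLDS` and `observation_level` are hypothetical, and all threshold values other than the two level 1 cutoffs from the example above (1E-50 for BLAST, 1E-80 for Pfam) are invented placeholders, not GenDB defaults.

```python
from typing import Dict, List

# Upper e-value bounds for levels 1 to 4, per tool instance.
# Only the first value of "blast_swissprot" and of "pfam" follows the text;
# everything else is a made-up placeholder for illustration.
LEVEL_THRESHOLDS: Dict[str, List[float]] = {
    "blast_swissprot": [1e-50, 1e-30, 1e-10, 1e-3],
    "blast_embl": [1e-60, 1e-40, 1e-15, 1e-3],
    "pfam": [1e-80, 1e-50, 1e-20, 1e-5],
}


def observation_level(tool: str, e_value: float) -> int:
    """Return a level from 1 (high quality) to 5 (low quality)."""
    for level, bound in enumerate(LEVEL_THRESHOLDS[tool], start=1):
        if e_value < bound:
            return level
    return 5  # anything above the last bound falls into the lowest level


# The same e-value can yield different levels depending on the tool instance.
blast_level = observation_level("blast_swissprot", 1e-60)
pfam_level = observation_level("pfam", 1e-60)
```

Keeping the thresholds in a table keyed by tool instance reflects the idea that a BLAST search against SwissProt and one against EMBL can be graded on different scales.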

Status Region vs. Status Function

The region status can be used to indicate the progress of an ongoing annotation with respect to the reliability of each region. A specific region status could be set to indicate a potential frameshift or wrong gene start. Setting a special region status could also be used to distinguish the predictions from different gene finding tools, e.g. Glimmer and Critica. The region status can be one of the following:

  • ignored: Setting this status excludes a region from all further analyses and from all exports, e.g. to an EMBL file. Instead of deleting a region, which is usually not supported by GenDB, the status should be set to ignored so that other annotators can still see the region and know immediately that someone has already checked and discarded it.
  • putative: This status is assigned to regions that have just been created, e.g. by a human annotator.
  • attention needed: This status indicates that the region annotation of this region is not reliable and needs to be revised.
  • status 1: This status can be used to assign an individual property defined specifically for each project (sometimes unused).
  • status 2: This status can be used to assign an individual property defined specifically for each project (sometimes unused).
  • status 3: This status can be used to assign an individual property defined specifically for each project (sometimes unused).
  • finished: The region and function annotations of this region have been confirmed by a human annotator. The region needs no more work, and there is probably also some experimental evidence that confirms the annotation.

The functional status can be used to indicate the status of a region during an annotation with respect to the information content of its function annotation. Whenever the function of a region is annotated, a status from the following list should be assigned:

  • putative: All regions are initially set to this status after creation. There is neither an automatic nor a manual function annotation.
  • attention needed: This status indicates that the functional annotation of this region is not reliable and needs to be revised.
  • automatically annotated: The function of this region was automatically annotated, e.g. by Metanor. At the beginning of a manual annotation, most of the regions will have this function status.
  • status 1: This status can be used to assign an individual property defined specifically for each project (sometimes unused).
  • status 2: This status can be used to assign an individual property defined specifically for each project (sometimes unused).
  • status 3: This status can be used to assign an individual property defined specifically for each project (sometimes unused).
  • annotated: The function of this region has been verified by a human annotator. This implies that the annotation of such a region is done, and therefore the region status will usually be set to finished.
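The two status lists can be modelled as separate enumerations, as in the following Python sketch. The enum and function names are assumptions for illustration and do not reflect the GenDB API; the status values themselves, and the rule that ignored regions are left out of exports, are taken from the lists above.

```python
from enum import Enum
from typing import Dict, List, Tuple


class RegionStatus(Enum):
    IGNORED = "ignored"
    PUTATIVE = "putative"
    ATTENTION_NEEDED = "attention needed"
    STATUS_1 = "status 1"
    STATUS_2 = "status 2"
    STATUS_3 = "status 3"
    FINISHED = "finished"


class FunctionStatus(Enum):
    PUTATIVE = "putative"
    ATTENTION_NEEDED = "attention needed"
    AUTOMATICALLY_ANNOTATED = "automatically annotated"
    STATUS_1 = "status 1"
    STATUS_2 = "status 2"
    STATUS_3 = "status 3"
    ANNOTATED = "annotated"


Statuses = Tuple[RegionStatus, FunctionStatus]


def exportable(regions: Dict[str, Statuses]) -> List[str]:
    """Names of regions that would be exported: everything except ignored ones."""
    return [name for name, (region_status, _) in regions.items()
            if region_status is not RegionStatus.IGNORED]


def needs_review(regions: Dict[str, Statuses]) -> List[str]:
    """Names of regions flagged as unreliable on either status."""
    return [name for name, (region_status, function_status) in regions.items()
            if region_status is RegionStatus.ATTENTION_NEEDED
            or function_status is FunctionStatus.ATTENTION_NEEDED]


# Hypothetical project state: region name -> (region status, function status).
project = {
    "CDS_0001": (RegionStatus.FINISHED, FunctionStatus.ANNOTATED),
    "CDS_0002": (RegionStatus.IGNORED, FunctionStatus.PUTATIVE),
    "CDS_0003": (RegionStatus.PUTATIVE, FunctionStatus.ATTENTION_NEEDED),
}
```

Keeping the two statuses in separate types makes it harder to confuse values that share a name, such as putative and attention needed, which exist in both lists with different meanings.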