MGE-PortalWiki/BLAST

Latest revision as of 07:17, 26 October 2011

Information about Sequence Databases

The Center of Biotechnology hosts a wide range of public sequence resources plus several specialized in-house databases. If you think something is missing, or if you have suggestions for improvements or special requirements, please contact mg-bielefeld@cebitec.uni-bielefeld.de.

General Sequence Databases

Nucleotide Sequence Databases:

nt: nucleotide sequence database with entries from all traditional divisions of GenBank, EMBL, and DDBJ, excluding the bulk divisions gss (genome survey sequences), sts (sequence-tagged sites), pat (patents), est (expressed sequence tags), and htgs (high-throughput genome sequences). Whole-genome shotgun (wgs) entries are also excluded. Note that nt is not non-redundant: the same sequence may occur in more than one entry.

est: EST division of GenBank, EMBL, and DDBJ

gss: GSS division of GenBank, EMBL, and DDBJ

htgs: HTG division of GenBank, EMBL, and DDBJ

env_nt: nucleotide environmental samples database. Contains Sargasso Sea environmental samples as well as mine drainage environmental samples (whole-genome shotgun sequences).

other_genomic: RefSeq chromosome records for organisms other than human

pdbnt: nucleotide sequences from PDB nucleic acid structures. These are NOT the protein-coding sequences of the corresponding pdbaa entries.
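The nt entry above is "not non-redundant", whereas nr (listed under the protein databases) is non-redundant: entries with identical sequences are merged into a single record that carries all of the source identifiers. The sketch below illustrates only that merging idea. It assumes exact sequence identity (ignoring case) as the redundancy criterion and uses made-up records; it is not a description of how NCBI actually builds nr.

```python
from collections import defaultdict


def collapse_identical(records):
    """Merge records whose sequences are identical, ignoring case.

    records: iterable of (identifier, sequence) pairs.
    Returns a dict mapping each distinct sequence (upper-cased) to the
    identifiers that share it, in input order.
    """
    merged = defaultdict(list)
    for identifier, sequence in records:
        merged[sequence.upper()].append(identifier)
    return dict(merged)


# Hypothetical records: "a" and "c" carry the same sequence.
records = [("a", "MKTAYIAK"), ("b", "MKTAYIAR"), ("c", "mktayiak")]
nonredundant = collapse_identical(records)
print(len(records), "->", len(nonredundant))  # 3 -> 2
print(nonredundant["MKTAYIAK"])               # ['a', 'c']
```

In a redundant collection such as nt, both "a" and "c" would remain separate entries, so a search can report the same sequence more than once.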

Protein Sequence Databases:

nr: non-redundant protein sequence database with entries from GenPept, SwissProt, PIR, PRF, PDB, and NCBI RefSeq

env_nr: protein environmental samples database. Contains Sargasso Sea environmental samples as well as mine drainage environmental samples.

swissprot: SwissProt sequence database (last major update)

pdbaa: protein sequences from PDB protein structures
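The general databases fall into a nucleotide group and a protein group, and that split, together with the query type, decides which BLAST program applies. The sketch below assumes the NCBI BLAST+ command-line tools and that the databases are reachable under the names used on this page; both are assumptions, since this portal may expose them differently. `-query`, `-db`, and `-out` are standard BLAST+ options; the helper `blast_command` and the tables are hypothetical. The function only builds the argument list and does not run anything.

```python
# Molecule type of each general database listed on this page.
DB_TYPE = {
    "nt": "nucleotide", "est": "nucleotide", "gss": "nucleotide",
    "htgs": "nucleotide", "env_nt": "nucleotide",
    "other_genomic": "nucleotide", "pdbnt": "nucleotide",
    "nr": "protein", "env_nr": "protein",
    "swissprot": "protein", "pdbaa": "protein",
}

# Standard BLAST program for each (query type, database type) pair.
# tblastx (translated query vs. translated database) is left out.
PROGRAM = {
    ("nucleotide", "nucleotide"): "blastn",
    ("nucleotide", "protein"): "blastx",    # translated query vs. protein db
    ("protein", "nucleotide"): "tblastn",   # protein query vs. translated db
    ("protein", "protein"): "blastp",
}


def blast_command(query_type, database, query_file, out_file):
    """Build (but do not execute) a BLAST+ argument list."""
    if query_type not in ("nucleotide", "protein"):
        raise ValueError(f"unknown query type: {query_type!r}")
    if database not in DB_TYPE:
        raise ValueError(f"unknown database: {database!r}")
    program = PROGRAM[(query_type, DB_TYPE[database])]
    return [program, "-query", query_file, "-db", database, "-out", out_file]


print(blast_command("protein", "nt", "query.fasta", "hits.txt"))
# ['tblastn', '-query', 'query.fasta', '-db', 'nt', '-out', 'hits.txt']
```

With BLAST+ installed, the resulting list could be passed to `subprocess.run`; keeping construction separate from execution makes the program choice easy to check.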

Specialized Databases

All specialized databases are subsets of the general nt, est, gss, htgs, and nr databases.

Algae-specific databases contain sequences from the following taxonomic groups:

  • Dinophyceae (dinoflagellates)
  • Chlorarachniophyceae (chlorarachniophytes), eukaryotes
  • Cryptophyta (cryptomonads), class, cryptomonads
  • Euglenida (euglenids), phylum, euglenoids
  • Glaucocystophyceae (glaucocystophytes), class, eukaryotes
  • Haptophyceae (coccolithophorids), haptophytes
  • Rhodophyta (red algae), red algae
  • Bacillariophyta (diatoms), phylum, diatoms
  • Chrysophyceae (golden algae), class, chrysophytes
  • Dictyochophyceae (silicoflagellates), class, eukaryotes
  • Eustigmatophyceae (eustigmatophytes), phylum, eukaryotes
  • Phaeophyceae (brown algae), phylum, brown algae
  • Phaeothamniophyceae, class, eukaryotes
  • Raphidophyceae (raphidophytes), class, eukaryotes
  • Xanthophyceae (yellow-green algae), phylum, xanthophytes
  • Chlorophyta (green algae), phylum, green algae
  • Mesostigmatophyceae, class, green plants
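Each specialized database is defined by a fixed list of taxonomic groups, like the one above. One way to select such a subset is sketched below. It assumes that every record carries its taxonomic lineage (for example as provided by NCBI Taxonomy) and keeps a record if any taxon in that lineage is one of the listed groups. This is an illustration only, not a description of how the CeBiTec subsets were actually built. The function `in_subset` and the records are hypothetical, and the lineages are invented and heavily simplified.

```python
# Scientific names of the algae groups listed above.
ALGAE_GROUPS = {
    "Dinophyceae", "Chlorarachniophyceae", "Cryptophyta", "Euglenida",
    "Glaucocystophyceae", "Haptophyceae", "Rhodophyta", "Bacillariophyta",
    "Chrysophyceae", "Dictyochophyceae", "Eustigmatophyceae", "Phaeophyceae",
    "Phaeothamniophyceae", "Raphidophyceae", "Xanthophyceae", "Chlorophyta",
    "Mesostigmatophyceae",
}


def in_subset(lineage, groups):
    """Return True if any taxon in the record's lineage is one of the groups."""
    return not groups.isdisjoint(lineage)


# Hypothetical records: (identifier, simplified lineage from root downwards).
records = [
    ("seq1", ["Eukaryota", "Rhodophyta", "Bangiophyceae"]),
    ("seq2", ["Eukaryota", "Metazoa", "Chordata"]),
    ("seq3", ["Eukaryota", "Stramenopiles", "Bacillariophyta"]),
]
subset = [rid for rid, lineage in records if in_subset(lineage, ALGAE_GROUPS)]
print(subset)  # ['seq1', 'seq3']
```

A group list that contains other groups, as the marine list below does with Fishes and Algae, could be expressed as a set union of the corresponding group sets.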

We offer the following algae-specific sequence databases:

  • nt_algae: algae-specific subset of GenBank, EMBL, and DDBJ (nucleotide)
  • est_algae: algae-specific subset of the EST division of GenBank, EMBL, and DDBJ (nucleotide)
  • gss_algae: algae-specific subset of the GSS division of GenBank, EMBL, and DDBJ (nucleotide)
  • htgs_algae: algae-specific subset of the HTGS division of GenBank, EMBL, and DDBJ (nucleotide)
  • nr_algae: algae-specific subset of NCBI's non-redundant protein database (protein)

Fish-specific databases contain sequences from the following taxonomic groups:

  • Hyperotreti, chordates
  • Chondrichthyes (cartilaginous fishes), class, vertebrates
  • Actinopterygii (ray-finned fishes), class, bony fishes
  • Hyperoartia, vertebrates

We offer the following fish-specific sequence databases:

  • nt_fishes: fish-specific subset of GenBank, EMBL, and DDBJ (nucleotide)
  • est_fishes: fish-specific subset of the EST division of GenBank, EMBL, and DDBJ (nucleotide)
  • gss_fishes: fish-specific subset of the GSS division of GenBank, EMBL, and DDBJ (nucleotide)
  • htgs_fishes: fish-specific subset of the HTGS division of GenBank, EMBL, and DDBJ (nucleotide)
  • nr_fishes: fish-specific subset of NCBI's non-redundant protein database (protein)

Databases of Marine Organisms

The marine organism databases contain sequences from the following taxonomic groups. Note that these databases may be changed and extended in the near future. Suggestions for further or more fine-grained taxonomic categories are welcome.

  • Annelida: True segmented worms capable of movement, with a large gut. The phylum includes the ragworms and lugworms familiar to anglers.
  • Cetacea: whales and dolphins
  • Cnidaria: the major group of invertebrates that includes the sea anemones, corals, jellyfishes, and hydroids; these animals carry stinging capsules called cnidae.
  • Crustacea (crabs and relatives): aquatic, gill-breathing arthropods
  • Echinodermata: starfishes, sea urchins, sea cucumbers, and related invertebrates. Most species are radially symmetrical and have a unique water vascular system with tube feet, which are used for movement and respiration and assist in the capture of food; spines provide protection. The Echinodermata are exclusively marine, and most species cannot tolerate immersion in low-salinity water. Remarkably, they are rarely settled on by barnacles, mussels, and other fouling organisms.
  • Fishes
  • Algae
  • Bryozoa (Ectoprocta): aquatic colonial animals that are abundant in modern marine environments and have been important components of the fossil record. In places, their skeletal remains are so abundant that the fossils are an important rock-forming material. Common names include 'sea mats', 'moss animals', and, for some forms, 'lace corals'. The majority are marine, although brackish-water and freshwater forms are moderately common.
  • Platyhelminthes (flatworms)
  • Mollusca: soft-bodied animals that have a hard external shell (mussels, winkles, snails), an internal shell (sea hares, cuttlefish), or have lost their shell in the course of evolution (nudibranchs). Molluscs have a mantle that secretes the calcium carbonate of the shell. They inhabit many different environments, and a large number live in the sea.
  • Porifera (sponges)
  • Tunicata (Urochordata): tunicates, or sea squirts, are more closely related to humans than any other invertebrate group. Their larvae show this kinship through several chordate structures, including a nerve cord and a notochord.
  • Cephalochordata (lancelets): with about twenty-five species inhabiting shallow tropical and temperate oceans, the Cephalochordata are a very small branch of the animal kingdom. Known as lancelets or amphioxus (from the Greek for "both [ends] pointed," in reference to their shape), cephalochordates are small, eel-like, unprepossessing animals that spend much of their time buried in sand. Because of their remarkable morphology, however, they have proved crucial for understanding the morphology and evolution of chordates in general, including vertebrates.

We offer the following marine organism sequence databases:

  • nt_marine: subset of GenBank, EMBL, and DDBJ restricted to marine organisms (nucleotide)
  • est_marine: subset of the EST division of GenBank, EMBL, and DDBJ restricted to marine organisms (nucleotide)
  • gss_marine: subset of the GSS division of GenBank, EMBL, and DDBJ restricted to marine organisms (nucleotide)
  • htgs_marine: subset of the HTGS division of GenBank, EMBL, and DDBJ restricted to marine organisms (nucleotide)
  • nr_marine: subset of NCBI's non-redundant protein database restricted to marine organisms (protein)
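All three families of specialized databases follow the same pattern: the name is `<general database>_<group>`, and a subset holds the same molecule type as the general database it is taken from (nt, est, gss, and htgs are nucleotide; nr is protein). The small lookup below sketches that convention so that a subset name and its molecule type can be derived together. The function `specialized_db` is hypothetical; it only encodes the names listed on this page and does not query the server.

```python
# General databases that have specialized subsets, with their molecule type.
PARENT_TYPE = {
    "nt": "nucleotide", "est": "nucleotide", "gss": "nucleotide",
    "htgs": "nucleotide", "nr": "protein",
}
GROUPS = ("algae", "fishes", "marine")


def specialized_db(parent, group):
    """Return (name, molecule type) of a taxon-specific subset database.

    Follows the <parent>_<group> naming used on this page.
    """
    if parent not in PARENT_TYPE:
        raise ValueError(f"no specialized subsets of {parent!r}")
    if group not in GROUPS:
        raise ValueError(f"unknown group {group!r}")
    return f"{parent}_{group}", PARENT_TYPE[parent]


print(specialized_db("nr", "marine"))   # ('nr_marine', 'protein')
print(specialized_db("est", "fishes"))  # ('est_fishes', 'nucleotide')
```

General databases without specialized subsets, such as env_nt or swissprot, are rejected rather than mapped to a name that does not exist.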