MGE-PortalWiki/BLAST: Difference between revisions

From BRF-Software
imported>MichaelBeckstette
 
(20 intermediate revisions by 2 users not shown)
Latest revision as of 07:17, 26 October 2011

Information about Sequence Databases

The Center for Biotechnology hosts a wide range of public sequence resources plus several specialized in-house databases. If you think something is missing, or if you have suggestions for improvements or special requirements, do not hesitate to contact mg-bielefeld@cebitec.uni-bielefeld.de.

General Sequence Databases

Nucleotide Sequence Databases:

nt: nucleotide sequence database with entries from all traditional divisions of GenBank, EMBL, and DDBJ, excluding the bulk divisions gss (genome survey sequences), sts (sequence tagged sites), pat (patent), est (expressed sequence tags), and htgs (high throughput genome sequences). wgs (whole genome shotgun) entries are also excluded. Note that nt is not non-redundant: identical sequences may occur in several entries.

est: EST division of GenBank, EMBL, and DDBJ

gss: GSS division of GenBank, EMBL, and DDBJ

htgs: HTG division of GenBank, EMBL, and DDBJ

env_nt: nucleotide environmental samples database. It contains the Sargasso Sea environmental samples as well as mine drainage environmental samples (whole genome shotgun sequences).

other_genomic: RefSeq chromosome records for organisms other than human

pdbnt: nucleotide sequences from PDB nucleic acid structures. These are NOT the protein-coding sequences of the corresponding pdbaa entries.
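To make the composition of nt more concrete, the following Python sketch screens records the way the nt description above states: entries from the bulk divisions and WGS entries are left out, and duplicates are not merged. It assumes, hypothetically, that each record has already been reduced to an accession, its three-letter GenBank division code (as printed in the LOCUS line, e.g. EST, GSS, STS, PAT, HTG), and a WGS flag. The accessions are invented, and this is not how NCBI actually builds nt.

```python
from typing import Iterable, List, NamedTuple

# Bulk GenBank divisions that nt leaves out (three-letter LOCUS-line codes).
BULK_DIVISIONS = {"GSS", "STS", "PAT", "EST", "HTG"}


class Record(NamedTuple):
    accession: str  # hypothetical accession, for illustration only
    division: str   # GenBank division code, e.g. "PLN", "BCT", "EST"
    is_wgs: bool    # True for whole genome shotgun entries


def nt_candidates(records: Iterable[Record]) -> List[str]:
    """Keep traditional-division entries; drop bulk divisions and WGS.

    Identical sequences are NOT merged, mirroring the note that nt is
    not non-redundant.
    """
    return [
        r.accession
        for r in records
        if r.division.upper() not in BULK_DIVISIONS and not r.is_wgs
    ]


if __name__ == "__main__":
    demo = [
        Record("XX000001", "PLN", False),  # kept: plant division
        Record("XX000002", "EST", False),  # dropped: bulk EST division
        Record("XX000003", "BCT", True),   # dropped: WGS entry
    ]
    print(nt_candidates(demo))  # ['XX000001']
```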

Protein Sequence Databases:

nr: non-redundant protein sequence database with entries from GenPept, Swissprot, PIR, PDF, PDB, and NCBI RefSeq

env_nr: protein environmental samples database. It contains the Sargasso Sea environmental samples as well as mine drainage environmental samples.

swissprot: SwissProt sequence database (last major update)

pdbaa: protein sequences from PDB protein structures
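The difference between the redundant nt and the non-redundant nr can be illustrated with a small Python sketch: records whose sequences are identical are collapsed into one entry that keeps all of their headers. This rests on a simplifying assumption about what "non-redundant" means here (exact, case-insensitive sequence identity); the " | " header separator and the example records are hypothetical, and NCBI's actual nr build procedure is not reproduced.

```python
def make_nonredundant(records):
    """Merge (header, sequence) records with identical sequences.

    First-seen order is preserved (dicts keep insertion order in
    Python 3.7+); merged headers are joined with " | " for readability.
    """
    merged = {}  # normalized sequence -> list of headers sharing it
    for header, seq in records:
        key = seq.strip().upper()  # compare sequences case-insensitively
        merged.setdefault(key, []).append(header)
    return [(" | ".join(headers), seq) for seq, headers in merged.items()]


if __name__ == "__main__":
    demo = [
        ("srcA|protein1", "MKTAYIAK"),
        ("srcB|protein1_copy", "mktayiak"),  # same sequence, other source
        ("srcC|protein2", "MSEQNNTEM"),
    ]
    for header, seq in make_nonredundant(demo):
        print(header, seq)
```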

Specialized Databases

All specialized databases are subsets of the general nt, est, gss, htgs, and nr databases.
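Since every specialized database is a taxonomic subset of a general one, the selection can be pictured as a walk up the taxonomy: a record is kept if its taxon, or any ancestor of it, is one of the listed groups. The Python sketch below assumes, hypothetically, that each record carries a taxonomy ID and that a parent map of the taxonomy is available. The IDs are invented toy values, not real NCBI taxids, and the sketch is not the procedure actually used to build nt_algae, nt_fishes, or nt_marine.

```python
def in_groups(taxid, groups, parent):
    """Walk up the lineage of taxid; True if it or any ancestor is in groups."""
    seen = set()
    while taxid not in seen:
        if taxid in groups:
            return True
        seen.add(taxid)
        # The root maps to itself; unknown IDs also map to themselves,
        # so the walk stops once it revisits an ID.
        taxid = parent.get(taxid, taxid)
    return False


def taxon_subset(records, groups, parent):
    """Keep accessions of (accession, taxid) records under any target group."""
    return [acc for acc, taxid in records if in_groups(taxid, groups, parent)]


if __name__ == "__main__":
    # Toy taxonomy: 1 = root, 10 = "Rhodophyta", 11 = a red alga species,
    # 20 = "Chordata", 21 = a vertebrate species (all IDs hypothetical).
    parent = {1: 1, 10: 1, 11: 10, 20: 1, 21: 20}
    algae_groups = {10}
    records = [("SEQ1", 11), ("SEQ2", 21), ("SEQ3", 10)]
    print(taxon_subset(records, algae_groups, parent))  # ['SEQ1', 'SEQ3']
```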

Algae-specific databases contain sequences from the following taxonomic groups:

  • Dinophyceae (dinoflagellates)
  • Chlorarachniophyceae (chlorarachniophytes), eukaryotes
  • Cryptophyta (cryptomonads), class, cryptomonads
  • Euglenida (euglenids), phylum, euglenoids
  • Glaucocystophyceae (glaucocystophytes), class, eukaryotes
  • Haptophyceae (coccolithophorids), haptophytes
  • Rhodophyta (red algae), red algae
  • Bacillariophyta (diatoms), phylum, diatoms
  • Chrysophyceae (golden algae), class, chrysophytes
  • Dictyochophyceae (silicoflagellates), class, eukaryotes
  • Eustigmatophyceae (eustigmatophytes), phylum, eukaryotes
  • Phaeophyceae (brown algae), phylum, brown algae
  • Phaeothamniophyceae, class, eukaryotes
  • Raphidophyceae (raphidophytes), class, eukaryotes
  • Xanthophyceae (yellow-green algae), phylum, xanthophytes
  • Chlorophyta (green algae), phylum, green algae
  • Mesostigmatophyceae, class, green plants

We offer the following algae-specific sequence databases:

  • nt_algae: algae-specific subset of GenBank, EMBL, and DDBJ (nucleotide)
  • est_algae: algae-specific subset of the EST division of GenBank, EMBL, and DDBJ (nucleotide)
  • gss_algae: algae-specific subset of the GSS division of GenBank, EMBL, and DDBJ (nucleotide)
  • htgs_algae: algae-specific subset of the HTGS division of GenBank, EMBL, and DDBJ (nucleotide)
  • nr_algae: algae-specific subset of NCBI's non-redundant protein database

Fish-specific databases contain sequences from the following taxonomic groups:

  • Hyperotreti, chordates
  • Chondrichthyes (cartilaginous fishes), class, vertebrates
  • Actinopterygii (ray-finned fishes), class, bony fishes
  • Hyperoartia, vertebrates

We offer the following fish-specific sequence databases:

  • nt_fishes: fish-specific subset of GenBank, EMBL, and DDBJ (nucleotide)
  • est_fishes: fish-specific subset of the EST division of GenBank, EMBL, and DDBJ (nucleotide)
  • gss_fishes: fish-specific subset of the GSS division of GenBank, EMBL, and DDBJ (nucleotide)
  • htgs_fishes: fish-specific subset of the HTGS division of GenBank, EMBL, and DDBJ (nucleotide)
  • nr_fishes: fish-specific subset of NCBI's non-redundant protein database

Databases of Marine Organisms

The marine organism databases contain sequences from the following taxonomic groups. Please note that these databases may be changed and extended in the near future. Suggestions for further or more fine-grained taxonomic categories are welcome.

  • Annelida: True segmented worms capable of movement, with a large gut. The phylum includes the ragworms and lugworms familiar to anglers.
  • Cetacea: whales and dolphins
  • Cnidaria: the major group of invertebrates that includes sea anemones, corals, jellyfishes, and hydroids, all of which carry stinging capsules (cnidae).
  • Crustacea (e.g. crabs): aquatic, gill-breathing arthropods
  • Echinodermata: starfishes, sea urchins, sea cucumbers, and related invertebrates. Most species are radially symmetrical and have a unique water vascular system and tube feet, which are used for movement, respiration, protection (spines), and the capture of food. The Echinodermata are exclusively marine, and most species do not tolerate immersion in low-salinity water. Remarkably, they are rarely settled on by barnacles, mussels, and other fouling organisms.
  • Fishes
  • Algae
  • Bryozoa (Ectoprocta): aquatic colonial animals that are abundant in modern marine environments and have been important components of the fossil record. In places, their skeletal remains are so abundant that the fossils become an important rock-forming material. Common names include 'sea mats', 'moss animals', and, for some forms, 'lace corals'. The majority are marine, although brackish-water and freshwater forms are moderately common.
  • Platyhelminthes (flatworms)
  • Mollusca: soft-bodied animals that have a hard external shell (mussels, winkles, snails), an internal shell (sea hares, cuttlefish), or have lost their shell in the course of evolution (nudibranchs). Molluscs have a mantle that secretes the calcium carbonate of the shell. They inhabit many different environments, and a large number of species live in the sea.
  • Porifera (sponges)
  • Tunicata (Urochordata): tunicates, or sea squirts, are more closely related to humans than any other invertebrate group. This is reflected in their larvae, which have several chordate structures, including a nerve cord and a notochord.
  • Cephalochordata (lancelets): with about twenty-five species inhabiting shallow tropical and temperate oceans, the Cephalochordata are a very small branch of the animal kingdom. Known as lancelets or amphioxus (from the Greek for "both [ends] pointed", in reference to their shape), cephalochordates are small, eel-like, unprepossessing animals that spend much of their time buried in sand. Because of their remarkable morphology, however, they have proved crucial to understanding the morphology and evolution of chordates in general, including vertebrates.

We offer the following sequence databases specific to marine organisms:

  • nt_marine: marine organism subset of GenBank, EMBL, and DDBJ (nucleotide)
  • est_marine: marine organism subset of the EST division of GenBank, EMBL, and DDBJ (nucleotide)
  • gss_marine: marine organism subset of the GSS division of GenBank, EMBL, and DDBJ (nucleotide)
  • htgs_marine: marine organism subset of the HTGS division of GenBank, EMBL, and DDBJ (nucleotide)
  • nr_marine: marine organism subset of NCBI's non-redundant protein database
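Each database on this page holds either nucleotide or protein sequences, and that determines which BLAST program fits a given query. The Python sketch below encodes the molecule types listed above together with the standard BLAST program matrix (blastn, blastp, blastx, tblastn, tblastx). The function name blast_program and its translate_both switch are hypothetical helpers for illustration, not part of the portal; the portal's own interface may offer further options.

```python
# Molecule type of each database described on this page.
DB_TYPE = {
    **{name: "nucleotide" for name in (
        "nt", "est", "gss", "htgs", "env_nt", "other_genomic", "pdbnt",
        "nt_algae", "est_algae", "gss_algae", "htgs_algae",
        "nt_fishes", "est_fishes", "gss_fishes", "htgs_fishes",
        "nt_marine", "est_marine", "gss_marine", "htgs_marine",
    )},
    **{name: "protein" for name in (
        "nr", "env_nr", "swissprot", "pdbaa",
        "nr_algae", "nr_fishes", "nr_marine",
    )},
}


def blast_program(query_type, database, translate_both=False):
    """Return the BLAST program for a query type and a database name.

    Raises KeyError for databases not listed in DB_TYPE and ValueError
    for query types other than "nucleotide" or "protein".
    """
    db_type = DB_TYPE[database]
    if query_type == "nucleotide" and db_type == "nucleotide":
        # tblastx compares six-frame translations of query and database
        return "tblastx" if translate_both else "blastn"
    if query_type == "protein" and db_type == "protein":
        return "blastp"
    if query_type == "nucleotide" and db_type == "protein":
        return "blastx"   # translated nucleotide query vs protein database
    if query_type == "protein" and db_type == "nucleotide":
        return "tblastn"  # protein query vs translated nucleotide database
    raise ValueError(f"unknown query type: {query_type!r}")


if __name__ == "__main__":
    print(blast_program("nucleotide", "nt_marine"))  # blastn
    print(blast_program("protein", "est_fishes"))    # tblastn
    print(blast_program("nucleotide", "nr_algae"))   # blastx
```

Translated searches (blastx, tblastn, tblastx) are typically slower than blastn or blastp, so the direct program is the usual first choice when query and database share a molecule type.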