MGE-PortalWiki/BLAST
Latest revision as of 07:17, 26 October 2011
Information about Sequence Databases
The Center of Biotechnology hosts a wide range of public sequence resources plus several specialized in-house databases. If you think something is missing, or if you have suggestions for improvements or special requirements, do not hesitate to contact mg-bielefeld@cebitec.uni-bielefeld.de.
General Sequence Databases
Nucleotide Sequence Databases:
nt: nucleotide sequence database with entries from all traditional divisions of GenBank, EMBL, and DDBJ, excluding the bulk divisions gss (genome survey sequences), sts (sequence tagged sites), pat (patent), est (expressed sequence tags), and htgs (high throughput genome sequences). wgs (whole genome shotgun) entries are also excluded. Not non-redundant.
est: EST division of GenBank, EMBL, and DDBJ
gss: GSS division of GenBank, EMBL, and DDBJ
htgs: HTG division of GenBank, EMBL, and DDBJ
env_nt: Nucleotide environmental samples database. Contains Sargasso Sea environmental samples as well as mine drainage environmental samples (whole genome shotgun sequences).
other_genomic: RefSeq chromosome records for organisms other than human
pdbnt: nucleotide sequences from PDB nucleic acid structures. They are NOT the protein coding sequences for the corresponding pdbaa entries.
Protein Sequence Databases:
nr: non-redundant protein sequence database with entries from GenPept, SwissProt, PIR, PRF, PDB, and NCBI RefSeq
env_nr: Protein environmental samples database. Contains Sargasso Sea environmental samples as well as mine drainage environmental samples.
swissprot: SwissProt sequence database (last major update)
pdbaa: protein sequences from PDB protein structures
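As a rough guide to using the lists above: the type of the database (nucleotide or protein), together with the type of the query, determines which BLAST program applies. The following sketch is only an illustration. The database names come from this page and the program names are the standard BLAST ones, but the function `choose_program` and the labels `"nucl"` and `"prot"` are hypothetical helpers and not part of any tool offered here.

```python
# Database names as listed on this page, grouped by sequence type.
NUCLEOTIDE_DBS = {"nt", "est", "gss", "htgs", "env_nt", "other_genomic", "pdbnt"}
PROTEIN_DBS = {"nr", "env_nr", "swissprot", "pdbaa"}

# Standard pairing of (query type, database type) with a BLAST program.
# tblastx (translated query vs. translated database) is left out for brevity.
PROGRAMS = {
    ("nucl", "nucl"): "blastn",
    ("prot", "prot"): "blastp",
    ("nucl", "prot"): "blastx",   # query is translated
    ("prot", "nucl"): "tblastn",  # database is translated
}


def db_type(db_name):
    """Return 'nucl' or 'prot' for one of the general databases."""
    if db_name in NUCLEOTIDE_DBS:
        return "nucl"
    if db_name in PROTEIN_DBS:
        return "prot"
    raise ValueError(f"unknown database: {db_name}")


def choose_program(query_type, db_name):
    """Pick a BLAST program for a query type ('nucl' or 'prot') and a database."""
    if query_type not in ("nucl", "prot"):
        raise ValueError(f"unknown query type: {query_type}")
    return PROGRAMS[(query_type, db_type(db_name))]


if __name__ == "__main__":
    for query_type, db_name in [("nucl", "nt"), ("prot", "swissprot"), ("nucl", "nr")]:
        print(query_type, db_name, choose_program(query_type, db_name))
```

For example, a nucleotide query against nr would be run with blastx, while a protein query against est would need tblastn.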
Specialized Databases
All specialized databases are subsets of the general nt, est, gss, htgs, and nr databases.
Algae specific databases contain sequences from the following taxonomic groups:
- Dinophyceae (dinoflagellates)
- Chlorarachniophyceae (chlorarachniophytes), eukaryotes
- Cryptophyta (cryptomonads), class, cryptomonads
- Euglenida (euglenids), phylum, euglenoids
- Glaucocystophyceae (glaucocystophytes), class, eukaryotes
- Haptophyceae (coccolithophorids), haptophytes
- Rhodophyta (red algae), red algae
- Bacillariophyta (diatoms), phylum, diatoms
- Chrysophyceae (golden algae), class, chrysophytes
- Dictyochophyceae (silicoflagellates), class, eukaryotes
- Eustigmatophyceae (eustigmatophytes), phylum, eukaryotes
- Phaeophyceae (brown algae), phylum, brown algae
- Phaeothamniophyceae, class, eukaryotes
- Raphidophyceae (raphidophytes), class, eukaryotes
- Xanthophyceae (yellow-green algae), phylum, xanthophytes
- Chlorophyta (green algae), phylum, green algae
- Mesostigmatophyceae, class, green plants
We offer the following algae specific sequence databases:
- nt_algae: algae specific subset of GenBank, EMBL, and DDBJ (nucleotide)
- est_algae: algae specific subset of EST division of GenBank, EMBL, and DDBJ (nucleotide)
- gss_algae: algae specific subset of GSS division of GenBank, EMBL, and DDBJ (nucleotide)
- htgs_algae: algae specific subset of HTGS division of GenBank, EMBL, and DDBJ (nucleotide)
- nr_algae: algae specific subset of NCBI's non-redundant protein database
Fish specific databases contain sequences from the following taxonomic groups:
- Hyperotreti, chordates
- Chondrichthyes (cartilaginous fishes), class, vertebrates
- Actinopterygii (ray-finned fishes), class, bony fishes
- Hyperoartia, vertebrates
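This page does not say how the taxonomic subsets are built. To select a comparable set of records yourself, one option is an organism-restricted search string in the style of NCBI Entrez. The sketch below assumes the `"name"[Organism]` field syntax; the helper `organism_query` is hypothetical and only assembles the string, it does not contact any server.

```python
# Taxonomic groups taken from the fish list above.
FISH_GROUPS = ["Hyperotreti", "Chondrichthyes", "Actinopterygii", "Hyperoartia"]


def organism_query(groups):
    """Join taxon names into one boolean query string (assumed Entrez-style)."""
    if not groups:
        raise ValueError("at least one taxonomic group is required")
    return " OR ".join(f'"{name}"[Organism]' for name in groups)


if __name__ == "__main__":
    print(organism_query(FISH_GROUPS))
```

The same helper can be applied to the algae or marine group names; whether the resulting selection matches the databases hosted here exactly is not guaranteed.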
We offer the following fish specific sequence databases:
- nt_fishes: fish specific subset of GenBank, EMBL, and DDBJ (nucleotide)
- est_fishes: fish specific subset of EST division of GenBank, EMBL, and DDBJ (nucleotide)
- gss_fishes: fish specific subset of GSS division of GenBank, EMBL, and DDBJ (nucleotide)
- htgs_fishes: fish specific subset of HTGS division of GenBank, EMBL, and DDBJ (nucleotide)
- nr_fishes: fish specific subset of NCBI's non-redundant protein database
Databases Of Marine Organisms
The marine organism databases contain sequences from the following taxonomic groups. Please note that these databases are likely to be changed and extended in the near future. Suggestions for further taxonomic categories or more fine-grained categories are welcome.
- Annelida: True segmented worms capable of movement, with a large gut. The phylum includes the ragworms and lugworms familiar to anglers.
- Cetacea: Whales and dolphins
- Cnidaria: The major group of invertebrates that includes the sea anemones, corals, jellyfishes, hydroids, and other animals that bear stinging capsules ('cnidae').
- Crustacea (e.g. crabs): Aquatic, gill-breathing arthropods
- Echinodermata: Starfishes, sea urchins, sea cucumbers, and related invertebrates. Marine animals that are radially symmetrical (most species) and have a unique water vascular system with tube feet that are used for movement and respiration and that assist in the capture of food; spines provide protection. The Echinodermata are exclusively marine, and most species are intolerant of immersion in low salinity water. Remarkably, they are rarely settled on by barnacles, mussels, and other fouling organisms.
- Fishes
- Algae
- Bryozoa (Ectoprocta): Bryozoa are aquatic colonial animals that are abundant in modern marine environments and have been important components of the fossil record. In places, the skeletal remains are so abundant that the fossils become an important rock-forming material. Common names are 'sea mats', 'moss animals', or, for some forms, 'lace corals'. The majority are marine, although brackish-water and freshwater forms are moderately common.
- Platyhelminthes (flatworms)
- Mollusca: Soft-bodied animals that have a hard external shell (mussels, winkles, snails) or an internal shell (sea hares, cuttlefish), or that have lost their shell in the course of evolution (nudibranchs). Molluscs have a mantle that secretes the calcium carbonate that makes up the shell. They inhabit many different environments, and a large number live in the sea.
- Porifera (sponges)
- Tunicata (Urochordata): Tunicates, or sea squirts, are more closely related to humans than any other invertebrate group. This shows in larval tunicates, which have several chordate structures, including a nerve cord and a notochord.
- Cephalochordata (lancelets): With about twenty-five species inhabiting shallow tropical and temperate oceans, the Cephalochordata are a very small branch of the animal kingdom. Known as lancelets or as amphioxus (from the Greek for "both [ends] pointed," in reference to their shape), cephalochordates are small, eel-like, unprepossessing animals that spend much of their time buried in sand. However, because of their remarkable morphology, they have proved crucial in understanding the morphology and evolution of chordates in general, including vertebrates.
We offer the following marine organism specific sequence databases:
- nt_marine: marine organism specific subset of GenBank, EMBL, and DDBJ (nucleotide)
- est_marine: marine organism specific subset of EST division of GenBank, EMBL, and DDBJ (nucleotide)
- gss_marine: marine organism specific subset of GSS division of GenBank, EMBL, and DDBJ (nucleotide)
- htgs_marine: marine organism specific subset of HTGS division of GenBank, EMBL, and DDBJ (nucleotide)
- nr_marine: marine organism specific subset of NCBI's non-redundant protein database
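The specialized databases listed on this page follow a simple naming pattern: the name of a general database (nt, est, gss, htgs, or nr), an underscore, and a group suffix (algae, fishes, or marine). The sketch below enumerates and parses names under that pattern. The functions `specialized_names` and `parse_specialized` are hypothetical helpers written for illustration, and the pattern itself is inferred from the lists rather than stated as a rule.

```python
# General databases that have specialized subsets, and the group suffixes
# used on this page.
BASE_DBS = ["nt", "est", "gss", "htgs", "nr"]
GROUPS = ["algae", "fishes", "marine"]


def specialized_names():
    """Return all <base>_<group> names, grouped by taxonomic group."""
    return [f"{base}_{group}" for group in GROUPS for base in BASE_DBS]


def parse_specialized(name):
    """Split a specialized database name into (base, group).

    Raises ValueError for names that do not follow the pattern, such as
    general databases like 'env_nt' that merely contain an underscore.
    """
    base, sep, group = name.rpartition("_")
    if not sep or base not in BASE_DBS or group not in GROUPS:
        raise ValueError(f"not a specialized database name: {name}")
    return base, group


if __name__ == "__main__":
    for name in specialized_names():
        base, group = parse_specialized(name)
        kind = "protein" if base == "nr" else "nucleotide"
        print(f"{name}: {group} subset of {base} ({kind})")
```

Only the nr-based subsets hold protein sequences; all other subsets are nucleotide databases.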