EMMAWiki/TermsAndConcepts/ForUsers/Datasets: Difference between revisions

From BRF-Software
imported>MichaelDondrup

Latest revision as of 07:15, 26 October 2011

Terms and Concepts: Dataset

A dataset is just what the name says: a set of data, and in the special case of EMMA a matrix of numbers. These data might be the output of external image analysis software such as ImaGene or GenePix, or a .cel file from the Affymetrix software. Datasets of this type are called raw data, as they normally need further processing to be useful. For a spotted cDNA microarray, a raw dataset must contain at least the foreground and background intensities for each channel. Information on the exact spot position is also often required, for instance to draw a false colour image of the microarray on which individual spots can be clicked.

While the rows of a raw dataset correspond to individual spots on the array, the columns correspond to different types of measurements, such as foreground, background and location values. These are called QuantitationTypes. You can think of them as simply being the column headers of the matrix.
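The row and column layout described above can be sketched in a few lines of Python. This is only an illustration and not EMMA's internal representation: the QuantitationType names (`x`, `y`, `ch1_fg`, ...), the intensity values and both helper functions are made up for the example. Background subtraction is shown only because it is a common first use of the foreground and background columns.

```python
# Hypothetical QuantitationTypes (column headers) of a two-channel raw dataset.
QUANTITATION_TYPES = ["x", "y", "ch1_fg", "ch1_bg", "ch2_fg", "ch2_bg"]

# One row per spot on the array; all numbers are invented for illustration.
raw_dataset = [
    [0, 0, 1200.0, 200.0, 900.0, 150.0],
    [0, 1, 560.0, 180.0, 2100.0, 160.0],
    [1, 0, 310.0, 190.0, 330.0, 170.0],
]


def column(dataset, quantitation_type):
    """Return all values of one QuantitationType, one value per spot."""
    idx = QUANTITATION_TYPES.index(quantitation_type)
    return [row[idx] for row in dataset]


def background_corrected(dataset, channel):
    """Subtract the background from the foreground intensity of one channel."""
    fg = column(dataset, f"{channel}_fg")
    bg = column(dataset, f"{channel}_bg")
    return [f - b for f, b in zip(fg, bg)]


print(column(raw_dataset, "ch1_fg"))
print(background_corrected(raw_dataset, "ch2"))
```

Selecting a column by its QuantitationType name rather than by position is what makes the "column header" view useful: the same code works for any raw dataset that provides the required headers.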

Several analysis methods, for instance Normalization, take raw datasets and produce new datasets by computation. These datasets are called Transformed Datasets. In Transformed Datasets, rows often no longer correspond to individual spots but to genes. Transformed Datasets also have QuantitationTypes, but different ones: which QuantitationTypes are actually present depends on the chosen analysis method.
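As a rough sketch of how a raw dataset with one row per spot can become a transformed dataset with one row per gene, the following Python example computes log ratios, centres them on their median and averages replicate spots. Median centring and averaging are stand-ins chosen for brevity and are not claimed to be the Normalization methods EMMA provides. The gene names, the intensities and the `transform` function are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean, median

# Hypothetical background-corrected spot data: (gene, channel 1, channel 2).
# geneA is spotted twice, so the raw data has more rows than there are genes.
spots = [
    ("geneA", 1000.0, 500.0),
    ("geneA", 800.0, 400.0),
    ("geneB", 300.0, 300.0),
    ("geneC", 100.0, 400.0),
]


def transform(spots):
    """Return one median-centred mean log2 ratio per gene."""
    # Per-spot log ratio of the two channels.
    log_ratios = [(gene, math.log2(ch1 / ch2)) for gene, ch1, ch2 in spots]
    # Simple illustrative normalization: shift so the median log ratio is 0.
    centre = median(m for _, m in log_ratios)
    # Collapse replicate spots so that each row stands for a gene.
    per_gene = defaultdict(list)
    for gene, m in log_ratios:
        per_gene[gene].append(m - centre)
    return {gene: mean(values) for gene, values in per_gene.items()}


transformed = transform(spots)
for gene, value in transformed.items():
    print(gene, value)
```

The result has a single new QuantitationType (a normalized log ratio) in place of the raw intensity columns, which illustrates why the available QuantitationTypes depend on the analysis method.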

Each Dataset has to be assigned to at least one Experiment. A Raw Dataset can also belong to several Experiments; it is automatically added to an Experiment when the corresponding Arrays are assigned.
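The assignment rule can be pictured with a small data model. The classes, attribute names and the `assign_array` method below are hypothetical and only mirror the rule stated above; they say nothing about how EMMA actually stores these relations.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    name: str
    # Names of the Experiments this dataset belongs to.
    experiments: set = field(default_factory=set)


@dataclass
class Array:
    name: str
    raw_datasets: list = field(default_factory=list)


@dataclass
class Experiment:
    name: str
    arrays: list = field(default_factory=list)

    def assign_array(self, array):
        """Assign an Array; its raw datasets join this Experiment automatically."""
        self.arrays.append(array)
        for dataset in array.raw_datasets:
            dataset.experiments.add(self.name)


raw = Dataset("slide42_raw")
array = Array("slide42", raw_datasets=[raw])

# Assigning the same Array to two Experiments puts its raw dataset in both.
Experiment("heat_shock").assign_array(array)
Experiment("time_course").assign_array(array)
print(sorted(raw.experiments))
```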