EMMAWiki/TermsAndConcepts/ForUsers/Datasets


Terms and Concepts: Dataset

A dataset is just what the name says: a set of data. In the special case of EMMA, it is a matrix of numbers. These data might be the output of external image analysis software such as ImaGene or GenePix, or a .cel file from the Affymetrix software. Datasets of this type are called raw data, as they normally need further processing to be useful. For a spotted cDNA microarray, a raw dataset must contain at least the foreground and background intensities for each channel. Information on the exact spot position is also often required, for instance to draw a false-colour image of the microarray in which the individual spots can be clicked.

The rows of a raw dataset correspond to individual spots on the array, while the columns correspond to different types of measurements, such as foreground, background and location values. These column types are called QuantitationTypes. You can think of them as simply being the column headers of the matrix.
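The row and column layout described above can be pictured with a small Python sketch. It is only an illustration: the QuantitationType names (`CH1_FG`, `CH1_BG`, and so on), the intensity values and the helper `column` are all made up for this example and are not taken from EMMA.

```python
# Hypothetical QuantitationType names; the headers EMMA actually uses may differ.
QUANTITATION_TYPES = ["CH1_FG", "CH1_BG", "CH2_FG", "CH2_BG", "X", "Y"]

# One row per spot on the array; all values are invented for illustration.
RAW_DATASET = [
    [1200.0, 200.0, 900.0, 150.0, 10.0, 10.0],
    [450.0, 180.0, 2100.0, 160.0, 30.0, 10.0],
    [3000.0, 220.0, 3100.0, 170.0, 50.0, 10.0],
]


def column(dataset, quantitation_types, name):
    """Return the values of one QuantitationType, one value per spot."""
    index = quantitation_types.index(name)
    return [row[index] for row in dataset]


# Channel 1 foreground intensity of every spot.
ch1_foreground = column(RAW_DATASET, QUANTITATION_TYPES, "CH1_FG")
print(ch1_foreground)
```

Looking up a column by its QuantitationType name, rather than by position, mirrors the idea that QuantitationTypes act as column headers.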

Several analysis methods, for instance Normalization, take raw datasets and produce new datasets by computation. These new datasets are called Transformed Datasets. In Transformed Datasets, rows often no longer correspond to individual spots but to genes. Transformed Datasets have QuantitationTypes of their own; which ones are actually present depends on the chosen analysis method.
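As a rough illustration of how a computation can turn spot-level rows into gene-level rows, the sketch below subtracts the background from each channel, takes the log2 ratio per spot, and averages the ratios of all spots that belong to the same gene. This is not EMMA's Normalization method; it is a deliberately simple stand-in, and the gene names, intensity values and function names are assumptions made for the example.

```python
import math
from collections import defaultdict

# Hypothetical raw rows, one per spot: (gene, ch1_fg, ch1_bg, ch2_fg, ch2_bg).
RAW_SPOTS = [
    ("geneA", 1200.0, 200.0, 700.0, 200.0),
    ("geneA", 1000.0, 200.0, 600.0, 200.0),
    ("geneB", 500.0, 100.0, 1700.0, 100.0),
]


def log_ratio(ch1_fg, ch1_bg, ch2_fg, ch2_bg):
    """Background-subtract both channels and return log2(ch1 / ch2).

    No guard is included for spots whose background exceeds the foreground;
    a real pipeline would have to handle those.
    """
    return math.log2((ch1_fg - ch1_bg) / (ch2_fg - ch2_bg))


def transform(raw_spots):
    """Collapse spot rows into one value per gene (mean log2 ratio)."""
    per_gene = defaultdict(list)
    for gene, *intensities in raw_spots:
        per_gene[gene].append(log_ratio(*intensities))
    return {gene: sum(values) / len(values) for gene, values in per_gene.items()}


transformed = transform(RAW_SPOTS)
print(transformed)
```

The result has one entry per gene and a single new column (here a mean log2 ratio), which shows why the QuantitationTypes of the output depend on the analysis that produced it.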

Each Dataset has to be assigned to at least one Experiment. Raw Datasets can also belong to several Experiments. They are automatically added to an Experiment when the corresponding Arrays are assigned to it.
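The relationship between Arrays, Raw Datasets and Experiments can be sketched as follows. The class `Experiment`, its method `assign_array` and the lookup table are hypothetical and only model the rule stated above; they say nothing about how EMMA stores these objects internally.

```python
class Experiment:
    """Toy model of an Experiment that collects arrays and their raw datasets."""

    def __init__(self, name):
        self.name = name
        self.arrays = []
        self.datasets = []

    def assign_array(self, array_name, raw_datasets_by_array):
        """Assign an array; its raw datasets are added automatically."""
        self.arrays.append(array_name)
        for dataset in raw_datasets_by_array.get(array_name, []):
            if dataset not in self.datasets:
                self.datasets.append(dataset)


# Hypothetical lookup from array name to the raw datasets measured on it.
RAW_DATASETS_BY_ARRAY = {"array_1": ["raw_1"], "array_2": ["raw_2"]}

exp_a = Experiment("time course")
exp_b = Experiment("stress response")
exp_a.assign_array("array_1", RAW_DATASETS_BY_ARRAY)
exp_b.assign_array("array_1", RAW_DATASETS_BY_ARRAY)  # raw_1 now belongs to two experiments
exp_b.assign_array("array_2", RAW_DATASETS_BY_ARRAY)

print(exp_a.datasets, exp_b.datasets)
```

Here `raw_1` ends up in both experiments because the same array was assigned to each of them, which is the "several Experiments" case for Raw Datasets.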