AbstractISMB2004


Michael Dondrup, Alexander Goesmann

EMMA 2 - A MAGE-compliant system for storage and analysis of microarray data

Short abstract

EMMA is a web-based software system for the management and analysis of transcriptomic data. EMMA 2 now supports the complete MAGE-ML format for data import and export. EMMA maps gene expression data onto proteome data or pathways and vice versa, and provides extensible analysis and visualization plug-ins via the R language.

Long abstract

Apart from efficient storage, state-of-the-art algorithms, and visualizations, the interpretation of microarray data often requires mapping large datasets onto a variety of genomic data sources. Here we describe the EMMA 2 software, which is both a repository for array-related data and a tool for automated analysis using routine pipelines.

EMMA can import and export the contents of its repository via MAGE-ML. Interchanging data, either with other software or between institutions, is a crucial step during analysis or publication of results. EMMA 2 therefore supports the complete MAGE-ML standard and can be used to gather microarray data from a variety of sources. Since the complete MAGE object model is fairly complex, the standard user interface simplifies access by hiding the details of the model, while a MAGE editor provides access to every detail when needed. Measured data can be exported in a variety of formats, including binary data files such as NetCDF and HDF5. EMMA is written mainly in Perl, with portions in R and Java.

The repository is implemented using O2DBI, an object-oriented code generator that significantly simplifies the creation of complex database applications. Additionally, the repository provides role-based access control for each data object.
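Per-object role-based access control can be illustrated with a minimal sketch. The abstract does not describe EMMA's actual roles or the code O2DBI generates, so the role names, permissions, and the DataObject class below are hypothetical, and Python is used only for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical roles and permissions; not taken from EMMA or O2DBI.
ROLE_PERMISSIONS = {
    "guest": {"read"},
    "annotator": {"read", "write"},
    "maintainer": {"read", "write", "delete"},
}


@dataclass
class DataObject:
    """A stored object that carries its own user-to-role assignments."""
    name: str
    roles: dict = field(default_factory=dict)  # user name -> role on this object

    def allows(self, user: str, action: str) -> bool:
        """Return True if the user's role on this object permits the action."""
        role = self.roles.get(user)
        return action in ROLE_PERMISSIONS.get(role, set())


# Each object has its own role assignments, so access is decided per object.
experiment = DataObject(
    "heat_shock_series",
    roles={"alice": "maintainer", "bob": "guest"},
)
```

Because the assignments live on the object, the same user may hold different roles on different experiments, and a user with no assignment is denied every action.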

EMMA 2 offers the ability to describe experiments in a MIAME-compliant way. Instead of relying simply on free-text descriptions, workflows can be described using a graphical workflow editor, which allows users to create customized protocols and to translate them into MAGE.

EMMA provides an interface to the statistical package R for data analysis. All methods available through the R language can be made available to EMMA. Predefined methods exist for normalization, testing for differentially expressed genes, and clustering. These methods can be arranged in pipelines and sent to a batch-queuing system to utilize CPU clusters for parallel computation.
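The idea of arranging analysis methods in a pipeline can be sketched as a chain of functions. EMMA delegates these steps to R; the Python functions below (log_ratios, median_center, t_statistic, run_pipeline) are hypothetical stand-ins, and median centering and a one-sample t statistic are only simple examples of normalization and testing, not necessarily the methods EMMA ships.

```python
import math
import statistics


def log_ratios(spots):
    """Convert (red, green) intensity pairs into log2 ratios."""
    return [math.log2(red / green) for red, green in spots]


def median_center(values):
    """Global normalization: shift the values so that their median is zero."""
    median = statistics.median(values)
    return [v - median for v in values]


def t_statistic(replicates):
    """One-sample t statistic for the hypothesis that the mean log ratio is zero."""
    n = len(replicates)
    mean = statistics.mean(replicates)
    sd = statistics.stdev(replicates)  # sample standard deviation
    return mean / (sd / math.sqrt(n))


def run_pipeline(data, steps):
    """Apply each step to the output of the previous one."""
    for step in steps:
        data = step(data)
    return data


# Made-up intensities for three spots on one array.
spots = [(200, 100), (400, 100), (800, 100)]
normalized = run_pipeline(spots, [log_ratios, median_center])
print(normalized)  # [-1.0, 0.0, 1.0]
```

Since every step is an ordinary function, a pipeline is just an ordered list, which is also a convenient unit to hand to a batch queue as a single job.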

EMMA can visualize measured data for both quality control and analysis. If provided with the scanned images, it can compute several types of quality control images. 2D scatter plots of normalized and non-normalized data can be created. Additionally, 3D scatter plots and 3D visualizations of self-organizing maps are available via a Java application or a VRML plug-in.

EMMA 2 can communicate with other software, such as the genome annotation system GenDB, the proteomics software ProDB, or the pathway viewer GoPARC, via the BRIDGE system. This allows EMMA to map, e.g., genetic location or pathway information into several visualizations. Conversely, it provides other applications with expression data. Expression profiles can, for instance, be mapped onto genetic loci by GenDB.
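Mapping expression values onto genetic loci amounts to a join on a shared gene identifier. The abstract does not describe the BRIDGE interface, so the sketch below is a plain in-memory join in Python; the gene identifiers, coordinates, values, and the function map_expression_to_loci are all made up for illustration.

```python
# Hypothetical annotation records: gene -> (replicon, start, end).
loci = {
    "geneB": ("chromosome", 5300, 6050),
    "geneA": ("chromosome", 1200, 2100),
}

# Hypothetical normalized log ratios per gene.
expression = {"geneA": 1.8, "geneB": -0.7, "geneC": 0.2}


def map_expression_to_loci(expression, loci):
    """Join expression values with locations; genes without a known locus are skipped."""
    mapped = []
    for gene, value in expression.items():
        if gene in loci:
            replicon, start, end = loci[gene]
            mapped.append({
                "gene": gene,
                "replicon": replicon,
                "start": start,
                "end": end,
                "log_ratio": value,
            })
    # Order by start coordinate so the result can be drawn along the genome.
    return sorted(mapped, key=lambda record: record["start"])


mapped = map_expression_to_loci(expression, loci)
```

The same join run in the other direction, from loci or pathway members to expression values, corresponds to EMMA supplying expression data to the other applications.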