mzData Import Specification

Introduction

mzData is an XML exchange format for mass-spectrometry data, created and maintained by the Proteomics Standards Initiative. The format is defined by an XML schema and an ontology. The ontology defines the legal values for so-called cvParam elements, which keeps the format flexible with respect to new mass spectrometers and their components.

An mzData importer for ProDB has to solve two mapping problems. The first is relatively simple: XML elements and attributes need to be mapped onto ProDB object attributes. The second is more difficult: ontology-controlled cvParam elements need to be mapped onto ProDB object attributes. An example:

ProDB models the kind of ionisation directly as classes, for example DB::Ionisation::Electrospray or DB::Ionisation::Maldi. In mzData, however, this information is stored in a list of cvParam elements in mzData/description/instrument/source, for example:

<mzData version="1.04" accessionNumber="psi-ms:12345" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
        <description>
                <admin>
                        ...
                </admin>
                <instrument>
                        <instrumentName>LCQ Deca XP</instrumentName>
                        <source>
                                <cvParam cvLabel="psi" accession="PSI:1000008" name="IonizationType" value="ESI"/>
                        </source>
                        <analyzerList count="1">
                                <analyzer>
                                        <cvParam cvLabel="psi" accession="PSI:1000010" name="AnalyzerType" value="PaulIonTrap"/>
                                        <cvParam cvLabel="psi" accession="PSI:1000011" name="Resolution" value="2000"/>
                                        <cvParam cvLabel="psi" accession="PSI:1000013" name="Accuracy" value="0.2"/>
                                </analyzer>
                        </analyzerList>
                        <detector>
                                <cvParam cvLabel="psi" accession="PSI:1000021" name="DetectorType" value="ElectronMultiplier"/>
                        </detector>
                </instrument>
                ...
        </description>
        ...
</mzData>
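
As a minimal sketch of how such cvParam lists could be read, the following uses Python's standard xml.etree.ElementTree on a trimmed copy of the example above. Python is used here only for illustration, and the helper name read_cv_params is hypothetical, not part of ProDB.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example document; the elided parts are simply left out.
SAMPLE = """<mzData version="1.04" accessionNumber="psi-ms:12345">
  <description>
    <instrument>
      <instrumentName>LCQ Deca XP</instrumentName>
      <source>
        <cvParam cvLabel="psi" accession="PSI:1000008" name="IonizationType" value="ESI"/>
      </source>
    </instrument>
  </description>
</mzData>"""


def read_cv_params(root, path):
    """Return {name: value} for the cvParam children of the element at `path`.

    `path` is relative to `root` (the mzData element). An absent element
    yields an empty dict rather than an error.
    """
    parent = root.find(path)
    if parent is None:
        return {}
    return {p.get("name"): p.get("value") for p in parent.findall("cvParam")}


root = ET.fromstring(SAMPLE)
source_params = read_cv_params(root, "description/instrument/source")
print(source_params)  # {'IonizationType': 'ESI'}
```

The result is a plain name/value dictionary, which still has to be translated into a ProDB class and its attributes.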

The importer needs access to the ontology and to some mapping information that specifies which parameter type triggers the generation of an object (e.g. name="IonizationType") and how the other parameters are mapped onto that object's attributes.
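
One possible shape for this mapping information is sketched below in Python, purely as an illustration. The two DB::Ionisation class names come from the text above; everything else (the MAPPING layout, the "MALDI" value string, the DB::Analyzer::PaulIonTrap class name, the attribute names and the build_object helper) is an assumption made for the example.

```python
# Hypothetical mapping configuration. Each key is a "trigger" parameter whose
# value selects the class to create; "attributes" maps the names of further
# cvParams onto attribute names of that object.
MAPPING = {
    "IonizationType": {
        "classes": {
            "ESI": "DB::Ionisation::Electrospray",
            "MALDI": "DB::Ionisation::Maldi",  # value string assumed
        },
        "attributes": {},
    },
    "AnalyzerType": {
        # Class and attribute names below are invented for illustration.
        "classes": {"PaulIonTrap": "DB::Analyzer::PaulIonTrap"},
        "attributes": {"Resolution": "resolution", "Accuracy": "accuracy"},
    },
}


def build_object(params, mapping):
    """Turn a {name: value} cvParam dict into (class_name, attributes).

    Returns None if no trigger parameter is present, and raises ValueError
    if a trigger is present but its value has no mapped class.
    """
    for trigger, rule in mapping.items():
        if trigger not in params:
            continue
        value = params[trigger]
        class_name = rule["classes"].get(value)
        if class_name is None:
            raise ValueError(f"no class mapped for {trigger}={value!r}")
        attrs = {
            attr: params[name]
            for name, attr in rule["attributes"].items()
            if name in params
        }
        return class_name, attrs
    return None


analyzer = build_object(
    {"AnalyzerType": "PaulIonTrap", "Resolution": "2000", "Accuracy": "0.2"},
    MAPPING,
)
print(analyzer)
```

Keeping the trigger-to-class table separate from the attribute table means new instrument types could be supported by editing the configuration alone.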

The ontology does not exist as an OWL file yet, only as a tab-delimited list at [1]. As such, it cannot define parameter dependencies (e.g. if <cvParam name="AnalyzerType" .../> exists, there must also be a <cvParam name="Resolution" .../> element). Hopefully, the first proper OWL release of the ontology will contain this kind of information. Otherwise we have to include it in our mapping configuration(s).
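
If the dependencies do end up in the mapping configuration, a check along the following lines might be enough. This is a sketch only: the REQUIRES table and the missing_dependencies helper are hypothetical, and the single rule shown is the AnalyzerType/Resolution example from the paragraph above.

```python
# Hypothetical dependency table: if the key parameter is present in an
# element, every parameter in the list must be present as well.
REQUIRES = {"AnalyzerType": ["Resolution"]}


def missing_dependencies(param_names, requires=REQUIRES):
    """Return (trigger, missing_name) pairs for every unmet dependency."""
    present = set(param_names)
    missing = []
    for trigger, needed in requires.items():
        if trigger in present:
            missing.extend((trigger, n) for n in needed if n not in present)
    return missing


# An analyzer that declares its type but not its resolution is reported.
problems = missing_dependencies(["AnalyzerType", "Accuracy"])
print(problems)  # [('AnalyzerType', 'Resolution')]
```

An importer could report these pairs as warnings or reject the file; which behaviour is appropriate is left open here.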

Further reading