Revision as of 16:43, 1 December 2004

MetaXP Specification

Introduction

MetaXP is a search facility for expression data. It enables the user to quickly access experiments and their results by formulating queries such as "Give me all temperature-related stress experiments containing gene XY" or "Give me a list of all up-regulated genes in experiments on Xanthomonas campestris". MetaXP combines data from EMMA, GenDB and ProDB. The data is imported from each of these databases and processed for searchability.

Search Interface

Possible search results are lists of genes and lists of experiments. Possible search parameters are (cf. fig. 1):

the organism
experiment design and description
status of the regulation of genes (e.g. up-regulated, down-regulated, same regulation level)
a range of expression levels, for data where a specific value for the expression level is stated

All of these search parameters can be set to a wildcard (i.e. they have no influence on the search; this is the default) and can be combined with the operations "and" and "but not". The search system should also be extensible, so that it can accommodate additional combinations of search parameters.
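The combination rules above can be sketched as composable predicates. This is only an illustration: the `Result` record, its field names and the helper functions are assumptions made for this example and are not part of the MetaXP object model.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


# Hypothetical flat record; the real object model will come from O2DBI.
@dataclass(frozen=True)
class Result:
    organism: str
    design: str
    regulation: str                     # "up", "down" or "same"
    expression: Optional[float] = None  # None if no value is stated


Predicate = Callable[[Result], bool]


def wildcard() -> Predicate:
    """Matches everything, i.e. has no influence on the search."""
    return lambda r: True


def organism_is(name: str) -> Predicate:
    return lambda r: r.organism == name


def design_is(design: str) -> Predicate:
    return lambda r: r.design == design


def regulation_is(status: str) -> Predicate:
    return lambda r: r.regulation == status


def expression_between(low: float, high: float) -> Predicate:
    # Records without a stated expression level never match a range.
    return lambda r: r.expression is not None and low <= r.expression <= high


def and_(a: Predicate, b: Predicate) -> Predicate:
    return lambda r: a(r) and b(r)


def but_not(a: Predicate, b: Predicate) -> Predicate:
    return lambda r: a(r) and not b(r)


def search(records: List[Result], predicate: Optional[Predicate] = None) -> List[Result]:
    """Return all records matching the predicate (wildcard by default)."""
    if predicate is None:
        predicate = wildcard()
    return [r for r in records if predicate(r)]


# Usage: up-regulated results for one organism, but not from chemical stress.
query = but_not(
    and_(organism_is("Xanthomonas campestris"), regulation_is("up")),
    design_is("chemical stress"),
)
```

Because every parameter is just a function from a record to a boolean, a new kind of parameter or combinator could be added without touching the existing ones, which is one possible way to meet the extensibility requirement.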

Implementation Details

The object model will be created with the O2DBI system. The search facility will be created as an additional layer on top of these classes. To simplify the implementation of the graphical user interface (GUI), the search layer will have an introspection mechanism that tells the GUI what can be searched for, which search parameters can be entered and how these search parameters can be combined.
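One possible shape for such an introspection mechanism is sketched below. The class name `SearchLayer`, the method `describe()` and the parameter names are hypothetical and chosen only for illustration; none of them is an O2DBI API.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


# Hypothetical description of a single search parameter.
@dataclass(frozen=True)
class ParameterInfo:
    name: str
    value_type: str                       # e.g. "text", "choice", "range"
    allowed_values: Tuple[str, ...] = ()  # only used for "choice"


class SearchLayer:
    """Illustrative search layer that can describe itself to a GUI."""

    RESULT_TYPES = ("gene", "experiment")
    OPERATORS = ("and", "but not")
    PARAMETERS = (
        ParameterInfo("organism", "text"),
        ParameterInfo("experiment_design", "text"),
        ParameterInfo("regulation", "choice", ("up", "down", "same")),
        ParameterInfo("expression_level", "range"),
    )

    def describe(self) -> Dict[str, List]:
        """Return everything a GUI needs to build a search form."""
        return {
            "result_types": list(self.RESULT_TYPES),
            "operators": list(self.OPERATORS),
            "parameters": [
                {
                    "name": p.name,
                    "value_type": p.value_type,
                    "allowed_values": list(p.allowed_values),
                }
                for p in self.PARAMETERS
            ],
        }
```

A GUI could call `SearchLayer().describe()` once and generate its input fields from the returned dictionary, so a new search parameter would only need to be registered in the search layer.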

Schedule

The main aim is to have a running prototype early next year (Jan 2005).
A coarse list of milestones for such a prototype would be:
- Project setup (i.e. requesting GPMS project, setting up database).
- Implementation of an import prototype for EMMA (which will require refinements in the object-model). This includes an evaluation and definition of required information for an import.
- Implementation of a search layer and search user interface. The search layer will need several controlled vocabularies. These will be implemented in simple text files (CSV or RDF).
This prototype will then be shown to biologists to evaluate the usability of the user interface and to find missing features.
The milestones of the main tool are then:
- Implementation of the features requested by the biology experts.
- Refinements of the EMMA import facility and implementation of a GenDB and ProDB import facility.
- Optimisation of the search layer and implementation of a database for the controlled vocabularies.

Requirements

To ensure that experiment descriptions stay comparable and searchable, we need to introduce controlled vocabularies for experiment categories (e.g. GenDB, ProDB, EMMA) and experiment design types (temperature-related stress, chemical stress, etc.).
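As a minimal sketch of how a controlled vocabulary kept in a simple CSV file could be loaded and checked: the two-column layout (`vocabulary`, `term`) and the function names are assumptions made for this example, not a defined MetaXP file format.

```python
import csv
import io
from typing import Dict, Set, TextIO

# Hypothetical CSV layout: one row per allowed term.
SAMPLE_CSV = """\
vocabulary,term
experiment_category,EMMA
experiment_category,GenDB
experiment_category,ProDB
design_type,temperature-related stress
design_type,chemical stress
"""


def load_vocabularies(handle: TextIO) -> Dict[str, Set[str]]:
    """Read all vocabularies from an open CSV file into a dict of term sets."""
    vocabularies: Dict[str, Set[str]] = {}
    for row in csv.DictReader(handle):
        name = row["vocabulary"].strip()
        vocabularies.setdefault(name, set()).add(row["term"].strip())
    return vocabularies


def is_valid(vocabularies: Dict[str, Set[str]], vocabulary: str, term: str) -> bool:
    """True if the term belongs to the named vocabulary."""
    return term in vocabularies.get(vocabulary, set())


# Usage: an import could reject descriptions that use unknown terms.
vocab = load_vocabularies(io.StringIO(SAMPLE_CSV))
```

With this layout, `is_valid(vocab, "design_type", "chemical stress")` is true, while a free-text variant such as "chem. stress" would be rejected at import time.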

Questions

Should a query such as "Give me a list of all up-regulated genes in experiments on Xanthomonas campestris" include the taxonomic range? That is, should this request return results for "Xanthomonas", "Xanthomonas bromi", "Xanthomonas campestris pv. badrii", "Xanthomonas campestris (pv. campestris)", "Xanthomonas campestris pv. Carotae"? If yes, how should this be done: simply by string comparison or by taxonomy?
How should attributes such as up/down-regulated be applied to genes? Asking the user to apply these attributes does not sound feasible. Alternatively, we could normalise the data, but to make experiments comparable with each other we would need to normalise across the complete set of experiments. This would have to be repeated with each new import of experiments, which is computationally expensive and also requires that we store all the original data in parallel to the normalised data.
Who is allowed to import data into MetaXP? Only the experiment owner, or only the project leader, or both?
It also needs to be clarified where the rights to import information into MetaXP should be defined: in MetaXP itself or in the exporting databases such as EMMA, GenDB and ProDB?
Andi pointed out that it might be useful to have a flag such as preliminary information. Preliminary information would flag data that is stored in MetaXP, but is not deemed good enough (yet) to be accessible by the search interface.
A gene is currently defined as an external reference to Region:CDS. As Andi stated (IMHO correctly), genes will not be the only things we have expression or regulation data on. Instead, this should be a substance class that can be sub-classed for more specific definitions, for example a gene (but also a protein and, later on, a metabolite). Any opinions regarding this?
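To make the first question more concrete, the two options (string comparison versus taxonomy) can be sketched as follows. The parent table is a hand-written, hypothetical excerpt built from the names quoted in the question; it is not taken from a real taxonomy database, and both function names are illustrative.

```python
from typing import Dict, Optional

# Hypothetical child -> parent table, for illustration only.
PARENT: Dict[str, str] = {
    "Xanthomonas bromi": "Xanthomonas",
    "Xanthomonas campestris": "Xanthomonas",
    "Xanthomonas campestris pv. badrii": "Xanthomonas campestris",
    "Xanthomonas campestris pv. campestris": "Xanthomonas campestris",
    "Xanthomonas campestris pv. carotae": "Xanthomonas campestris",
}


def matches_by_prefix(query: str, organism: str) -> bool:
    """String comparison: the organism name equals or extends the query."""
    return organism == query or organism.startswith(query + " ")


def matches_by_taxonomy(query: str, organism: str,
                        parent: Dict[str, str] = PARENT) -> bool:
    """Taxonomy: the query is the organism itself or one of its ancestors."""
    node: Optional[str] = organism
    while node is not None:
        if node == query:
            return True
        node = parent.get(node)  # None once the root is reached
    return False
```

In this sketch both variants include the pathovars of "Xanthomonas campestris" and exclude "Xanthomonas bromi". The string comparison depends on consistent naming, whereas the taxonomy lookup depends on having a maintained parent table, so the choice is a trade-off rather than a given.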

