ProDBWiki/DeveloperDocumentation/Emowse

Emowse - Protein Identification by mass spectrometry

Overview

For the unique and rapid identification of unknown sample proteins, a 'fingerprint' signature derived from peptide mass information is sufficient. Given the molecular weights of the peptides obtained by mass spectrometry, emowse searches a protein database for entries that match the data. The scoring algorithm tolerates experimental errors of a few daltons. For each entry in the database, emowse derives the whole-sequence molecular weight and the calculated peptide molecular weights for complete digests, using a range of cleavage rules. To search for an unknown sample with emowse, the peptide molecular weights of the protein are all you need. Optionally, you can specify further parameters such as the cleavage enzyme, the whole-sequence molecular weight and the error tolerance.
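The matching idea can be illustrated with a small Python sketch. It is not emowse's actual algorithm or scoring. The function names tryptic_digest and match_masses are hypothetical, the cleavage rule is the common simplified trypsin rule (cut after K or R unless the next residue is P), and the calculated peptide masses are assumed to be supplied by the caller.

```python
def tryptic_digest(sequence):
    """Split a protein sequence after K or R, unless the next residue is P.

    Simplified trypsin rule, used here only for illustration.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # Keep the C-terminal peptide that does not end in K or R.
        peptides.append(sequence[start:])
    return peptides


def match_masses(experimental, calculated, tolerance):
    """Return (experimental, calculated) pairs differing by at most `tolerance` Da."""
    matches = []
    for exp in experimental:
        for calc in calculated:
            if abs(exp - calc) <= tolerance:
                matches.append((exp, calc))
    return matches
```

A real search additionally scores the matches per database entry. This sketch only shows why a list of peptide masses plus an error tolerance is enough input.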

How to use

To search for your protein with emowse against a database, select your Experiment and choose the link Mass Spectra Data on the left-hand side of the web page. There, select MS as the scan type and choose your scans. Then click the Peptide Mass Fingerprint link to open the search form.

In the search form you can specify the details of your search. A title for the search is optional. You can choose any taxonomy, but it does not affect a search with emowse. Selecting the search engine emowse and the database(s) to search against is mandatory. If you know the whole-sequence molecular weight of your protein, enter it in the field Protein Mass; the unit is dalton. You can also select the enzyme or reagent used for the cleavage of the protein.

If you are unsure which additional parameters to specify, leave the remaining fields unchanged. Otherwise you can specify the allowed whole-sequence weight variability (Weight Variability), the error allowed for the mass accuracy of the experimental mass determination (Error Tolerance) and the weighting given to partially cleaved peptide fragments. See the ProDBWiki/WebDocumentation/PeptideMassFingerprintWithEmowse section for more details. To repeat a search with the same parameters, choose the "search again with old search parameters" option at the bottom.

Finally, click the Search button and emowse will do the rest. When the search is finished, a new window shows the number of hits. To see the results, click on Browse Results.

Integration of emowse in ProDB

Most of my work is done in msanalysis.cgi, which can be found in prodb/share/www/cgi-bin. The most important function for the search with emowse is doMowseQuery. When this function is called, all values of the search form are collected. The databases are fetched from the Mascot server via the module MSTools in the same directory; this is done in the function getDatabases. For every combination of selected database, weight variability, partials factor and error tolerance in the given ranges, a SearchParameters object is created with the function get_search_parameter_emowse. Since emowse has different parameters than Mascot, I had to extend the SearchParameters object. I therefore extended the module SearchParameters in prodb/share/perl/PRODB/DB_Server with two new functions, partialsFactor and weightVariability. If one of these functions is called with a parameter, the value is added to the SearchParameters object; otherwise the existing value is returned. Afterwards the search is done for every selected scan, in several steps that are explained in the next section.
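The combination step and the combined getter/setter behaviour can be sketched as follows. The real code is Perl; this Python version is only an illustration, and the class layout, the helper build_search_parameters and the method names database and error_tolerance are assumptions (only partialsFactor and weightVariability are named in the text above).

```python
from itertools import product


class SearchParameters:
    """Hypothetical stand-in for the PRODB SearchParameters module."""

    def __init__(self):
        self._values = {}

    def _accessor(self, key, value=None):
        # With an argument: store the value. In both cases: return the stored value.
        if value is not None:
            self._values[key] = value
        return self._values.get(key)

    def database(self, value=None):
        return self._accessor("database", value)

    def weight_variability(self, value=None):
        return self._accessor("weight_variability", value)

    def partials_factor(self, value=None):
        return self._accessor("partials_factor", value)

    def error_tolerance(self, value=None):
        return self._accessor("error_tolerance", value)


def build_search_parameters(databases, weight_variabilities,
                            partials_factors, error_tolerances):
    """Create one SearchParameters object per parameter combination."""
    parameters = []
    for db, variability, partials, tolerance in product(
            databases, weight_variabilities, partials_factors, error_tolerances):
        sp = SearchParameters()
        sp.database(db)
        sp.weight_variability(variability)
        sp.partials_factor(partials)
        sp.error_tolerance(tolerance)
        parameters.append(sp)
    return parameters
```

Note that the number of searches grows multiplicatively: two databases, two weight variabilities, one partials factor and two error tolerances already yield eight SearchParameters objects per scan.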

The Search

Creating an Inputfile

Emowse searches a protein database for matches with the mass spectrometry data, which it reads from a file of molecular weights. This file is created in the function createmowseInput. The mass of every single peak of the given scan is written to the file, one mass per line. The file name is composed of input and the id of the scan. The file is stored temporarily in the /vol/tmp directory.
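A minimal Python sketch of this step is shown below. The function name create_mowse_input is hypothetical, and the exact way the file name is built from input and the scan id (plain concatenation here) is an assumption.

```python
import os


def create_mowse_input(scan_id, peak_masses, directory="/vol/tmp"):
    """Write one peak mass per line and return the path of the input file."""
    # Assumed naming scheme: the word "input" followed directly by the scan id.
    path = os.path.join(directory, f"input{scan_id}")
    with open(path, "w") as handle:
        for mass in peak_masses:
            handle.write(f"{mass}\n")
    return path
```

For a scan with id 42 and the peaks 1000.5 and 2000.25, this writes a two-line file named input42 into the given directory.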

Creating the emowse call

For every SearchParameters object a new DBSearch object is created. Then the emowse call has to be created. This is done in the function create_submitcall_emowse. The information from the search form is converted to the syntax of an emowse call. In addition, a file name for the result file is specified, which is composed of output, the id of the scan and the id of the SearchParameters object. The call and some further necessary information are stored in a hash, and this hash is stored in an array. After the emowse calls for all scans have been created, all calls are submitted to a cluster via the module SubmitEmowse. To read more about this module, see ProDBWiki/DeveloperDocumentation/Cluster.
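The following Python sketch shows how such a call record might be assembled. It is not the code of create_submitcall_emowse. The option names (-sequence, -infile, -outfile, -weight, -enzyme, -pcrange, -tolerance, -partials) follow the EMBOSS emowse command line as far as I know it and should be checked against `emowse -help` for the installed version. The underscore separator in the output file name and the keys of the returned dictionary are assumptions.

```python
def create_submit_call_emowse(scan_id, params_id, database, input_file,
                              options, directory="/vol/tmp"):
    """Build the emowse argument list plus the bookkeeping data for one search."""
    # Assumed naming scheme: "output" + scan id + "_" + SearchParameters id.
    output_file = f"{directory}/output{scan_id}_{params_id}"
    call = ["emowse",
            "-sequence", database,
            "-infile", input_file,
            "-outfile", output_file]
    # Optional settings are only passed when the search form supplied them.
    for flag in ("weight", "enzyme", "pcrange", "tolerance", "partials"):
        if flag in options:
            call += [f"-{flag}", str(options[flag])]
    return {
        "call": call,
        "scan_id": scan_id,
        "search_parameters_id": params_id,
        "output_file": output_file,
    }


# One record per scan and SearchParameters object is collected in a list
# before everything is handed to the cluster submission step.
jobs = [create_submit_call_emowse(7, 3, "swissprot", "/vol/tmp/input7",
                                  {"tolerance": 0.5, "enzyme": 1})]
```

Keeping the call as a list of arguments rather than a single string avoids shell quoting problems when the job is eventually executed.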

Parsing the Results

The results of the emowse search are written to an output file, which is stored temporarily in the directory /vol/tmp. To get the results, this file has to be parsed for the hits and their additional information. This is done for every scan in the function parse_mowse. The protein hits are stored in an array, together with the start and stop position, the sequence and the weight of each peptide hit. To get the sequence coverage of a hit, the function seqCoverage is called, which calculates the length of the matched sequence. This information and the unmatched weights are stored in a hash.
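One way to calculate the length of the matched sequence is sketched below in Python. This is an illustration, not the code of seqCoverage: it assumes 1-based inclusive start and stop positions, counts overlapping peptide hits only once, and additionally returns the covered fraction of the protein, none of which is confirmed by the original implementation.

```python
def seq_coverage(peptide_hits, protein_length):
    """Return (covered_residues, covered_fraction) for a list of (start, stop) hits."""
    covered = 0
    block_start = block_stop = None
    for start, stop in sorted(peptide_hits):
        if block_stop is None or start > block_stop + 1:
            # Gap before this peptide: close the previous block, open a new one.
            if block_stop is not None:
                covered += block_stop - block_start + 1
            block_start, block_stop = start, stop
        else:
            # Overlapping or adjacent peptide: extend the current block.
            block_stop = max(block_stop, stop)
    if block_stop is not None:
        covered += block_stop - block_start + 1
    return covered, covered / protein_length
```

For the peptide hits (1, 10), (5, 20) and (31, 40) on a protein of length 100, the first two hits merge into one block of 20 residues, so 30 residues are covered in total.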

Adding the hits to ProDB

The hits are added to ProDB in the function addHits. If a hit has a score higher than 0.3, a protein hit is added to ProDB unless it already exists; experience has shown that hits with a score below 0.3 are not significant. In addition, every peptide hit belonging to the protein hit is added to ProDB as well.
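The filtering logic can be sketched in Python as follows. A plain dictionary stands in for ProDB, and the field names accession, score and peptides are hypothetical. Whether peptide hits are also stored for a protein hit that already exists is not stated above, so this sketch assumes they are.

```python
SCORE_THRESHOLD = 0.3  # hits scoring at or below this value are skipped


def add_hits(hits, store):
    """Add significant hits to `store` and return the newly created accessions.

    `store` maps a protein accession to its list of peptide hits and is only
    a stand-in for the ProDB database in this sketch.
    """
    new_proteins = []
    for hit in hits:
        if hit["score"] <= SCORE_THRESHOLD:
            continue
        if hit["accession"] not in store:
            # The protein hit is created only if it does not exist yet.
            store[hit["accession"]] = []
            new_proteins.append(hit["accession"])
        store[hit["accession"]].extend(hit["peptides"])
    return new_proteins
```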

The Results

When the search is finished, a new web page is loaded with the number of results for all searches. To see the results, click on Browse Results. This calls the function results of msresults.cgi (same directory), in which I changed only a few details. Because emowse has some parameters that Mascot does not have (and vice versa), the code has to distinguish between the two search engines, so the tooltips vary depending on the search engine. To get the results, the function fetchMSResults of MSTools is called. The same distinction is needed there: depending on the search engine, different values are fetched, stored in a hash and presented to the user.

Author: Anna-Lena Kranz