ProDBWiki/DeveloperDocumentation/ReduceSearchSpace

From BRF-Software
Revision as of 07:14, 26 October 2011 by Admin (talk | contribs) (22 revisions)

Reduce Search Space

Idea

You have mass spectra data and want to search it against a database to find the possible names of the protein. On the Mass Spectra Data: MS Query site you can choose the search parameters and the search tool (mascot or emowse). If you want a better score (for example, when no significant hit is found), you can reduce the search space. This means that a "normal" search is first run against a database of your choice. The program then automatically generates a new database (name: prodb-tmp) containing the sequences of the hits from the first search, and creates a new search with the same search parameters as before; only the database changes. This second search runs against the newly generated database (prodb-tmp). After all searches are completed, you can view the results.
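As a rough illustration of the step that turns first-pass hits into a new database, the following sketch formats a list of hits as FASTA text. The hash keys id, description and sequence and the function name hits_to_fasta are assumptions made for this example; the real ProDB hit objects may look different.

```perl
use strict;
use warnings;

# Formats a list of hits as FASTA text, wrapping sequences at 60 columns.
# Each hit is a hash reference with the (assumed) keys id, description
# and sequence.
sub hits_to_fasta {
    my (@hits) = @_;
    my $fasta = '';
    for my $hit (@hits) {
        # Header line: identifier followed by the description.
        $fasta .= ">$hit->{id} $hit->{description}\n";
        # Sequence lines: insert a newline after every 60 residues.
        (my $seq = $hit->{sequence}) =~ s/(.{1,60})/$1\n/g;
        $fasta .= $seq;
    }
    return $fasta;
}

# Example with one made-up hit.
my @hits = (
    { id => 'P1', description => 'example protein', sequence => 'MKTAYIAKQR' },
);
print hits_to_fasta(@hits);
```

The resulting text is what would be written to the temporary database file before it is handed to the search tool.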

How To Use

On the "Mass Spectra Data: Search" site you can choose the scan for the search. After clicking on "Peptide Mass Fingerprint" you are forwarded to the search form (Mass Spectra Data: MS Query), where you can choose your search parameters. To reduce the search space, select reduce search space from the dropdown menu Search for mixture. You also have to choose how many hits are allowed; this has to be at least ten, because the new database is generated from these hits. Above the dropdown menu you can choose whether the reduction is always performed or only when no significant hits are found. All other search parameters can be chosen as usual. Clicking on Search submits the search. When the search has finished, you can see the results by clicking on results. The search space reduction takes place for every search parameter.
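The two form options described above (reduce always or only without significant hits, and at least ten allowed hits) amount to a small decision rule. The sketch below is only an illustration: the function name should_reduce and the mode strings 'always' and 'no_significant_hit' are invented for this example and are not taken from msanalysis.cgi.

```perl
use strict;
use warnings;

# Decides whether a second, reduced search should be run.
#   $mode             - 'always' or 'no_significant_hit' (assumed names)
#   $significant_hits - number of significant hits in the first search
#   $max_hits         - number of hits allowed for the new database
sub should_reduce {
    my ($mode, $significant_hits, $max_hits) = @_;
    # The new database is built from the hits, so at least ten are required.
    die "at least ten hits are required\n" if $max_hits < 10;
    return 1 if $mode eq 'always';
    # Otherwise reduce only when the first search found nothing significant.
    return $significant_hits == 0 ? 1 : 0;
}
```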

What I Did

First I changed the graphical user interface in msanalysis.cgi. Then I wrote a new function, create_search_for_reduce, in msanalysis.cgi, so that a new database can be created after the normal search. The function first gets the sequences, descriptions and ids of the hits from the previous search. All this information is stored in a temporary file, /vol/tmp/db.fasta. Before the temporary file is saved, the program checks whether it already exists. If it does, some other user is using this function at the moment and we have to wait: the program sleeps for 15 seconds and then tries again. Up to ten attempts are made; after that the function returns without generating a database.

When the temporary file does not exist, it is saved and the database file is sent to mascot via HTTP, where it is stored. The function on the mascot side is copyDB.cgi (for more information about this function, see below). Because mascot needs time to build the new database, create_search_for_reduce has to sleep for 20 seconds. If mascot returns the path to the new database, the request was a success and a search can be generated on that database. For this, the dbSearch object is copied and the database is changed in the new object, so that a new search can be generated. After the search, the temporary file db.fasta is removed, so that another user can use this function.
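The wait-and-retry step described above could be sketched as follows. This is not the code from msanalysis.cgi: instead of a separate existence check it uses sysopen with O_EXCL, so that checking for the file and creating it happen in one step and two users cannot both pass the check at the same moment. The function name acquire_db_file is an assumption; the values 10 and 15 in the usage line come from the description above.

```perl
use strict;
use warnings;
use Fcntl qw(O_WRONLY O_CREAT O_EXCL);

# Tries to create $path exclusively, waiting $delay seconds between attempts.
# Returns an open filehandle on success, or undef after $max_tries failures.
sub acquire_db_file {
    my ($path, $max_tries, $delay) = @_;
    for my $try (1 .. $max_tries) {
        # O_EXCL makes the call fail if the file already exists.
        if (sysopen(my $fh, $path, O_WRONLY | O_CREAT | O_EXCL)) {
            return $fh;
        }
        # Note: any failure (not only an existing file) ends up here.
        warn "cannot create $path (attempt $try of $max_tries): $!\n";
        sleep $delay if $try < $max_tries;
    }
    return;
}

# Usage (not executed here):
#   my $fh = acquire_db_file('/vol/tmp/db.fasta', 10, 15)
#       or return;    # give up, no database was generated
```

The warning written on each failed attempt corresponds to the waiting messages that show up in the error log.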

One problem with this function is that when an error occurs in create_search_for_reduce after the temporary file has already been created, the file is not deleted, and everyone who then tries to use this function has to wait. Therefore the function checks how long the file has existed. If it is older than a day, it is deleted. The error log shows when the function is waiting; if this happens repeatedly, you can delete the file directly in /vol/tmp/. But be careful that nobody else is using this function at that moment.
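A minimal sketch of such an age check is shown below. It assumes that the age is measured from the file's last modification time, since a creation time is not portably available; the function name remove_stale_db_file is hypothetical and the actual check in create_search_for_reduce may be written differently.

```perl
use strict;
use warnings;

# Removes $path if it was last modified more than $max_age_days ago.
# Returns 1 if the file was removed, 0 otherwise.
sub remove_stale_db_file {
    my ($path, $max_age_days) = @_;
    my @stat = stat($path) or return 0;          # file does not exist
    my $age_days = (time - $stat[9]) / 86400;    # $stat[9] is the mtime
    return 0 if $age_days <= $max_age_days;
    return unlink($path) ? 1 : 0;
}

# Usage (not executed here):
#   remove_stale_db_file('/vol/tmp/db.fasta', 1);
```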

copyDB.cgi

This script is located on the mascot server. It is divided into two functions: set_database and get_database. In this case we need set_database, because we send the new database to mascot so that we can use it there. copyDB.cgi is called via HTTP. Here is an example:

use LWP::UserAgent;
use HTTP::Request::Common qw(POST);
use URI::URL;

my $browser = 'Mozilla/4.77C-CCK-MCD  [en] (X11; U; SunOS 5.8 sun4u; Nav)';
# request headers; the Content_Type makes POST build a multipart file upload
my @request_data = (Connection => 'Keep-Alive',
    Accept => 'image/gif, image/jpeg, image/png, */*',
    Accept_Charset  => 'iso-8859-1,*,utf-8',
    Accept_Encoding => 'gzip',
    Accept_Language => 'de, en',
    Host => 'mascot',
    User_Agent => $browser,
    Content_Type => "multipart/form-data",
);

# creates a new User Agent
my $ua = LWP::UserAgent->new();

# create the URL object
my $url = URI::URL->new("http://mascot/cgi/copyDB.cgi");

# sends the temporary database to mascot:
# [$file] uploads the contents of the file, set => '1' selects set_database
my $resp = $ua->request(POST $url, @request_data, Content => [file => [$file], set => '1']);

The important line is the last one. $file contains the path to the temporary file with the new database. To call the function set_database, you need to set the attribute set to 1 (setting get to 1 calls the function get_database instead).

Further information about copyDB.cgi can be found here: ProDBWiki/DeveloperDocumentation/copyDB

Author: Nicole de la Chaux