ProDBWiki/DeveloperDocumentation/reduceSearches

From BRF-Software
Latest revision as of 07:17, 26 October 2011

Reduce Searches

Idea

To get good results with a Mascot search, you typically choose many different search parameters: for example, more than one database, 0 to 5 or more missed cleavage sites, and a wide range for the peptide tolerance in small steps (50-100 ppm in 10 ppm steps). With every additional parameter value, the number of searches increases, and you quickly reach 100 or more searches for a single scan. If you want to analyse more than one scan, you may wait a very long time for your results.
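
To make the growth concrete, here is a minimal sketch in Python (not the Perl of msanalysis.cgi) that counts the searches in a full parameter grid. The concrete values are assumptions chosen to match the ranges mentioned above: two databases, 0 to 5 missed cleavage sites, and tolerances from 50 to 100 ppm in 10 ppm steps.

```python
from itertools import product

# Assumed example values, following the ranges named in the text.
databases = ["DB A", "DB B"]
missed_cleavages = list(range(0, 6))        # 0, 1, ..., 5
tolerances_ppm = list(range(50, 101, 10))   # 50, 60, ..., 100

# Every combination of database, tolerance and missed cleavages is one search.
grid = list(product(databases, tolerances_ppm, missed_cleavages))
n_full = len(grid)
print(n_full)  # 72
```

Each additional choice, such as a second set of fixed and variable modifications, multiplies this number again, so two such sets would already give 144 searches per scan.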

If you choose Reduce Searches, you can decrease the number of searches. This function determines which parameters give the best results and runs as few searches as possible. First it iterates only over the missed cleavage sites and saves the best value. Then it iterates over the peptide tolerance (with the missed cleavage sites fixed at the best value) and saves the best combination of missed cleavage sites and peptide tolerance. All other searches use these two values, and only the remaining parameters are iterated. With this tool, you can reduce the number of searches to fewer than 10.

How to use

To use this feature, select Reduce Searches from the dropdown menu Search for mixture. All other parameters can be chosen as usual. Click on Search to submit the search. After all searches have finished, click on Results to view the results. Be aware that not all searches are generated, only a few with the best parameters, so you can see results only for these.

What Did I Do

I added two functions to msanalysis.cgi: create_searchp_reduce and reduce_searches. Both are called from msanalysis.cgi after checking whether Reduce Searches was selected in the search form. The first function, create_searchp_reduce, sorts the search parameters so that the second function, reduce_searches, can use them. All search parameters are stored in the hash %sortSearchDB, which is returned and has the selected databases as keys. Each value of %sortSearchDB is a hash (%sortSearchP) with the peptide tolerances as keys; if the key in %sortSearchDB is database A, its %sortSearchP contains only search parameters with database A as attribute. Each value of %sortSearchP is in turn a hash (%sortSearchC) with the numbers of missed cleavage sites as keys. Each value of %sortSearchC is an array (@sortC) holding all search parameters with the same database (key in %sortSearchDB), the same peptide tolerance (key in %sortSearchP) and the same number of missed cleavage sites (key in %sortSearchC). The search parameters are thus sorted in the order DB -> peptide tolerance -> missed cleavage sites.

For example: you have the databases DB A and DB B, peptide tolerances of 50, 75 and 100 ppm, missed cleavage sites 1, 2 and 3, and different fixed and variable modifications. Then you get the following hash %sortSearchDB (SP = search parameter, DB = database, P = peptide tolerance, C = missed cleavage sites):

line               Key    contains      Key  contains      Key  contains   contains
1     sortSearchDB: DB A   sortSearchP:  50   sortSearchC:  1    @sortC:    SP with: DB A, P 50, C 1, and different fixed and variable modifications
2                                                           2    @sortC:    SP with: DB A, P 50, C 2, and different fixed and variable modifications
3                                                           3    @sortC:    SP with: DB A, P 50, C 3, and different fixed and variable modifications

4                                        75   sortSearchC:  1    @sortC:    SP with: DB A, P 75, C 1, and different fixed and variable modifications
5                                                           2    @sortC:    SP with: DB A, P 75, C 2, and different fixed and variable modifications
6                                                           3    @sortC:    SP with: DB A, P 75, C 3, and different fixed and variable modifications

7                                        100  sortSearchC:  1    @sortC:    SP with: DB A, P 100, C 1, and different fixed and variable modifications
8                                                           2    @sortC:    SP with: DB A, P 100, C 2, and different fixed and variable modifications
9                                                           3    @sortC:    SP with: DB A, P 100, C 3, and different fixed and variable modifications

10                  DB B   sortSearchP:  50   sortSearchC:  1    @sortC:    SP with: DB B, P 50, C 1, and different fixed and variable modifications
11                                                          2    @sortC:    SP with: DB B, P 50, C 2, and different fixed and variable modifications
12                                                          3    @sortC:    SP with: DB B, P 50, C 3, and different fixed and variable modifications

13                                       75   sortSearchC:  1    @sortC:    SP with: DB B, P 75, C 1, and different fixed and variable modifications
14                                                          2    @sortC:    SP with: DB B, P 75, C 2, and different fixed and variable modifications
15                                                          3    @sortC:    SP with: DB B, P 75, C 3, and different fixed and variable modifications

16                                       100  sortSearchC:  1    @sortC:    SP with: DB B, P 100, C 1, and different fixed and variable modifications
17                                                          2    @sortC:    SP with: DB B, P 100, C 2, and different fixed and variable modifications
18                                                          3    @sortC:    SP with: DB B, P 100, C 3, and different fixed and variable modifications
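
The nesting shown in the table can be sketched as follows. This is a Python illustration of the data structure only, not the Perl code in msanalysis.cgi; the field names `db`, `tol_ppm`, `missed` and `mods`, and the two placeholder modification sets, are assumptions made for the example.

```python
from collections import defaultdict

def create_searchp_reduce(search_params):
    """Group search parameters as database -> tolerance -> missed cleavages -> list."""
    # Three nested mappings, with a plain list (the @sortC of the text) at the bottom.
    sort_search_db = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for sp in search_params:
        sort_search_db[sp["db"]][sp["tol_ppm"]][sp["missed"]].append(sp)
    return sort_search_db

# The example from the text: 2 databases x 3 tolerances x 3 missed cleavage values,
# each with two hypothetical sets of fixed and variable modifications.
params = [
    {"db": db, "tol_ppm": tol, "missed": c, "mods": mods}
    for db in ("DB A", "DB B")
    for tol in (50, 75, 100)
    for c in (1, 2, 3)
    for mods in ("mods 1", "mods 2")
]

grouped = create_searchp_reduce(params)
line_14 = grouped["DB B"][75][2]   # corresponds to line 14 of the table
print(len(line_14))                # 2
```

Every entry of `line_14` shares database, tolerance and missed cleavages and differs only in its modifications, which is what one row of the table stands for.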

The second function, reduce_searches, receives the hash from the previous function as its parameter and is responsible for reducing the number of searches. First, the best number of missed cleavage sites is determined: only search parameters that agree in all other attributes are used, and only the number of missed cleavage sites is iterated. The scores of these searches are compared, and the value Cbest with the best score is saved (for example, the first three lines of the table above are used as search parameters: DB A, P 50, C iterated). After that, the best peptide tolerance Pbest is selected with the same procedure, except that the missed cleavage sites are fixed at Cbest and no longer iterated (for example, if the best C is 2, lines 5 and 8 are searched next; if one of their scores is higher than the score from Cbest, the peptide tolerance of that search becomes Pbest, otherwise the peptide tolerance from Cbest (line 2) is kept). Then the different databases are iterated with Cbest and Pbest (for example, with Pbest 75 we search with the parameters from line 14). Finally, all other attributes are iterated, always with the best number of missed cleavage sites (Cbest) and the best peptide tolerance (Pbest).
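
The procedure described above can be sketched roughly as follows. This is a simplified Python illustration under stated assumptions, not the Perl implementation in msanalysis.cgi: `run_search` is a hypothetical callback that runs one search and returns its score, the first entry of each group is assumed to represent that group, and `toy_score` is an invented scoring function used only to make the example runnable.

```python
def reduce_searches(grouped, run_search):
    """Greedy reduction over a database -> tolerance -> missed cleavages mapping."""
    executed = []   # search parameter sets that were actually run, in order
    done = set()    # ids of those sets, so that none is run twice

    def run(sp):
        executed.append(sp)
        done.add(id(sp))
        return run_search(sp)

    first_db = next(iter(grouped))
    by_tol = grouped[first_db]
    first_tol = next(iter(by_tol))

    # Step 1: vary only the missed cleavage sites and keep the best one (Cbest).
    c_best, best_score = None, None
    for c, sps in by_tol[first_tol].items():
        score = run(sps[0])  # assumption: the first entry represents the group
        if best_score is None or score > best_score:
            c_best, best_score = c, score

    # Step 2: with Cbest fixed, try the other tolerances and keep the best (Pbest).
    p_best = first_tol
    for tol, by_c in by_tol.items():
        if tol == first_tol or c_best not in by_c:
            continue
        score = run(by_c[c_best][0])
        if score > best_score:
            p_best, best_score = tol, score

    # Steps 3 and 4: run everything else that has Cbest and Pbest, in all databases.
    for by_tol_db in grouped.values():
        for sp in by_tol_db.get(p_best, {}).get(c_best, []):
            if id(sp) not in done:
                run(sp)

    return c_best, p_best, executed

# Example grid from the text, with one parameter set per table line.
grouped = {
    db: {tol: {c: [{"db": db, "tol_ppm": tol, "missed": c}] for c in (1, 2, 3)}
         for tol in (50, 75, 100)}
    for db in ("DB A", "DB B")
}

def toy_score(sp):
    # Invented score that peaks at 2 missed cleavages and 75 ppm.
    return -10 * abs(sp["missed"] - 2) - abs(sp["tol_ppm"] - 75) / 5

c_best, p_best, executed = reduce_searches(grouped, toy_score)
print(c_best, p_best, len(executed))  # 2 75 6
```

With this toy score, the sketch runs table lines 1, 2, 3, then 5 and 8, then 14, so 6 searches instead of the 18 in the full grid.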

Author: Nicole de la Chaux