Latest revision as of 07:18, 26 October 2011
PatScan
PatScan is used to identify patterns in large sequences. You can search for patterns by specifying the number of allowed insertions, deletions and substitutions, or by defining a weight matrix.
File:GenDBWiki$$WebDocumentation$$DialogWindows$$PatScan$PatScan a.png
The interface consists of 4 categories:
Options | Define the type of regions to be searched (DNA or Protein). |
 | Define the search strand. |
 | Define the offset, which extends the search to characters in front of or behind the actual search region. |
Search definition | Enter your search pattern here. |
 | Optionally, check the "use" checkbox to define the allowed insertions/deletions/substitutions if you have not already defined them in your pattern. |
 | Above the text field are two links. "Ambiguity codes" shows the available IUPAC codes that can be incorporated into your pattern. |
 | The second link, "Examples", shows some common example queries as a quick start for building your own search patterns. |
Filter | GenDB tries to associate regions with the PatScan matches. Sometimes you are only interested in matches which are in front of or behind a region, such as promoters. Here you can define whether all matches should be displayed, only matches which lie within the bounds of a region, or only intergenic matches. |
PatScan | These values should normally not be touched. Change them only if performance is an issue and you know what you are doing. |
 | These are mainly command-line options for the scan_for_matches (PatScan) program. For a more detailed description, see the PatScan manual. |
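The ambiguity codes mentioned in the table can be illustrated with a short sketch. The following Python snippet is not how PatScan itself matches patterns; it only shows, under that assumption, how the standard IUPAC nucleotide codes expand into sets of bases. The function names `iupac_to_regex` and `find_matches` are hypothetical and chosen for this illustration.

```python
import re

# Standard IUPAC nucleotide ambiguity codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def iupac_to_regex(pattern):
    """Translate an IUPAC pattern into an equivalent regular expression."""
    parts = []
    for code in pattern.upper():
        bases = IUPAC[code]
        # A single base stays literal; an ambiguity code becomes a character class.
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return "".join(parts)


def find_matches(pattern, sequence):
    """Return 0-based start positions of all (possibly overlapping) matches."""
    # The lookahead lets the regex engine report overlapping occurrences.
    regex = re.compile("(?=(" + iupac_to_regex(pattern) + "))")
    return [m.start() for m in regex.finditer(sequence.upper())]


if __name__ == "__main__":
    print(iupac_to_regex("TATAWA"))
    print(find_matches("TATAWA", "GGTATAAAGGTATATACC"))
```

Here `W` stands for "A or T", so the pattern `TATAWA` matches both `TATAAA` and `TATATA`. Insertions, deletions and substitutions are not covered by this sketch.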
How does the search work
GenDB fetches all regions of the type you specified in the Options section and runs PatScan on each region. By default, only upstream sequences are searched; you can override this behaviour by selecting the "both strands" radio button. In the result, regions are shown as large red bars and PatScan matches as small yellow bars.
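The per-region search described above can be sketched roughly as follows. This is an illustration only, not GenDB's implementation: it uses exact string matching instead of PatScan, and the names `Region` and `search_regions`, the 0-based coordinates and the handling of the second strand (searching the reverse complement of the pattern) are all assumptions made for the example.

```python
from typing import NamedTuple


class Region(NamedTuple):
    name: str
    start: int  # 0-based, inclusive (assumed convention)
    end: int    # 0-based, exclusive (assumed convention)


COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_all(pattern, text):
    """Return every start position of pattern in text, overlaps included."""
    positions = []
    pos = text.find(pattern)
    while pos != -1:
        positions.append(pos)
        pos = text.find(pattern, pos + 1)
    return positions


def search_regions(contig, regions, pattern, offset=0, both_strands=False):
    """Return (region name, strand, contig position) for every match."""
    hits = []
    for region in regions:
        # Extend the search window by the offset, clipped to the contig.
        lo = max(0, region.start - offset)
        hi = min(len(contig), region.end + offset)
        window = contig[lo:hi]
        for pos in find_all(pattern, window):
            hits.append((region.name, "+", lo + pos))
        if both_strands:
            # A match on the opposite strand appears as the reverse
            # complement of the pattern on the forward strand.
            for pos in find_all(reverse_complement(pattern), window):
                hits.append((region.name, "-", lo + pos))
    return hits


if __name__ == "__main__":
    contig = "TTTGATTACAAAAATGTAATCCC"
    genes = [Region("geneA", 5, 12)]
    print(search_regions(contig, genes, "GATTAC", offset=10, both_strands=True))
```

With an offset of 0 the example region contains no match; the matches only appear once the offset widens the search window.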
Interpretation of Results
File:GenDBWiki$$WebDocumentation$$DialogWindows$$PatScan$PatScan b1.png
The result shows the name of the region to which the match belongs, the position in the contig and the distance to the start of the region. Every row has a link to the AnnotationDialog on its far right. Sometimes it is handy to annotate several matches at once without clicking every row's annotate link. For that reason there is a button called "Annotate selection", which opens the MultiAnnotator to annotate many regions at once.
In this example we specified an offset of 100 bases and filtered out all matches which lie within the bounds of a region. As a result we get all intergenic matches which start at most 100 bases in front of a region. This can be very handy for identifying promoter regions.
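The filter logic of this example can be sketched as a small classification function. This is a simplified illustration, not GenDB's code: it assumes forward-strand regions given as hypothetical `(name, start, end)` tuples with 0-based, end-exclusive coordinates, and it reports the distance as the match position minus the region start, so intergenic matches get a negative distance.

```python
def classify_match(match_pos, regions, offset):
    """Classify a match position against a list of (name, start, end) regions.

    Returns (kind, region name, distance to region start), where kind is
    "within", "intergenic" or "unassigned".
    """
    # A match inside a region belongs to that region.
    for name, start, end in regions:
        if start <= match_pos < end:
            return ("within", name, match_pos - start)
    # Otherwise look for a region starting at most `offset` bases downstream.
    for name, start, end in regions:
        if start - offset <= match_pos < start:
            return ("intergenic", name, match_pos - start)
    return ("unassigned", None, None)


def intergenic_only(match_positions, regions, offset):
    """Keep only the matches classified as intergenic."""
    results = [classify_match(pos, regions, offset) for pos in match_positions]
    return [r for r in results if r[0] == "intergenic"]


if __name__ == "__main__":
    regions = [("geneA", 200, 500), ("geneB", 700, 900)]
    print(intergenic_only([50, 150, 250, 620], regions, offset=100))
```

With an offset of 100, the match at position 150 is reported 50 bases in front of `geneA`, while the match at position 50 is too far away from any region to be associated.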
Exporter for Results
File:GenDBWiki$$WebDocumentation$$DialogWindows$$PatScan$PatScan b2.png
This feature is available since GenDB version 2.4.
The exporter for the results of the pattern search is shown after clicking the "Show exporting fields" button. This button is only visible, next to the "Annotate" button, if your status allows you to export data.
You can choose the values to be exported by ticking the associated checkboxes. Beneath these, there is a checkbox for adding a comment line to the export file, which contains the names of the checked values. You can also choose the separator: "tabulator" (recommended) or "comma".
By clicking the "Export values" button, the export-file will be generated and a save dialog will be opened.
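To give an idea of what such an export file might look like, here is a minimal sketch using Python's standard `csv` module. The exact file layout produced by GenDB is not documented here, so the field names, the `# ` prefix of the comment line and the function name `export_matches` are assumptions for illustration only.

```python
import csv
import io


def export_matches(matches, fields, delimiter="\t", comment_line=True):
    """Serialize the selected fields of each match as delimited text."""
    out = io.StringIO()
    if comment_line:
        # Assumed format: a comment line listing the exported value names.
        out.write("# " + delimiter.join(fields) + "\n")
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    for match in matches:
        writer.writerow([match[field] for field in fields])
    return out.getvalue()


if __name__ == "__main__":
    matches = [
        {"region": "geneA", "position": 150, "distance": -50},
        {"region": "geneB", "position": 620, "distance": -80},
    ]
    # Tab-separated export of two selected values, with a comment line.
    print(export_matches(matches, ["region", "position"]))
```

A tab separator avoids the quoting that becomes necessary when a comma-separated value itself contains a comma, which is one plausible reason for the recommendation above.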
Q/A Troubleshooting
Q: My pattern was not found. It was expected to be an intergenic match.
A: You may have set the offset too low. Imagine two regions separated by 100 bases of raw sequence that belongs to no region. If your offset is, for example, 30 bases upstream, you lose the information about whether the pattern occurs in the remaining 70 bases. Setting a higher offset may solve the problem.
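The arithmetic in this answer can be written down as a tiny sketch. It assumes, as in the example above, that the offset only extends the search upstream of the following region; the function name `unsearched_bases` is hypothetical.

```python
def unsearched_bases(gap_length, offset):
    """Bases of an intergenic gap that an upstream offset does not cover."""
    return max(0, gap_length - offset)


if __name__ == "__main__":
    # A 100-base gap searched with a 30-base offset leaves 70 bases unsearched.
    print(unsearched_bases(100, 30))
    # An offset at least as long as the gap covers it completely.
    print(unsearched_bases(100, 100))
```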
Q: Why does the result always show my selected contig as associated region?
A: You didn't select a region and selected "current" as Search Region. As a result, PatScan searches only the contig (which is also a region). To solve this problem, select "all" as Search Region and choose an applicable type from the list box. A value of "Region" will search all regions except the contig.