GenDBWiki/WebDocumentation/DialogWindows/PatScan: Difference between revisions

Latest revision as of 07:18, 26 October 2011

PatScan

PatScan is used to identify patterns in large sequences. You can search for patterns by specifying the allowed number of insertions, deletions and substitutions, or by defining a weight matrix.
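To illustrate what "allowing substitutions" means, here is a minimal Python sketch that reports every window of a sequence differing from the pattern in at most a given number of positions. It is only an illustration: the function name find_with_substitutions is hypothetical, insertions and deletions are not handled, and this is not how PatScan itself is implemented.

```python
def find_with_substitutions(pattern, sequence, max_substitutions):
    """Return (start, matched_text, mismatches) for every window of the
    sequence that differs from the pattern in at most max_substitutions
    positions. Insertions and deletions are NOT considered in this sketch."""
    hits = []
    width = len(pattern)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Count the positions where pattern and window disagree.
        mismatches = sum(1 for a, b in zip(pattern, window) if a != b)
        if mismatches <= max_substitutions:
            hits.append((start, window, mismatches))
    return hits


if __name__ == "__main__":
    # One substitution allowed: GGTGG is accepted as a match for GGAGG.
    print(find_with_substitutions("GGAGG", "TTGGTGGAA", 1))
```

With max_substitutions set to 0 the same call returns no hits, because the sequence contains no exact copy of the pattern.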

File:GenDBWiki$$WebDocumentation$$DialogWindows$$PatScan$PatScan a.png

The interface consists of 4 categories:

Options
    Define the type of regions to be searched (DNA or Protein).
    Define the search strand.
    Define the offset, which extends the search to characters in front of or behind the actual search region.
Search definition
    Enter your search pattern here.
    Optionally, check the "use" checkbox to define the number of insertions/deletions/substitutions if you have not already defined them in your pattern.
    Above the text field are two links. "Ambiguity codes" shows the available IUPAC codes that can be incorporated into your pattern.
    The second link, "Examples", shows some common example queries as a quick start for building your own search patterns.
Filter
    GenDB tries to associate regions with the PatScan matches. Sometimes you are only interested in matches that lie in front of or behind a region, like promoters. Here you can define whether all matches should be displayed, only matches that lie within the bounds of a region, or only intergenic matches.
PatScan
    These values should normally not be touched. Change them only if performance is an issue and you know what you are doing. They are mainly command-line options for the scan_for_matches (PatScan) program. For a more detailed description, see the PatScan manual.
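The IUPAC ambiguity codes mentioned under "Search definition" each stand for a set of bases. The following Python sketch shows one way to expand such a pattern into a regular expression and find exact matches. The helper names iupac_to_regex and find_exact are hypothetical, and the sketch only illustrates the meaning of the codes; it does not reproduce the PatScan pattern syntax or its matching engine.

```python
import re

# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def iupac_to_regex(pattern):
    """Translate a DNA pattern with IUPAC codes into a regex string."""
    parts = []
    for code in pattern.upper():
        bases = IUPAC[code]  # raises KeyError for unknown characters
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return "".join(parts)


def find_exact(pattern, sequence):
    """Return the 0-based start of every (possibly overlapping) exact match."""
    # The lookahead lets overlapping occurrences be reported as well.
    regex = re.compile("(?=(" + iupac_to_regex(pattern) + "))")
    return [m.start() for m in regex.finditer(sequence.upper())]


if __name__ == "__main__":
    print(iupac_to_regex("TATAWR"))
    print(find_exact("GGAGG", "AAGGAGGTTGGAGG"))
```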

How does the search work

Basically, GenDB fetches all regions of the type you specified in the Options section and runs PatScan for each region. By default only upstream sequences are searched, but you can override this behaviour by selecting the "both strands" radio button. The result shows the regions as big red bars and the PatScan matches as small yellow bars.

Interpretation of Results

File:GenDBWiki$$WebDocumentation$$DialogWindows$$PatScan$PatScan b1.png

The result shows the name of the region to which the match belongs, the position in the contig and the distance to the start of the region. Every row has a link to the AnnotationDialog on its far right. Sometimes it is handy to annotate several matches at once without clicking every row's annotate link. For that reason there is a button called "Annotate selection", which opens the MultiAnnotator to annotate many regions at once.

In this example we specified an offset of 100 bases and filtered out all matches that lie within the bounds of a region. As a result we get all intergenic matches that start at most 100 bases in front of a region. This can be very handy for identifying promoter regions.
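The combination of offset and intergenic filter described here can be sketched in Python as follows. This is a simplified model under stated assumptions: the Region class and the function name are hypothetical, coordinates are 0-based on the forward strand only, and GenDB's actual association logic may differ.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def intergenic_upstream_matches(match_positions, regions, offset):
    """Keep matches that lie outside every region and start at most
    `offset` bases in front of a region start.
    Returns (region name, match position, distance to region start)."""
    results = []
    for pos in match_positions:
        # Drop matches that fall within the bounds of any region.
        if any(r.start <= pos < r.end for r in regions):
            continue
        for r in regions:
            distance = r.start - pos
            if 0 < distance <= offset:
                results.append((r.name, pos, distance))
    return results


if __name__ == "__main__":
    regions = [Region("geneA", 200, 500), Region("geneB", 600, 900)]
    # 250 lies inside geneA, 90 is more than 100 bases in front of geneA.
    print(intergenic_upstream_matches([150, 250, 550, 90], regions, 100))
```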

Exporter for Results

File:GenDBWiki$$WebDocumentation$$DialogWindows$$PatScan$PatScan b2.png

This feature has been available since GenDB version 2.4.

The exporter for the results of the pattern search is shown after clicking the "Show exporting fields" button. This button appears next to the "Annotate" button, but only if your status allows you to export data.

You can choose the values to be exported by ticking the associated checkboxes. Beneath them, there is a checkbox for adding a comment line to the export file, which contains the names of the checked values. You can also choose the separator: "tabulator" (recommended) or "comma".

Clicking the "Export values" button generates the export file and opens a save dialog.
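For readers who post-process such files, the layout described above (selected values, optional comment line, tab or comma separator) can be sketched in Python. The function name export_matches, the field names and the "#" prefix of the comment line are assumptions made for illustration; the exact format written by GenDB is not documented here.

```python
import csv
import io


def export_matches(rows, fields, separator="\t", comment_line=True):
    """Serialize the chosen fields of each row into delimited text.
    rows: list of dicts, fields: names of the values to export."""
    buffer = io.StringIO()
    if comment_line:
        # Assumption: the comment line starts with '#' and lists the field names.
        buffer.write("# " + separator.join(fields) + "\n")
    writer = csv.writer(buffer, delimiter=separator, lineterminator="\n")
    for row in rows:
        writer.writerow([row[name] for name in fields])
    return buffer.getvalue()


if __name__ == "__main__":
    rows = [{"region": "geneA", "position": 150, "distance": 50}]
    print(export_matches(rows, ["region", "position"]), end="")
```

A tab separator is the safer choice when exported values may themselves contain commas, which is consistent with the recommendation above.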

Q/A Troubleshooting

Q: My pattern was not found. It was expected to be an intergenic match.

A: You possibly set the offset too low. Imagine two regions separated by 100 bases of raw sequence that contains no region. If your offset is, e.g., 30 bases upstream, you lose the information about whether the pattern occurs in the remaining 70 bases. Setting your offset higher may therefore solve your problem.
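The arithmetic of this answer can be checked with a small Python sketch. The coordinates are invented for illustration (a first region ending at position 1000 and a second one starting at 1100, 0-based, forward strand), and the function names are hypothetical.

```python
def upstream_window(region_start, offset):
    """Half-open interval [lo, hi) searched in front of a region."""
    return max(0, region_start - offset), region_start


def is_searched(pattern_pos, region_start, offset):
    """True if a pattern starting at pattern_pos lies in the upstream window."""
    lo, hi = upstream_window(region_start, offset)
    return lo <= pattern_pos < hi


if __name__ == "__main__":
    # Hypothetical gap of 100 bases: region A ends at 1000, region B starts at 1100.
    region_b_start = 1100
    pattern_pos = 1040  # 60 bases in front of region B
    print(is_searched(pattern_pos, region_b_start, 30))   # offset too low
    print(is_searched(pattern_pos, region_b_start, 100))  # whole gap covered
```

With an offset of 30 only positions 1070 to 1099 are scanned, so a pattern at 1040 is missed; with an offset of 100 the whole gap is covered.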

Q: Why does the result always show my selected contig as associated region?

A: You did not select a region and chose "current" as Search Region. As a result, PatScan searches only the contig (which is also a region). To solve this problem, select "all" as Search Region and select an applicable type from the list box. A value of "Region" will search all regions except the contig.