GenDBWiki/WebDocumentation/DialogWindows/PatScan


PatScan

PatScan identifies patterns in large sequences. You can search for a pattern by specifying the allowed number of insertions, deletions and substitutions, or by defining a weight matrix.

File:GenDBWiki$$WebDocumentation$$DialogWindows$$PatScan$PatScan a.png

The interface consists of 4 categories:

Options: Define the type of regions to be searched (DNA or protein).
Define the search strand.
Define the offset, which extends the search to characters in front of or behind the actual search region.
Search definition: Enter your search pattern here.
Optionally, check the "use" checkbox to define the insertions/deletions/substitutions if you have not already defined them in your pattern.
Above the text field are two links. "Ambiguity codes" shows the available IUPAC codes which can be incorporated into your pattern.
The second link, "Examples", shows some common example queries to help you start building your own search patterns.
Filter: GenDB tries to associate regions with the PatScan matches. Sometimes you are only interested in matches which lie in front of
or behind a region, such as promoters. Here you can define whether all matches should be displayed, only matches which lie within the bounds of
a region, or only intergenic matches.
PatScan: These values should normally be left unchanged. Change them only if performance is an issue and you know what you are doing.
Most of them are command-line options of the scan_for_matches (PatScan) program. For a fuller description, see the PatScan manual.
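
To illustrate what the IUPAC ambiguity codes in a search pattern stand for, the following minimal sketch expands a DNA pattern into a regular expression and reports exact matches. It is not how PatScan itself works: it ignores insertions, deletions and substitutions, and the function names `iupac_to_regex` and `find_exact_matches` are made up for this example.

```python
import re

# Standard IUPAC nucleotide ambiguity codes and the bases they stand for.
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def iupac_to_regex(pattern: str) -> str:
    """Translate an IUPAC DNA pattern into an equivalent regular expression."""
    parts = []
    for code in pattern.upper():
        bases = IUPAC_DNA[code]  # raises KeyError for non-IUPAC characters
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return "".join(parts)


def find_exact_matches(pattern: str, sequence: str) -> list:
    """Return 0-based start positions of all exact matches, overlaps included."""
    # A zero-width lookahead lets overlapping matches be reported too.
    regex = re.compile("(?=" + iupac_to_regex(pattern) + ")")
    return [m.start() for m in regex.finditer(sequence.upper())]


if __name__ == "__main__":
    print(iupac_to_regex("TATAWR"))                           # TATA[AT][AG]
    print(find_exact_matches("TATAWR", "GGTATAAAGCTATATGC"))  # [2, 10]
```

Here W stands for A or T and R for A or G, so the pattern TATAWR matches both TATAAA and TATATG.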

How does the search work

Basically, GenDB fetches all regions of the type you specified in the Options section and runs PatScan for each region. By default only upstream sequences are searched, but you can override this behaviour by selecting the "both strands" radio button. In the result, the regions are shown as large red bars and the PatScan matches as small yellow bars.

Interpretation of Results

File:GenDBWiki$$WebDocumentation$$DialogWindows$$PatScan$PatScan b1.png

The result shows the name of the region to which the match belongs, the position in the contig and the distance to the start of the region. Every row has a link to the AnnotationDialog on its far right. Sometimes it is handy to annotate several matches at once without clicking on every row's annotate link. For that reason there is a button called "Annotate selection", which opens the MultiAnnotator to annotate many regions at once.

In this example we specified an offset of 100 bases and filtered out all matches which lie within the bounds of a region. As a result we get all intergenic matches which start at most 100 bases in front of a region. This can be very handy for identifying promoter regions.
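
The combination of offset and filter described above can be sketched as follows. This is only an illustration under simplifying assumptions, not GenDB's implementation: coordinates are 1-based and inclusive, only the forward strand is considered, a match counts as "within a region" as soon as it overlaps one, and the names `Region`, `Match` and `intergenic_upstream_matches` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    name: str
    start: int  # first base of the region (inclusive)
    stop: int   # last base of the region (inclusive)


@dataclass(frozen=True)
class Match:
    start: int
    stop: int


def overlaps(match: Match, region: Region) -> bool:
    """True if the match shares at least one base with the region."""
    return match.start <= region.stop and match.stop >= region.start


def intergenic_upstream_matches(matches, regions, offset):
    """Keep matches outside every region that start at most `offset` bases
    in front of a region; return (region name, match position, distance)."""
    results = []
    for match in matches:
        if any(overlaps(match, region) for region in regions):
            continue  # drop matches within the bounds of a region
        for region in regions:
            distance = region.start - match.start
            if 0 < distance <= offset:
                results.append((region.name, match.start, distance))
    return results


if __name__ == "__main__":
    regions = [Region("geneA", 200, 500), Region("geneB", 650, 900)]
    matches = [Match(20, 25), Match(150, 155), Match(300, 305), Match(560, 565)]
    for row in intergenic_upstream_matches(matches, regions, offset=100):
        print(row)
    # ('geneA', 150, 50)
    # ('geneB', 560, 90)
```

The match at position 20 is intergenic but lies 180 bases in front of geneA, so it falls outside the 100-base offset; the match at 300 is dropped because it lies inside geneA.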

Exporter for Results

File:GenDBWiki$$WebDocumentation$$DialogWindows$$PatScan$PatScan b2.png

This feature is available since GenDB version 2.4.

The exporter for the results of the pattern search is shown after clicking the "Show exporting fields" button. This button appears next to the "Annotate" button only if your status allows you to export data.

You can choose the values to be exported by ticking the associated checkboxes. Beneath these there is a checkbox for adding a comment line to the export file, which contains the names of the checked values. You can also choose the separator: "tabulator" (recommended) or "comma".

Clicking the "Export values" button generates the export file and opens a save dialog.
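
As a rough picture of what such an export file may look like, the sketch below writes selected values as separated text with an optional comment line naming the columns. The exact layout GenDB produces is not documented here, so the "# " prefix of the comment line, the field names and the function `export_matches` are assumptions made for illustration.

```python
import csv
import io


def export_matches(rows, fields, separator="\t", add_comment_line=True):
    """Return the selected fields of each row as separated text.

    rows      -- list of dicts, one per match
    fields    -- names of the values to export, in column order
    separator -- "\t" (tabulator) or "," (comma)
    """
    buffer = io.StringIO()
    if add_comment_line:
        # Assumed format: a '#'-prefixed line listing the exported values.
        buffer.write("# " + separator.join(fields) + "\n")
    writer = csv.writer(buffer, delimiter=separator, lineterminator="\n")
    for row in rows:
        writer.writerow([row[field] for field in fields])
    return buffer.getvalue()


if __name__ == "__main__":
    rows = [
        {"region": "geneA", "position": 150, "distance": 50},
        {"region": "geneB", "position": 560, "distance": 90},
    ]
    print(export_matches(rows, ["region", "distance"]), end="")
```

A tabulator is the safer separator because values such as region names may themselves contain commas.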

Q/A Troubleshooting

Q: My pattern was not found. It was expected to be an intergenic match.

A: Your offset is possibly too low. Imagine two regions separated by 100 bases of raw sequence which contains no region. If your offset is, for example, 30 bases upstream, the remaining 70 bases are not searched, so a pattern occurring there will not be found. Setting a higher offset may solve your problem.
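
The arithmetic behind this answer can be written down in a few lines. The sketch assumes 1-based inclusive coordinates and an offset that only extends the search upstream of a region; `upstream_window` and `uncovered_bases` are hypothetical helper names.

```python
def upstream_window(region_start: int, offset: int):
    """Return the (first, last) positions searched in front of a region."""
    first = max(1, region_start - offset)  # do not run past the contig start
    return first, region_start - 1


def uncovered_bases(previous_region_stop: int, region_start: int, offset: int) -> int:
    """Number of intergenic bases that the upstream offset does not reach."""
    gap = region_start - previous_region_stop - 1
    return max(0, gap - offset)


if __name__ == "__main__":
    # Two regions separated by 100 bases: one stops at 1000, the next starts at 1101.
    print(upstream_window(1101, 30))         # (1071, 1100)
    print(uncovered_bases(1000, 1101, 30))   # 70
    print(uncovered_bases(1000, 1101, 100))  # 0
```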

Q: Why does the result always show my selected contig as associated region?

A: You did not select a region and chose "current" as Search Region. As a result, PatScan searches only the contig (which is also a region). To solve this problem, select "all" as Search Region and choose an applicable type from the list box. A value of "Region" will search all regions except the contig.