ProDBWiki/DeveloperDocumentation/Cluster


SubmitEmowse.pm

Idea

In proteomics, large amounts of data are usually analysed at once and the results are needed quickly. When hundreds of samples are examined one after another, it takes a long time to obtain results; if the work is parallelised, the results arrive much faster. Since emowse handles every sample and every set of parameters independently, all searches can run at the same time. For this reason, the searches are collected and submitted to a cluster. This is realised in the module SubmitEmowse.pm in prodb/share/www/cgi-bin.

Implementation

The module has only one function of interest: submitThemAll. The function expects a reference to an array containing all emowse calls. These calls are split into groups of at most 200, because running more than 200 jobs at the same time yields no further time advantage; each submitted array job therefore contains at most 200 calls. To create an array job and submit it to the cluster, I use the existing module Scheduler::Codine.pm in bioinfo/common/share/perl. In order to submit the jobs together rather than one by one, the scheduler has to be frozen. Every job is then submitted, and once all jobs have been collected, the array job can be run.

Scheduler::Codine->freeze();
foreach my $emowsecall (@$emowsecalls) {
      my $job = Scheduler::Codine->new();
      # extra option for the scheduler: start the job immediately
      $job->options("-now y");
      $job->command($emowsecall);
      $job->submit();
}
Scheduler::Codine->thaw();
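
The splitting into groups of at most 200 calls is not shown in the snippet above. The following is only a minimal sketch of how the call list could be chunked before each frozen batch is submitted; it reuses the Scheduler::Codine calls from above, but the surrounding variable names ($emowsecalls, $batch_size) are illustrative assumptions, not the actual code of submitThemAll.

use strict;
use warnings;
use Scheduler::Codine;

my $batch_size = 200;                    # at most 200 jobs per array job
my @calls = @$emowsecalls;               # dereference the list of emowse calls

while (@calls) {
      # take up to 200 calls for the next array job
      my @batch = splice(@calls, 0, $batch_size);

      Scheduler::Codine->freeze();
      foreach my $emowsecall (@batch) {
            my $job = Scheduler::Codine->new();
            $job->options("-now y");     # start the job immediately
            $job->command($emowsecall);
            $job->submit();
      }
      Scheduler::Codine->thaw();         # run the collected batch as one array job
}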

By parallelising the emowse searches, the overall runtime is cut to at most half of what a sequential run would take.

Author: Anna-Lena Kranz