When thinking of a bioinformatics service provider like BiBiServ, one must keep in mind that the amount of data to be analysed is very large and grows day by day. The problem is not only providing the computing power needed to analyse all this data but also providing sufficient storage and bandwidth. One possible solution is to move the bioinformatics tools to the data instead of moving large amounts of data to the tools. In other words, if your data is already stored in the cloud, it is faster (and normally also cheaper) to run your bioinformatics applications close to the data than to move or copy all of it to your analysis pipeline.
BiBiCloud consists of two pieces of software that can be used independently but complement each other.
Since our local computing grid uses OGE (Open Grid Engine, previously known as SGE or Codine) as its batch-queuing system, and the BiBiServ framework as well as most of our standalone pipelines are based on it, we need a running OGE installation to run them in the cloud. In 2012, when we started looking for a good way to set up an OGE-based computing grid within the AWS EC2 environment, the results were disappointing: existing solutions were poorly documented, and most of them needed a lot of manual configuration to run. Since we have some experience setting up and administering OGE on our local compute cluster, we decided to develop our own tool. Starting with only the Grid Engine as job scheduling system and Amazon AWS as cloud provider, BiBiGrid now additionally supports Mesos as a scheduling framework and is (in theory) independent of any cloud provider. See the BiBiGrid page for a detailed description and downloads.
Amazon S3 is a cloud storage service for large amounts of data. Before EC2 compute instances can process that data, they need a fast way to access it. Our BiBiS3 module provides efficient and scalable approaches to transferring data to and from S3. Its features include parallel transfer of multiple chunks of a file, recursive transfer of whole directories with multiple files downloaded in parallel, and simultaneous parallel download across a cluster, either into an NFS-shared directory or as split downloads in which each node fetches only smaller chunks of the files.
See the BiBiS3 page for a detailed description and downloads.
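To illustrate the general idea behind chunked, parallel transfer, here is a minimal sketch. It is not BiBiS3's code, and all names (`split_ranges`, `ranges_for_node`, `parallel_download`) are hypothetical. It assumes the object size is known in advance and that a `fetch(start, end)` callable returns the requested byte range. The object is cut into byte ranges that a thread pool fetches concurrently into a shared buffer. `ranges_for_node` sketches the cluster-wide split mode under the assumption of a simple round-robin assignment of chunks to nodes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

Range = Tuple[int, int]  # (start inclusive, end exclusive) byte offsets


def split_ranges(size: int, chunk_size: int) -> List[Range]:
    """Split an object of `size` bytes into consecutive byte ranges."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [(start, min(start + chunk_size, size))
            for start in range(0, size, chunk_size)]


def ranges_for_node(ranges: List[Range], node_index: int, node_count: int) -> List[Range]:
    """Hypothetical round-robin split: node i fetches every node_count-th chunk."""
    return ranges[node_index::node_count]


def parallel_download(fetch: Callable[[int, int], bytes], size: int,
                      chunk_size: int, workers: int = 4) -> bytes:
    """Fetch all chunks concurrently and assemble them in order."""
    buffer = bytearray(size)

    def fetch_into(r: Range) -> None:
        start, end = r
        data = fetch(start, end)
        if len(data) != end - start:
            raise IOError(f"short read for range {start}-{end}")
        # Ranges never overlap, so each thread writes its own slice.
        buffer[start:end] = data

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for all chunks and re-raises any worker exception.
        list(pool.map(fetch_into, split_ranges(size, chunk_size)))
    return bytes(buffer)


if __name__ == "__main__":
    blob = bytes(range(256)) * 40            # 10,240 bytes standing in for an S3 object
    fetch = lambda start, end: blob[start:end]  # stand-in for a ranged GET
    assert parallel_download(fetch, len(blob), chunk_size=1000) == blob
    print(ranges_for_node(split_ranges(len(blob), 4096), 0, 2))
    # [(0, 4096), (8192, 10240)]
```

Against real S3, `fetch` could wrap a ranged GET request (the S3 `GetObject` operation accepts an HTTP `Range` header such as `bytes=0-4095`, note the inclusive end). The chunk size then trades per-request overhead against the degree of parallelism.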
The AMIs we provide for use with BiBiGrid contain an up-to-date, preconfigured stable release of the BiBiServ framework. Together with BiBiGrid's ability to run an (early) user script during server setup, this makes the BiBiServ framework easy to set up.