Grid Enabled BLAST (GEB) with Distributed Objects and Components



Sequence comparison algorithms and tools, and especially BLAST (Basic Local Alignment Search Tool), is one of the corner stones of bioinformatics. However, sequence databases are exploding in size, growing at an exponential rate --- currently doubling in about 14 months, exceeding Moore's Law for hardware which is about 18 months. Grid computing together with smart parallel BLAST, is a crucial technique to maintain and improve the effectiveness of sequence comparison.

The objective of this research Master is to define models, techniques, and tools for Grid Enabled BLAST (GEB).1.

There are various software methods by which BLAST can be parallelized to achieve linear or even super-linear speedups: A first constraint is to work with NCBI standard for BLAST. One cannot afford to become incompatible with the constant flux of NCBI updates.

In order to achieve maximum flexibility and efficiency, we shall aimed at an hybrid approach, using both Database and Query Segmentation, depending on the problem to BLAST, and the available resources. Load-balancing and fault-tolerance are clearly two other issues to take into account. Also, the model and infrastructure being developed must be as flexible, as configurable, and as generic as possible. They must use modern distributed techniques, such as components.

The OASIS INRIA Sophia Antipolis project has been designing and implementing  ProActive, a Java library for Parallel, Distributed, and Concurrent computing and GRID. In the framework of a MIMD model (active object model), from a reduced set of rather simple primitives, a comprehensive and versatile library is defined for parallel, distributed, and concurrent programming (middleware).
In the absence of any syntactical extension, ProActive programmers write standard code. The library is itself extensible by the programmers, making the system open for adaptations and optimizations.
The prototype implementation and experimentations will take advantage of the ProActive platform, including the component capability, in order to benefit from asynchronous and group communicationd, high-level synchronizationd (wait-by-necessity with Future pool), and graphical composition.

Overall, the work will include conceptual models and practical experiments. The following steps could be undertaken:

Note:
It might be considered to design an infrastructure that is valid to encapsulate other alignment tools, such as FASTA, FASTX, TFASTX, for instance. In that case, any specific concatenation would have to be achieved outside any alignment tool, but rather probably in a alignment-tool-specific part of the GEB system.

This research should lead to a PhD program (Thèse de Doctorat).


Advisor :  Denis Caromel
Téléphone : 04 92 38 76 31 Email : Denis.Caromel@inria.fr
Laboratoire ou équipe : INRIA Sophia Antipolis -- I3S -- CNRS

In collaboration with : 

Richard Christen, Claude Pasquier
Laboratoire de Biologie Virtuelle, UMR6543 CNRS - UNSA

Pascal Barbry
IPMC, Institut de Pharmacologie Moléculaire et Cellulaire, UMR 6097 CNRS/UNSA



Prerequisite : Some knowledge in Object-oriented languages, Distributed Programming, Bioinformatics.

Hardware and software to be used : Networks of PCs, Intranet and Internet P2P machines

Internship place:  Sophia Antipolis, between Nice and Cannes, France  



Bibliography:
     
S. Altschul, W. Gish, W. Miller, E. Myers and D.J. Lipman,      
Basic local alignment search tool      
Journal of Molecular Biology, 215:403-410, 1990.
     
S. F. Altschul, T. L. Madden, A. A. Schäffer, J. Zhang, Z. Zhang, W. Miller and D. J. Lipman:      
Gapped BLAST and PSI-BLAST: a new generation of protein database search programs      
Nucleic Acids Res., 25:3389-3402, 1997.
     
NCBI BLAST
     
FASTA
     
R. Bjornson, A. Sherman, S. Weston, N. Willard and J. Wing.      
Turboblast: A parallel implementation of blast based on the turbohub process integration architecture      
In IPDPS 2002 Workshops, April 2002.      
Note: TurboBLAST is a closed-source commercial parallelization of NCBI BLAST by TurboWorx, Inc.
     
R. Braun, K. Pedretti, T. Casavant, T. Scheetz, C. Birkett, and C. Roberts:      
Parallelization of local BLAST service on workstation clusters      
Future Generation Computer Systems      
17(6)}:745-754, April 2001.      
Note: mpiBLAST is an open-source parallelization of BLAST using Message Passing Interface (MPI)
     
J. D. Grant, R. L. Dunbrack, F. J. Manion and M. F. Ochs:      
BeoBLAST: distributed BLAST and PSI-BLAST on a Beowulf cluster      
Bioinformatics, 18(5)}:765-766, 2002.      
Note:BeoBLAST is an integrated software package for Linux Beowulf cluster using Perl script and daemon.
     
M. Dumontier, and Christopher WV Hogue      
NBLAST: a cluster variant of BLAST for NxN comparisons      
BMC Bioinformatics, 3:13, 2002,      
Note: NBLAST is a cluster variant of BLAST for N X N comparisons, using C/C++ and Perl.
     
Grid-Aware GridBLAST System      
Note: a web-based system for NCSA Linux Cluster or NCSA SGI origin 2000.
     
GridBlast      
Note: a grid-enabled, high-throughput ``task farming'' application

1. if you wanna see a GEB, god of the earth, father of Osiris and Isis in mythological Egypt.