The Blast on DB project aims to run the BLAST algorithm inside a DBMS.
Bio-molecular sequences, such as DNA, RNA and proteins, are usually stored in raw files formatted as FASTA or in BLAST's own format. The FASTA format doesn't provide any extra information about the sequences, such as search indexes or semantic information, so a collection of FASTA files is more a data bank than a database.
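To illustrate why a FASTA file is closer to a data bank than a database, here is a minimal sketch of a FASTA reader in Python. The function names `read_fasta` and `find_by_id` are hypothetical and are not part of this project; the sketch only assumes the usual FASTA layout, where a header line starts with ">" and the following lines hold the sequence.

```python
from typing import Iterable, Iterator, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def find_by_id(lines: Iterable[str], wanted_id: str) -> Optional[str]:
    """Look up one sequence by the first word of its header.

    There is no index in the file, so the only option is to scan
    every record from the beginning until a match is found.
    """
    for header, sequence in read_fasta(lines):
        if header.split(None, 1)[:1] == [wanted_id]:
            return sequence
    return None
```

Every lookup in this sketch is a full scan of the file, which is the kind of work a DBMS index avoids.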
A [good] approach is to use a good database schema, like GUS, to store the sequences and their information, but a problem occurs when a similarity search is needed. The sequences must be dumped into a temporary file, this temporary file must be converted to the BLAST format, and only then can the search be performed. After this process, the results must be saved back into the database. This process wastes time [and money].
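The first step of that round trip, dumping the stored sequences into a temporary FASTA file, can be sketched as follows. This is only an illustration under stated assumptions: the function `dump_to_fasta` is hypothetical, and the rows are assumed to arrive as (identifier, sequence) pairs already fetched from the database by some query that is not shown here.

```python
import os
import tempfile
from typing import Iterable, Tuple


def dump_to_fasta(rows: Iterable[Tuple[str, str]], width: int = 60) -> str:
    """Write (id, sequence) rows to a temporary FASTA file and return its path.

    `rows` stands in for the result of a database query (an assumption
    of this sketch); `width` is the number of residues per output line.
    """
    fd, path = tempfile.mkstemp(suffix=".fasta")
    with os.fdopen(fd, "w") as out:
        for seq_id, sequence in rows:
            out.write(f">{seq_id}\n")
            # Wrap the sequence into fixed-width lines.
            for start in range(0, len(sequence), width):
                out.write(sequence[start:start + width] + "\n")
    return path
```

After this step the file would still have to be converted to the BLAST format, searched with an external BLAST program, and the results parsed and loaded back into the database. Every one of those steps copies data that the DBMS already holds.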
At present, in my opinion, the best choice is the PostgreSQL DBMS. This is because:
No. At the moment PostgreSQL seems to be the better option, but I can change to, or preferably add support for, other DBMSs, like MySQL.
No. For the purpose of this project, it is better to write an extension to PostgreSQL rather than mess with its core.
This is because the main functionality of this project is targeted at a very specific user group and the source code can be packaged as a module. This approach is also much better for maintenance, coding and debugging.
No. This project wants to reuse an existing BLAST implementation. The two main BLAST implementations are NCBI BLAST and WU BLAST.
WU BLAST has a licensing problem: its license doesn't allow free commercial use or changes to the source code: "WU BLAST 2.0 is copyrighted and may not be sold, redistributed or modified in any form or by any means, without prior express written consent from the Office of Technology Management at Washington University in St. Louis."
NCBI BLAST is apparently open source, but I don't know the license itself. The source is good, very well organized and tested, but it is hard to understand well and hard to cut out the "important" part to create a plugin.
A good choice is FSA-BLAST. This BLAST implementation uses the BSD license [CHEER!], its authors say that it's faster than NCBI BLAST, and its source is small and concise. I have already tried it and the results are really good! This implementation is my major candidate!
So, the short answer is: no, I want to use the FSA-BLAST source.
Yes. I know BioPostgres, which has a similar objective. To quote: "BioPostgres is a collection of modules that extend PostgreSQL for Computational Biology. It implements new datatypes (graph, range, location, etc) with query operators, index and related tools for large-scale analysis."
One main difference between our project and BioPostgres is that we want to include BLAST in the DBMS, while they want to build modules for computational biology in PostgreSQL. The objectives are different, so these projects can [or must] work together.
Nothing. Actually, there are some test sources, but nothing too cool. I am reading the PostgreSQL manuals, looking at source examples, and choosing a good BLAST implementation to use.
I don't have a better way to communicate, so contact me at my personal email: felipe.albrecht(at)gmail.com or through the project page at SourceForge.
Documentation on how to extend SQL in PostgreSQL
Database Internals Presentation
Felipe Fernandes Albrecht - felipe.albrecht(@)gmail.com
Last update: 06/16/2007
Thank you for your visit and attention.