scitubepro

BLAST: a fascinating bioinformatics tool and its uses in molecular biology

Bioinformatics can be simply defined as a bridge between the computational and biological sciences: it provides a computational platform to organize, store, archive, retrieve and analyze the vast amount of biological information generated by newly emerging molecular biology techniques. Bioinformatics databases and tools are used frequently in molecular biology, especially in genomic and proteomic technologies. These tools make it possible to retrieve the biological data encoded in nucleic acid or protein sequences deposited in databases, to analyze those data, and to interpret them statistically.

The Basic Local Alignment Search Tool (BLAST) is a sequence alignment tool developed by S. Altschul et al. at the National Center for Biotechnology Information (NCBI), and it is freely available on the NCBI website. BLAST has several advantages over conventional sequence alignment tools, such as speed, accuracy, the ability to search several types of databases, and results that are presented in easily understandable graphical and tabulated formats. Because high-throughput sequencing techniques have revealed large numbers of nucleic acid and protein sequences, BLAST has become a popular tool for finding similar sequences, in order to infer functional, structural and evolutionary relationships and to identify the gene and protein families to which a novel sequence belongs. BLAST matches contiguous subsections of sequences rather than performing an end-to-end alignment, and it is therefore more suitable for aligning sequences that share a conserved domain.
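The difference between a local and an end-to-end alignment can be illustrated with a small sketch. Note that this is not the heuristic BLAST itself uses; it is the classic Smith-Waterman dynamic programming method for local alignment, shown only to make the idea concrete. The scoring values (match +2, mismatch -1, gap -2) and the example sequences are assumptions chosen for illustration.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between a and b (Smith-Waterman).

    The default scores are illustrative assumptions, not BLAST defaults.
    """
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below 0, so an alignment may start anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Two hypothetical sequences that share only the region ACGTACGT.
print(local_alignment_score("TTTTACGTACGT", "GGGACGTACGTCCC"))
```

Because scores are reset to zero instead of going negative, the unrelated flanking regions do not penalize the shared region, which is why a local method suits sequences that have only a conserved domain in common.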

The sequence used to run BLAST, known as the query sequence, can be supplied to the program in FASTA format or as a GenBank accession number. The query is then searched against nucleic acid or protein databases to find similar target sequences, and the results are reported with statistical measures that make comparison easy. Depending on the type of query sequence and the type of database being searched, NCBI BLAST offers several programs:

| Program | Type of query sequence | Type of database being searched |
|---------|------------------------|---------------------------------|
| BLASTN  | Nucleotide             | Nucleotide database             |
| BLASTP  | Protein                | Protein database                |
| BLASTX  | Translated nucleotide  | Protein database                |
| TBLASTN | Protein                | Translated nucleotide database  |
| TBLASTX | Translated nucleotide  | Translated nucleotide database  |

Table: BLAST programs for different query sequences.
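The table above can be read as a simple decision rule. The following is a minimal sketch, not part of any NCBI software: it parses FASTA text, guesses whether each sequence is a nucleotide or a protein, and picks the matching program name. All function names, the `compare_as_protein` flag and the example records are hypothetical, and the alphabet check is a rough heuristic that would misclassify a short protein made only of the letters A, C, G, T or N.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):  # a '>' line starts a new record
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def looks_like_nucleotide(sequence):
    """Heuristic (an assumption of this sketch): only A, C, G, T, U or N."""
    return len(sequence) > 0 and set(sequence.upper()) <= set("ACGTUN")


def choose_blast_program(query_is_nucleotide, database_is_nucleotide,
                         compare_as_protein=False):
    """Map the query and database types to a program name from the table."""
    if query_is_nucleotide and database_is_nucleotide:
        return "TBLASTX" if compare_as_protein else "BLASTN"
    if query_is_nucleotide:
        return "BLASTX"   # nucleotide query translated, protein database
    if database_is_nucleotide:
        return "TBLASTN"  # protein query, nucleotide database translated
    return "BLASTP"


# Hypothetical example records, searched against a protein database.
example = """>query_1 hypothetical example sequence
ATGGCGTACG
TTAGC
>query_2
MKTAYIAK
"""

for header, seq in parse_fasta(example):
    program = choose_blast_program(looks_like_nucleotide(seq),
                                   database_is_nucleotide=False)
    print(header, "->", program)
```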

To produce the best-scoring alignment, BLAST may introduce gaps. These gaps represent insertions or deletions (collectively known as indels) relative to the common ancestor sequence, and they provide information about the mutations or evolutionary changes that have occurred in homologous sequences.
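As a rough illustration of how gaps appear in an alignment, the sketch below counts matches, mismatches and gaps in a pair of already aligned sequences, where "-" marks a gap. The function name and the two aligned strings are hypothetical; real BLAST output reports comparable figures (identities and gaps) for each hit.

```python
def summarize_alignment(aligned_query, aligned_subject):
    """Count matches, mismatches and gaps in two aligned, equal-length rows."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")
    stats = {"matches": 0, "mismatches": 0, "gap_positions": 0, "gap_openings": 0}
    previous_gap = None  # which row, if any, had a gap in the previous column
    for q, s in zip(aligned_query, aligned_subject):
        if q == "-" and s == "-":
            raise ValueError("a column cannot be a gap in both rows")
        gap_in = "query" if q == "-" else "subject" if s == "-" else None
        if gap_in is not None:
            stats["gap_positions"] += 1
            if gap_in != previous_gap:  # a new run of gaps is one indel event
                stats["gap_openings"] += 1
        elif q == s:
            stats["matches"] += 1
        else:
            stats["mismatches"] += 1
        previous_gap = gap_in
    return stats


# Hypothetical alignment: one single-base gap in the query row and one
# two-base gap in the subject row.
print(summarize_alignment("ACGT-ACGTAG", "ACGTTAC--AC"))
```

Counting gap openings separately from gap positions reflects the idea that a run of adjacent gaps is usually treated as a single insertion or deletion event.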

By conducting similarity searches with novel sequences, scientists have found that many organisms share similar genes, which often have similar functions as well. On the basis of this finding, they use BLAST to predict the functions of novel sequences from newly studied organisms before performing further experiments to confirm them. Such predictions rely on model organisms (such as the bacterium Escherichia coli) that have been studied thoroughly and for which almost all genetic elements and their respective functions are known.

Besides function prediction, identifying gene locations and consensus regulatory patterns provides information for gene mapping. In addition, sequences similar to the query sequence can be used to design primers for PCR experiments.

One of the main uses of BLAST in molecular biology is finding sequence similarities that inform the construction of phylogenetic trees. The sequence similarity score is used to identify the species most closely related to the organism of interest. Phylogenetic analysis thus reflects the evolutionary history of an organism, as well as the relationships and evolutionary distances between species. Speciation, which occurs through mutation and natural selection, can be read from a phylogenetic tree in relation to the common ancestor. Because speciation proceeds by small changes in the ancestral sequence, most positions in the sequences remain conserved, as shown in the figure below. Such sequences with common ancestry, and often similar function, are known as homologous sequences. Therefore, running BLAST on sequences from different species also provides information about their ancestry.

Figure: Origin of similar sequences from common ancestral sequence.

Source: Mount, D. W. Bioinformatics: Sequence and Genome Analysis; Cold Spring Harbor Laboratory Press; p 239.
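The idea of using sequence similarity to find the closest relatives of an organism can be sketched as follows. This is a simplified illustration, not how BLAST scores hits: it assumes the sequences are already aligned to the same length, and it uses the proportion of differing sites (often called the p-distance) as the distance measure. The species names and sequences are made up for the example.

```python
def p_distance(seq_a, seq_b):
    """Fraction of aligned, ungapped positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":  # ignore columns that contain a gap
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return differences / compared


def rank_by_similarity(query, others):
    """Return the names in `others`, most similar to the query first."""
    return sorted(others, key=lambda name: p_distance(query, others[name]))


# Hypothetical aligned sequences descended from a common ancestor.
query = "ATGCCGTAAGCT"
relatives = {
    "species_c": "TTGACGTTCGCA",
    "species_a": "ATGCCGTAAGCA",
    "species_b": "ATGACGTTAGCA",
}

for name in rank_by_similarity(query, relatives):
    print(name, round(p_distance(query, relatives[name]), 3))
```

A smaller distance suggests a more recent common ancestor, which is the intuition behind placing the most similar sequences closest together in a phylogenetic tree.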

However, evolutionary relationships can be inferred reliably from nucleic acid sequences only when genes are transferred vertically, from parent to offspring. If horizontal gene transfer events, such as transfer between symbiotic partners or virus-mediated gene transfer, occur between unrelated organisms, predictions of evolutionary relationships may become unreliable.

Author: Thaanya Amarasekara
B.Sc. (Honors degree in Biochemistry and Molecular Biology)
Undergraduate
Faculty of Science
University of Colombo

References:

  • Mount, D. W. Bioinformatics: Sequence and Genome Analysis; Cold Spring Harbor Laboratory Press; pp 20-335.
  • BLAST: Basic Local Alignment Search Tool. https://blast.ncbi.nlm.nih.gov/Blast.cgi (accessed Jun 21, 2020).
