Bork Group

Help for the Advanced BLAST2 & Orthologue
Search with post-processing

EMBL

This approach to finding orthologues in molecular biology databases is based on Warren Gish's database search program WU-BLAST, on SRS5 [Sequence Retrieval System 5.0], and on the tree reconciliation algorithm developed by R. Page (1994). The user can choose between multiple local and global alignments; in both cases the multiple alignments are computed by CLUSTALW. All homologous sequences selected by the user are retrieved. The Neighbour-joining gene tree for the selected sequences is then compared with the species tree of the corresponding organisms. If the two trees are incongruent, they are reconciled. All trees are given in PHYLIP format as well as in graphics drawn by programs from Felsenstein's PHYLIP package. From the reconciled phylogenetic gene tree, one should be able to easily detect the orthologues and paralogues among the genes or gene products (proteins) of interest.
The whole procedure is done in the following steps:
  1. Searching homologous sequences in the molecular databases
    1. Starting the database search by using your own query sequence for WU-BLAST2
      See an example for the start page.
      The BLAST2 result is usually returned in HTML format rather than as plain text.
    2. Finding homologous sequences in the BLAST2 result
    3. Selecting sequences for further analysis
  2. Making "local alignments" or "global alignments"
    1. Local alignments: BLAST produces local alignments. The hits found by BLAST are therefore usually not similar to the query sequence over their whole length, and the matched regions often differ from one hit to another. If this option is chosen, only the matched regions of the selected sequences are taken into account in the further analysis.
    2. Global alignments: The selected sequences are retrieved in their whole length and a multiple alignment is made from them. The full-length sequences are also used to compute the Neighbour-joining gene tree.
  3. Entering the Latin organism name for the query sequence. This step is optional and is only requested if the query sequence itself is selected for the alignments. Sometimes your query sequence is also among the hit sequences; in that case you can select it in the hit list and let our program determine its organism name.
  4. Evaluating the results on the final page [multiple alignments, species tree, gene tree, and reconciled tree]
    1. A table lists the sequence length, the average distance (dist. means), and the organism of each selected sequence.
    2. A multiple alignment is computed for the selected sequences.
    3. A distance matrix is calculated for all selected sequences.
    4. Species tree for the corresponding species of the selected sequences
      The taxonomical species tree is given first; the genes are then sorted according to this taxonomical tree to construct the species tree.
    5. Gene tree for the selected sequences
      The first tree here is the Neighbour-joining gene tree for the selected sequences. It is given with all branch lengths and bootstrap values. The second tree shows only the topology of the first tree, so that it is easier to read.
    6. Reconciled tree of the species tree and the gene tree
      A predicted gene (or gene product), i.e. one that is extinct or not yet found, is marked by the prefix "?" in front of the corresponding organism name.
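
The difference between the two alignment options in step 2 can be sketched as follows. This is a minimal illustration, not the server's code: the `Hit` record, its field names, and the example sequence are hypothetical, and the match coordinates are assumed to be 1-based and inclusive, as in BLAST reports.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One selected database hit (hypothetical record, for illustration only)."""
    name: str
    sequence: str  # full-length database sequence
    start: int     # first matched residue, 1-based, inclusive
    end: int       # last matched residue, 1-based, inclusive


def sequence_for_alignment(hit: Hit, mode: str) -> str:
    """Return the part of the hit that would enter the multiple alignment."""
    if mode == "local":
        # Only the region matched by BLAST is kept.
        return hit.sequence[hit.start - 1:hit.end]
    if mode == "global":
        # The whole sequence is used.
        return hit.sequence
    raise ValueError(f"unknown mode: {mode!r}")


# Made-up example hit: residues 5 to 12 were matched by the query.
hit = Hit(name="EXAMPLE_1", sequence="MNGTEGPNFYVPFSNKTGVV", start=5, end=12)
local_part = sequence_for_alignment(hit, "local")
global_part = sequence_for_alignment(hit, "global")
print(local_part)
print(global_part)
```

With the local option, sequences of very different lengths contribute only their matched regions, so the distances and the gene tree reflect the shared region alone.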


Terminology


Orthologues:
Genes or gene products that diverged through the speciation of the species involved. Orthologues therefore reside in different organisms, and they are more likely than paralogues to have retained the same function.
Paralogues:
Genes or gene products that diverged through gene duplication events.
Dist. means:
The arithmetic mean of all pairwise percentage distances between one sequence and every other sequence in the analysis (the distance of the sequence to itself is excluded). It is calculated from CLUSTALW's table of pairwise percentage distances [pairwise percentage distance or divergence = (no. of differing residues between the two sequences)/(sequence length in alignment) * 100]. This measure indicates how far a sequence has diverged from the others in the analysis. For example, the following table gives the dist. means in the last column; the distances are shown as fractions, i.e. before multiplication by 100. The last sequence (OPSI_ASTFA) is the most divergent from the other five sequences.
Pairwise percentage distance table from CLUSTALW      ===> dist. means
-------------------------------------------------------------------
OPSG_CHICK  0.000  0.076  0.177  0.249  0.268  0.929         0.340
OPSB_ANOCA  0.076  0.000  0.161  0.260  0.266  0.921         0.337
OPSB_GECGE  0.177  0.161  0.000  0.274  0.282  0.927         0.364
OPSG_CARAU  0.249  0.260  0.274  0.000  0.085  0.941         0.362
OPSH_CARAU  0.268  0.266  0.282  0.085  0.000  0.932         0.367
OPSI_ASTFA  0.929  0.921  0.927  0.941  0.932  0.000         0.930
-------------------------------------------------------------------
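
The dist. means column can be recomputed from the distance table with a few lines of Python. This is only a sketch: the function names are hypothetical, and `pairwise_distance` applies the formula above literally (as a fraction, without the factor 100), counting a gap as an ordinary character, which may differ from how CLUSTALW treats gaps.

```python
names = ["OPSG_CHICK", "OPSB_ANOCA", "OPSB_GECGE",
         "OPSG_CARAU", "OPSH_CARAU", "OPSI_ASTFA"]

# Pairwise distances copied from the table above (fractions, not percent).
distances = [
    [0.000, 0.076, 0.177, 0.249, 0.268, 0.929],
    [0.076, 0.000, 0.161, 0.260, 0.266, 0.921],
    [0.177, 0.161, 0.000, 0.274, 0.282, 0.927],
    [0.249, 0.260, 0.274, 0.000, 0.085, 0.941],
    [0.268, 0.266, 0.282, 0.085, 0.000, 0.932],
    [0.929, 0.921, 0.927, 0.941, 0.932, 0.000],
]


def pairwise_distance(a: str, b: str) -> float:
    """Fraction of alignment columns in which two aligned sequences differ.

    Assumption: gaps are compared like any other character.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    if not a:
        raise ValueError("empty alignment")
    differing = sum(1 for x, y in zip(a, b) if x != y)
    return differing / len(a)


def dist_means(matrix):
    """Mean distance of each sequence to all others (diagonal excluded)."""
    n = len(matrix)
    means = []
    for i, row in enumerate(matrix):
        others = [d for j, d in enumerate(row) if j != i]
        means.append(sum(others) / (n - 1))
    return means


means = dist_means(distances)
for name, mean in zip(names, means):
    print(f"{name}  {mean:.3f}")

# The sequence with the largest mean is the most divergent one.
most_divergent = names[means.index(max(means))]
print("most divergent:", most_divergent)
```

Run on the table above, this reproduces the last column and identifies OPSI_ASTFA as the most divergent sequence.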
Gene tree:
A phylogenetic tree for the genes or gene products under study. This tree should represent the evolutionary relationship between the genes or gene products and is usually computed from their molecular data. In our approach the Neighbour-joining tree computed by CLUSTALW is used.
Species tree:
This phylogenetic tree should represent the "true" evolutionary relationship between the species under consideration. In our approach we use the NCBI taxonomy database to reconstruct this tree. The taxonomical tree only approximates the true species tree, but it is the most complete database covering all species that have at least one sequence in the biological databases (EMBL, SWISSPROT, GENBANK).
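
One way to picture how a species tree can be derived from taxonomy is to merge the root-to-species lineages of the selected organisms and drop every taxon that does not branch. The sketch below does this for five species; it is not the server's code, the function names are hypothetical, and the lineages are heavily abbreviated by hand for illustration rather than retrieved from the NCBI database.

```python
def build_tree(lineages):
    """Merge root-to-species lineages into a nested dict (a simple trie)."""
    root = {}
    for lineage in lineages:
        node = root
        for taxon in lineage:
            node = node.setdefault(taxon, {})
    return root


def to_newick(name, children):
    """Render a node as a parenthesised tree; unbranched taxa are collapsed."""
    while len(children) == 1:
        name, children = next(iter(children.items()))
    if not children:
        return name.replace(" ", "_")
    parts = [to_newick(child, sub) for child, sub in sorted(children.items())]
    return "(" + ",".join(parts) + ")"


# Abbreviated lineages (illustrative assumption, not database output).
lineages = [
    ["Vertebrata", "Sauropsida", "Aves", "Gallus gallus"],
    ["Vertebrata", "Sauropsida", "Lepidosauria", "Anolis carolinensis"],
    ["Vertebrata", "Sauropsida", "Lepidosauria", "Gekko gecko"],
    ["Vertebrata", "Actinopterygii", "Carassius auratus"],
    ["Vertebrata", "Actinopterygii", "Astyanax fasciatus"],
]

species_tree = to_newick("root", build_tree(lineages)) + ";"
print(species_tree)
```

A taxonomy can contain nodes with more than two children; this sketch would print them unchanged and makes no attempt to resolve them into a binary tree.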
Reconciled tree:
Using the algorithm developed by R. Page (1994), the gene tree is reconciled with the species tree. In the reconciled tree, duplication events and extinct (or not-yet-found) genes or gene products are introduced into the phylogenetic tree to explain the incongruence between the two trees. Please note that so far we have only implemented the reconciliation algorithm for binary trees, i.e., selecting more than two genes from the same species will cause the reconciliation procedure to fail.
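
The central decision in tree reconciliation, namely which nodes of the gene tree are duplications, can be sketched as follows. The sketch uses the commonly described mapping of each gene tree node to the last common ancestor of its species in the species tree; a node that maps to the same place as one of its children is taken to be a duplication. It is not the server's implementation and does not construct the reconciled tree or the "?" entries for missing genes. The tree encoding (nested tuples), the function names, and the example gene tree topology are assumptions made for illustration.

```python
def clades(tree):
    """Return the leaf sets of all nodes of a nested-tuple tree.

    The leaf set of `tree` itself is always the last element.
    """
    if isinstance(tree, str):
        return [frozenset([tree])]
    result = []
    leaves = frozenset()
    for child in tree:
        sub = clades(child)
        result.extend(sub)
        leaves |= sub[-1]
    result.append(leaves)
    return result


def lca(species_clades, species):
    """Smallest clade of the species tree that contains all given species."""
    return min((c for c in species_clades if species <= c), key=len)


def map_node(node, gene_to_species, species_clades, duplications):
    """Map a gene tree node onto the species tree and record duplications."""
    if isinstance(node, str):
        return frozenset([gene_to_species[node]])
    child_maps = [map_node(child, gene_to_species, species_clades, duplications)
                  for child in node]
    here = lca(species_clades, frozenset().union(*child_maps))
    # A node mapping to the same clade as one of its children is a duplication.
    if here in child_maps:
        duplications.append(node)
    return here


# Illustrative binary trees (assumed topologies, not computed results).
species_tree = (("Gallus gallus", "Anolis carolinensis"), "Carassius auratus")
gene_tree = (("OPSG_CHICK", "OPSB_ANOCA"), ("OPSG_CARAU", "OPSH_CARAU"))
gene_to_species = {
    "OPSG_CHICK": "Gallus gallus",
    "OPSB_ANOCA": "Anolis carolinensis",
    "OPSG_CARAU": "Carassius auratus",
    "OPSH_CARAU": "Carassius auratus",
}

duplications = []
root_clade = map_node(gene_tree, gene_to_species, clades(species_tree), duplications)
print(duplications)
```

In this example the node joining the two Carassius auratus genes is reported as a duplication, so those two genes are paralogues of each other, while the other two nodes correspond to speciation events.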

Yan P. Yuan, last modified 29 Oct. 1997