Evolutionary rates at codon sites may be used to align sequences and infer protein domain function
AffiliationEvolutionary Medicine Unit, University of the Witwatersrand and National Health Laboratory Service, Johannesburg, South Africa
Plasmodium Molecular Research Unit, Department of Molecular Medicine and Haematology, University of the Witwatersrand and National Health Laboratory Service, Johannesburg, South Africa
Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, USA
School of Electrical and Information Engineering, University of the Witwatersrand, Johannesburg, South Africa
MetadataShow full item record
CitationDurand et al. BMC Bioinformatics 2010, 11:151 http://www.biomedcentral.com/1471-2105/11/151
Rights© 2010 Durand et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0
Collection InformationThis item is part of the UA Faculty Publications collection. For more information this item or other items in the UA Campus Repository, contact the University of Arizona Libraries at email@example.com.
AbstractBACKGROUND:Sequence alignments form part of many investigations in molecular biology, including the determination of phylogenetic relationships, the prediction of protein structure and function, and the measurement of evolutionary rates. However, to obtain meaningful results, a significant degree of sequence similarity is required to ensure that the alignments are accurate and the inferences correct. Limitations arise when sequence similarity is low, which is particularly problematic when working with fast-evolving genes, evolutionary distant taxa, genomes with nucleotide biases, and cases of convergent evolution.RESULTS:A novel approach was conceptualized to address the "low sequence similarity" alignment problem. We developed an alignment algorithm termed FIRE (Functional Inference using the Rates of Evolution), which aligns sequences using the evolutionary rate at codon sites, as measured by the dN/dS ratio, rather than nucleotide or amino acid residues. FIRE was used to test the hypotheses that evolutionary rates can be used to align sequences and that the alignments may be used to infer protein domain function. Using a range of test data, we found that aligning domains based on evolutionary rates was possible even when sequence similarity was very low (for example, antibody variable regions). Furthermore, the alignment has the potential to infer protein domain function, indicating that domains with similar functions are subject to similar evolutionary constraints. These data suggest that an evolutionary rate-based approach to sequence analysis (particularly when combined with structural data) may be used to study cases of convergent evolution or when sequences have very low similarity. However, when aligning homologous gene sets with sequence similarity, FIRE did not perform as well as the best traditional alignment algorithms indicating that the conventional approach of aligning residues as opposed to evolutionary rates remains the method of choice in these cases.CONCLUSIONS:FIRE provides proof of concept that it is possible to align sequences and infer domain function by using evolutionary rates rather than residue similarity. This represents a new approach to sequence analysis with a wide range of potential applications in molecular biology.
VersionFinal published version