Evolutionary Shortcuts via Multinucleotide Substitutions and Their Impact on Natural Selection Analyses
Affiliation
Department of Ecology and Evolutionary Biology, University of Arizona
Issue Date
2023-07-03
Keywords
codon-substitution models
evolutionary shortcuts
molecular evolution
multinucleotide substitutions
Publisher
Oxford University Press
Citation
Alexander G Lucaci, Jordan D Zehr, David Enard, Joseph W Thornton, Sergei L Kosakovsky Pond, Evolutionary Shortcuts via Multinucleotide Substitutions and Their Impact on Natural Selection Analyses, Molecular Biology and Evolution, Volume 40, Issue 7, July 2023, msad150, https://doi.org/10.1093/molbev/msad150
Journal
Molecular Biology and Evolution
Rights
© The Author(s) 2023. Published by Oxford University Press on behalf of Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/).
Collection Information
This item from the UA Faculty Publications collection is made available by the University of Arizona with support from the University of Arizona Libraries. If you have questions, please contact us at repository@u.library.arizona.edu.
Abstract
Inference and interpretation of evolutionary processes, in particular of the types and targets of natural selection affecting coding sequences, are critically influenced by the assumptions built into statistical models and tests. If certain aspects of the substitution process (even when they are not of direct interest) are presumed absent or are modeled with too crude a simplification, estimates of key model parameters can become biased, often systematically, and lead to poor statistical performance. Previous work established that failing to accommodate multinucleotide (or multihit, MH) substitutions strongly biases dN/dS-based inference toward false-positive inferences of diversifying episodic selection, as does failing to model variation in the rate of synonymous substitution (SRV) among sites. Here, we develop an integrated analytical framework and software tools to simultaneously incorporate these sources of evolutionary complexity into selection analyses. We found that both MH and SRV are ubiquitous in empirical alignments, and incorporating them has a strong effect on whether positive selection is detected (a 1.4-fold reduction) and on the distributions of inferred evolutionary rates. With simulation studies, we show that this effect is not attributable to reduced statistical power caused by using a more complex model. After a detailed examination of 21 benchmark alignments and a new high-resolution analysis showing which parts of the alignment provide support for positive selection, we show that MH substitutions occurring along shorter branches in the tree explain a significant fraction of discrepant results in selection detection. Our results add to the growing body of literature that examines decades-old modeling assumptions (including MH) and finds them to be problematic for comparative genomic data analysis.
Because multinucleotide substitutions have a significant impact on natural selection detection even at the level of an entire gene, we recommend that selection analyses of this type consider their inclusion as a matter of routine. To facilitate this procedure, we developed, implemented, and benchmarked a simple and well-performing model-testing framework for selection detection, able to screen an alignment for positive selection in the presence of two biologically important confounding processes: site-to-site synonymous rate variation and multinucleotide instantaneous substitutions.
Note
Open access article
ISSN
0737-4038
PubMed ID
37395787
Version
Final Published Version
DOI
10.1093/molbev/msad150
Related articles
- Improved inference of site-specific positive selection under a generalized parametric codon model when there are multinucleotide mutations and multiple nonsynonymous rates.
  - Authors: Dunn KA, Kenney T, Gu H, Bielawski JP
  - Issue date: 2019 Jan 14
- Synonymous Site-to-Site Substitution Rate Variation Dramatically Inflates False Positive Rates of Selection Analyses: Ignore at Your Own Peril.
  - Authors: Wisotsky SR, Kosakovsky Pond SL, Shank SD, Muse SV
  - Issue date: 2020 Aug 1
- Evolutionary models accounting for layers of selection in protein-coding genes and their impact on the inference of positive selection.
  - Authors: Rubinstein ND, Doron-Faigenboim A, Mayrose I, Pupko T
  - Issue date: 2011 Dec
- Large-scale analyses of synonymous substitution rates can be sensitive to assumptions about the process of mutation.
  - Authors: Aris-Brosou S, Bielawski JP
  - Issue date: 2006 Aug 15
- Codon Usage Selection Can Bias Estimation of the Fraction of Adaptive Amino Acid Fixations.
  - Authors: Matsumoto T, John A, Baeza-Centurion P, Li B, Akashi H
  - Issue date: 2016 Jun