Affiliation
Computational Biology Department, Carnegie Mellon University
Department of Computer Science, The University of Arizona
Issue Date
2017
Keywords
Multiple sequence alignment
alignment scoring functions
parameter values
accuracy estimation
parameter advising
Publisher
IEEE COMPUTER SOC
Citation
IEEE/ACM Transactions on Computational Biology and Bioinformatics 14:5, 1028-1041, 2017
Rights
© 2015 IEEE.
Collection Information
This item from the UA Faculty Publications collection is made available by the University of Arizona with support from the University of Arizona Libraries. If you have questions, please contact us at repository@u.library.arizona.edu.
Abstract
While the multiple sequence alignment output by an aligner strongly depends on the parameter values used for the alignment scoring function (such as the choice of gap penalties and substitution scores), most users rely on the single default parameter setting provided by the aligner. A different parameter setting, however, might yield a much higher-quality alignment for the specific set of input sequences. The problem of picking a good choice of parameter values for specific input sequences is called parameter advising. A parameter advisor has two ingredients: (i) a set of parameter choices to select from, and (ii) an estimator that provides an estimate of the accuracy of the alignment computed by the aligner using a parameter choice. The parameter advisor picks the parameter choice from the set whose resulting alignment has highest estimated accuracy. We consider for the first time the problem of learning the optimal set of parameter choices for a parameter advisor that uses a given accuracy estimator. The optimal set is one that maximizes the expected true accuracy of the resulting parameter advisor, averaged over a collection of training data. While we prove that learning an optimal set for an advisor is NP-complete, we show there is a natural approximation algorithm for this problem, and prove a tight bound on its approximation ratio. Experiments with an implementation of this approximation algorithm on biological benchmarks, using various accuracy estimators from the literature, show it finds sets for advisors that are surprisingly close to optimal. Furthermore, the resulting parameter advisors are significantly more accurate in practice than simply aligning with a single default parameter choice.
ISSN
1545-5963
EISSN
1557-9964
PubMed ID
28991725
Version
Final accepted manuscript
Sponsors
US National Science Foundation [IIS-1217886]; University of Arizona IGERT in Comparative Genomics through US National Science Foundation [DGE-0654435]
DOI
10.1109/TCBB.2015.2430323
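Illustrative sketch
The advisor described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The table layout, the function names (`advise`, `advisor_accuracy`, `greedy_set`), the parameter-choice labels, and all numbers are hypothetical. The set-growing step is a plain greedy heuristic chosen for brevity; it is not claimed to be the approximation algorithm analyzed in the paper.

```python
from typing import Dict, List, Sequence, Tuple

# Assumed layout: scores[choice] = (estimated accuracy, true accuracy) of the
# alignment the aligner produces for one benchmark under that parameter choice.
Scores = Dict[str, Tuple[float, float]]
# One Scores entry per training benchmark.
Table = Dict[str, Scores]


def advise(choices: Sequence[str], scores: Scores) -> str:
    """Pick the choice whose alignment has the highest estimated accuracy."""
    return max(choices, key=lambda p: scores[p][0])


def advisor_accuracy(choices: Sequence[str], table: Table) -> float:
    """Mean true accuracy of the advisor's picks over the training benchmarks."""
    total = sum(scores[advise(choices, scores)][1] for scores in table.values())
    return total / len(table)


def greedy_set(universe: Sequence[str], table: Table, k: int) -> List[str]:
    """Grow a set of at most k choices, adding at each step the choice that
    yields the best advisor accuracy (a simple heuristic, assumed here)."""
    chosen: List[str] = []
    for _ in range(min(k, len(universe))):
        remaining = [p for p in universe if p not in chosen]
        best = max(remaining, key=lambda p: advisor_accuracy(chosen + [p], table))
        chosen.append(best)
    return chosen


# Hypothetical training data: two benchmarks, three parameter choices.
table: Table = {
    "bench1": {"default": (0.60, 0.70), "p1": (0.80, 0.90), "p2": (0.70, 0.50)},
    "bench2": {"default": (0.75, 0.80), "p1": (0.50, 0.40), "p2": (0.90, 0.85)},
}

learned = greedy_set(["default", "p1", "p2"], table, k=2)
print(learned, advisor_accuracy(learned, table))
```

Note that the advisor only sees estimated accuracies when choosing, while a candidate set is judged by the true accuracy of those picks. An inaccurate estimator can therefore make a larger set perform worse than a smaller one, which is why the set itself has to be learned.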
Related articles
- Accuracy estimation and parameter advising for protein multiple sequence alignment.
- Authors: Kececioglu J, DeBlasio D
- Issue date: 2013 Apr
- Learning scoring schemes for sequence alignment from partial examples.
- Authors: Kim E, Kececioglu J
- Issue date: 2008 Oct-Dec
- Adaptive Local Realignment of Protein Sequences.
- Authors: DeBlasio D, Kececioglu J
- Issue date: 2018 Jul
- Reducing Alignment Time Complexity of Ultra-Large Sets of Sequences.
- Authors: Rubio-Largo Á, Vanneschi L, Castelli M, Vega-Rodríguez MA
- Issue date: 2017 Nov
- An improved scoring method for protein residue conservation and multiple sequence alignment.
- Authors: Nguyen KD, Pan Y
- Issue date: 2011 Dec