Name:
art_3A10.1186_2Fs13015-017-010 ...
Size:
2.676Mb
Format:
PDF
Description:
Final Published Version
Affiliation
Univ Arizona, Dept Comp Sci
Issue Date
2017-04-19
Keywords
Multiple sequence alignment
Core blocks
Alignment accuracy
Accuracy estimation
Parameter advising
Machine learning
Regression
Publisher
BIOMED CENTRAL LTD
Citation
Core column prediction for protein multiple sequence alignments 2017, 12 (1) Algorithms for Molecular Biology
Journal
Algorithms for Molecular Biology
Rights
© The Author(s) 2017. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License.
Collection Information
This item from the UA Faculty Publications collection is made available by the University of Arizona with support from the University of Arizona Libraries. If you have questions, please contact us at repository@u.library.arizona.edu.
Abstract
Background: In a computed protein multiple sequence alignment, the coreness of a column is the fraction of its substitutions that are in so-called core columns of the gold-standard reference alignment of its proteins. In benchmark suites of protein reference alignments, the core columns of the reference alignment are those that can be confidently labeled as correct, usually due to all residues in the column being sufficiently close in the spatial superposition of the known three-dimensional structures of the proteins. Typically the accuracy of a protein multiple sequence alignment that has been computed for a benchmark is only measured with respect to the core columns of the reference alignment. When computing an alignment in practice, however, a reference alignment is not known, so the coreness of its columns can only be predicted.
Results: We develop for the first time a predictor of column coreness for protein multiple sequence alignments. This allows us to predict which columns of a computed alignment are core, and hence better estimate the alignment's accuracy. Our approach to predicting coreness is similar to nearest-neighbor classification from machine learning, except we transform nearest-neighbor distances into a coreness prediction via a regression function, and we learn an appropriate distance function through a new optimization formulation that solves a large-scale linear programming problem. We apply our coreness predictor to parameter advising, the task of choosing parameter values for an aligner's scoring function to obtain a more accurate alignment of a specific set of sequences. We show that for this task, our predictor strongly outperforms other column-confidence estimators from the literature, and affords a substantial boost in alignment accuracy.
Note
Open Access Journal.
ISSN
1748-7188
PubMed ID
28435440
Version
Final published version
Sponsors
University of Arizona by US National Science Foundation [IIS-1217886]; Carnegie Mellon University by NSF [CCF-1256087]; NSF [CCF-131999]; NIH [R01HG007104]; Gordon and Betty Moore Foundation [GBMF4554]; University of Arizona Open Access Publishing Fund
Additional Links
http://almob.biomedcentral.com/articles/10.1186/s13015-017-0102-3
DOI
10.1186/s13015-017-0102-3
Related articles
- Accuracy estimation and parameter advising for protein multiple sequence alignment.
  - Authors: Kececioglu J, DeBlasio D
  - Issue date: 2013 Apr
- Scoring profile-to-profile sequence alignments.
  - Authors: Wang G, Dunbrack RL Jr
  - Issue date: 2004 Jun
- Learning Parameter-Advising Sets for Multiple Sequence Alignment.
  - Authors: DeBlasio D, Kececioglu J
  - Issue date: 2017 Sep-Oct
- OXBench: a benchmark for evaluation of protein multiple sequence alignment accuracy.
  - Authors: Raghava GP, Searle SM, Audley PC, Barber JD, Barton GJ
  - Issue date: 2003 Oct 10
- Using CLUSTAL for multiple sequence alignments.
  - Authors: Higgins DG, Thompson JD, Gibson TJ
  - Issue date: 1996