Predicting protein secondary structure by an ensemble through feature-based accuracy estimation
Affiliation
University of Arizona, Computer Science
Issue Date
2020-09-21
Keywords
ensemble methods
feature-based accuracy estimation
method hybridization
Protein secondary structure prediction
Publisher
ACM
Citation
Krieger, S., & Kececioglu, J. (2020, September). Predicting protein secondary structure by an ensemble through feature-based accuracy estimation. In Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics (pp. 1-10).
Rights
© 2020 Copyright held by the owner/author(s). Publication rights licensed to ACM.
Collection Information
This item from the UA Faculty Publications collection is made available by the University of Arizona with support from the University of Arizona Libraries. If you have questions, please contact us at repository@u.library.arizona.edu.
Abstract
Protein secondary structure prediction is a fundamental task in computational biology, basic to many bioinformatics workflows, with a diverse collection of tools currently available. An approach from machine learning with the potential to capitalize on such a collection is ensemble prediction, which runs multiple predictors and combines their predictions into one, output by the ensemble. We conduct a thorough study of seven different approaches to ensemble secondary structure prediction, several of which are novel, and show we can indeed obtain an ensemble method that significantly exceeds the accuracy of individual state-of-the-art tools. The best approaches build on a recent technique known as feature-based accuracy estimation, which estimates the unknown true accuracy of a prediction, here using features of both the prediction output and the internal state of the prediction method. In particular, a hybrid approach to ensemble prediction that leverages accuracy estimation is now the most accurate method currently available: on average over standard CASP and PDB benchmarks, it exceeds the state-of-the-art Q3 accuracy for 3-state prediction by nearly 4%, and exceeds the Q8 accuracy for 8-state prediction by more than 8%. A preliminary implementation of our approach to ensemble protein secondary structure prediction, in a new tool we call Ssylla, is available free for non-commercial use at ssylla.cs.arizona.edu. © 2020 ACM.
ISBN
9781450379649
Version
Final accepted manuscript
Sponsors
Center for Selective C-H Functionalization, National Science Foundation
DOI
10.1145/3388440.3412425
