Boosting the accuracy of protein secondary structure prediction through nearest neighbor search and method hybridization
Publisher
Oxford University Press (OUP)
Citation
Spencer Krieger, John Kececioglu, Boosting the accuracy of protein secondary structure prediction through nearest neighbor search and method hybridization, Bioinformatics, Volume 36, Issue Supplement_1, July 2020, Pages i317–i325, https://doi.org/10.1093/bioinformatics/btaa336
Journal
BIOINFORMATICS
Rights
© The Author(s) 2020. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/).
Collection Information
This item from the UA Faculty Publications collection is made available by the University of Arizona with support from the University of Arizona Libraries. If you have questions, please contact us at repository@u.library.arizona.edu.
Abstract
Motivation: Protein secondary structure prediction is a fundamental precursor to many bioinformatics tasks. When computing their predictions, nearly all state-of-the-art tools do not explicitly leverage the vast number of proteins whose structures are known. Leveraging this additional information in a so-called template-based method has the potential to significantly boost prediction accuracy.
Method: We present a new hybrid approach to secondary structure prediction that gains the advantages of both template-based and non-template-based methods. Our core template-based method is an algorithmic approach that uses metric-space nearest neighbor search over a template database of fixed-length amino acid words to determine estimated class-membership probabilities for each residue in the protein. These probabilities are then input to a dynamic programming algorithm that finds a physically valid maximum-likelihood prediction for the entire protein. Our hybrid approach uses a novel accuracy estimator for the core method, which estimates the unknown true accuracy of its prediction, to decide when to switch between template-based and non-template-based methods.
Results: On challenging CASP benchmarks, the resulting hybrid approach boosts the state-of-the-art Q8 accuracy by more than 2–10% and Q3 accuracy by more than 1–3%, yielding the most accurate method currently available for both 3-state and 8-state secondary structure prediction.
Note
Open access article
ISSN
1367-4803
EISSN
1460-2059
PubMed ID
32657384
Version
Final published version
Sponsors
National Science Foundation
10.1093/bioinformatics/btaa336
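The core template-based method summarized in the abstract has two computational steps: nearest neighbor search over fixed-length amino acid words to estimate per-residue class-membership probabilities, then dynamic programming to pick the maximum-likelihood prediction among physically valid ones. The Python sketch below illustrates those two steps under stated assumptions and is not the authors' implementation. It uses brute-force Hamming distance in place of the paper's metric-space search, label voting with a pseudocount as the probability estimate, and a hypothetical minimum run length per state (`min_len`, values illustrative and not from the paper) as the notion of "physically valid". The names `class_probabilities`, `decode`, `q_accuracy`, and the 3-state alphabet are hypothetical. `q_accuracy` computes the standard Q3/Q8-style score: the fraction of residues whose predicted state matches the true state.

```python
import math
from collections import Counter

# Hypothetical 3-state alphabet: H = helix, E = strand, C = coil.
STATES = "HEC"


def words(seq, k):
    """All fixed-length words (length-k windows) of seq, by start position."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def hamming(a, b):
    """Assumed word metric: number of mismatched positions."""
    return sum(x != y for x, y in zip(a, b))


def class_probabilities(query, templates, k=5, n_neighbors=3, pseudo=1.0):
    """Estimate per-residue class-membership probabilities for `query`.

    templates: (sequence, structure) pairs with known structure; structure
    labels are assumed to be drawn from STATES. A brute-force scan stands in
    for a real metric-space index: each query word takes its n nearest
    template words and votes for their labels at the aligned residues.
    """
    db = [(w, ss[i:i + k]) for seq, ss in templates
          for i, w in enumerate(words(seq, k))]
    counts = [Counter() for _ in query]
    for i, w in enumerate(words(query, k)):
        nearest = sorted(db, key=lambda entry: hamming(w, entry[0]))[:n_neighbors]
        for _, labels in nearest:
            for j, s in enumerate(labels):
                counts[i + j][s] += 1
    # Pseudocounts keep every probability positive, so log() in decode is safe.
    return [{s: (c[s] + pseudo) / (sum(c.values()) + pseudo * len(STATES))
             for s in STATES} for c in counts]


def decode(probs, min_len):
    """Maximum-likelihood labeling whose every run of state s has length
    >= min_len[s] (assumed validity rule). Returns None if none exists."""
    n = len(probs)
    if n == 0:
        return ""
    neg = float("-inf")
    # best[i][(s, r)]: best log-likelihood of residues 0..i ending in state s
    # with a current run of length r, capped at min_len[s].
    best = [dict() for _ in range(n)]
    back = [dict() for _ in range(n)]
    for s in STATES:
        best[0][(s, 1)] = math.log(probs[0][s])
    for i in range(1, n):
        for (sp, rp), score in best[i - 1].items():
            for s in STATES:
                if s == sp:
                    key = (s, min(rp + 1, min_len[s]))  # extend current run
                elif rp == min_len[sp]:
                    key = (s, 1)  # previous run long enough to end here
                else:
                    continue
                cand = score + math.log(probs[i][s])
                if cand > best[i].get(key, neg):
                    best[i][key] = cand
                    back[i][key] = (sp, rp)
    finals = [key for key in best[n - 1] if key[1] == min_len[key[0]]]
    if not finals:
        return None
    key = max(finals, key=lambda fk: best[n - 1][fk])
    labels = []
    for i in range(n - 1, -1, -1):  # trace back pointers to recover the path
        labels.append(key[0])
        if i > 0:
            key = back[i][key]
    return "".join(reversed(labels))


def q_accuracy(predicted, true):
    """Fraction of residues whose predicted state equals the true state."""
    return sum(p == t for p, t in zip(predicted, true)) / len(true)


if __name__ == "__main__":
    templates = [("ACDEFGHIKL", "CHHHHHHHHC"), ("MNPQRSTVWY", "CEEEEEEEEC")]
    probs = class_probabilities("ACDEFGHIKL", templates, k=5, n_neighbors=1)
    pred = decode(probs, {"H": 4, "E": 3, "C": 1})
    print(pred)                            # CHHHHHHHHC
    print(q_accuracy(pred, "CHHHHHHHHC"))  # 1.0
```

The toy query appears verbatim in the template set, so every word finds an exact match and the prediction reproduces the template's structure. The dynamic program matters when per-residue argmax labels are not valid: for example, an isolated high-probability H inside coil is overruled when `min_len["H"]` is 4. A practical version would replace the `sorted()` scan with a metric-space index such as a vantage-point tree; the hybrid switch described in the abstract, which relies on an accuracy estimate for the prediction, is not modeled here.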