dc.contributor.advisor: Kececioglu, John [en]
dc.contributor.author: PORFIRIO, DAVID JONATHAN
dc.creator: PORFIRIO, DAVID JONATHAN [en]
dc.date.accessioned: 2016-06-16T18:56:56Z
dc.date.available: 2016-06-16T18:56:56Z
dc.date.issued: 2016
dc.identifier.citation: PORFIRIO, DAVID JONATHAN. (2016). SINGLE-SEQUENCE PROTEIN SECONDARY STRUCTURE PREDICTION BY NEAREST-NEIGHBOR CLASSIFICATION OF PROTEIN WORDS (Bachelor's thesis, University of Arizona, Tucson, USA).
dc.identifier.uri: http://hdl.handle.net/10150/613449
dc.description.abstract: Protein secondary structure prediction is the task of assigning, given a sequence of amino acids as input, a secondary structure class to each position in the sequence. Our approach extracts protein words of a fixed length from protein sequences and then applies nearest-neighbor classification to predict the secondary structure class of the middle position of each word. We present a new formulation for learning a distance function on protein words based on position-dependent substitution scores on amino acids. These substitution scores are learned by solving a large linear program on example words with known secondary structure. We evaluated this approach on a database of 5,519 proteins with a total length of approximately 3,000,000 amino acids. In the current test scheme, using words of length 23, the approach achieved an average accuracy of 65.2%, taken uniformly over word positions. The average accuracy was 63.1% for alpha-classified words, 56.6% for beta-classified words, and 71.6% for coil-classified words. [en]
dc.language.iso: en_US [en]
dc.publisher: The University of Arizona. [en]
dc.rights: Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author. [en]
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/
dc.title: SINGLE-SEQUENCE PROTEIN SECONDARY STRUCTURE PREDICTION BY NEAREST-NEIGHBOR CLASSIFICATION OF PROTEIN WORDS [en_US]
dc.type: text [en]
dc.type: Electronic Thesis [en]
thesis.degree.grantor: University of Arizona [en]
thesis.degree.level: Bachelors [en]
thesis.degree.discipline: Honors College [en]
thesis.degree.discipline: Computer Science [en]
thesis.degree.name: B.S. [en]
refterms.dateFOA: 2018-04-26T04:06:07Z
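The abstract describes the method only at a high level. The following is a minimal Python sketch, under stated assumptions, of the two steps it names: extracting fixed-length words labeled by the class of their middle position, and classifying a word by its nearest neighbor under a distance built from position-dependent substitution scores. It assumes the word distance is a sum over positions of a per-position substitution score. All function names are hypothetical, and the scores produced by `toy_scores` are hand-made placeholders, not the scores the thesis learns by linear programming.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# A labeled word: (amino-acid word, secondary structure class of its middle residue)
Word = Tuple[str, str]

# Hypothetical position-dependent substitution scores: scores[i][(a, b)] is the
# cost of comparing residue a with residue b at word position i.
Scores = List[Dict[Tuple[str, str], float]]


def extract_words(seq: str, ss: str, k: int) -> List[Word]:
    """Slide a window of odd length k over the sequence and label each word
    with the secondary structure class of its middle position."""
    if k % 2 == 0:
        raise ValueError("word length k must be odd so a middle position exists")
    if len(seq) != len(ss):
        raise ValueError("sequence and structure strings must have equal length")
    half = k // 2
    return [(seq[i - half:i + half + 1], ss[i]) for i in range(half, len(seq) - half)]


def toy_scores(k: int, alphabet: str) -> Scores:
    """Illustrative placeholder scores (NOT learned): a mismatch costs more the
    closer it is to the middle position; identical residues cost 0."""
    half = k // 2
    scores: Scores = []
    for i in range(k):
        weight = 1.0 / (1 + abs(i - half))
        scores.append({(a, b): 0.0 if a == b else weight
                       for a in alphabet for b in alphabet})
    return scores


def word_distance(u: str, v: str, scores: Scores) -> float:
    """Distance between two equal-length words: sum of per-position scores."""
    return sum(scores[i][(a, b)] for i, (a, b) in enumerate(zip(u, v)))


def classify(word: str, training: Sequence[Word], scores: Scores) -> str:
    """1-nearest-neighbor: return the class of the closest training word
    (ties go to the earliest training word)."""
    _, label = min(training, key=lambda w: word_distance(word, w[0], scores))
    return label


def per_class_accuracy(test: Sequence[Word], training: Sequence[Word],
                       scores: Scores) -> Dict[str, float]:
    """For each true class, the fraction of test words classified correctly."""
    total: Counter = Counter()
    correct: Counter = Counter()
    for word, true_label in test:
        total[true_label] += 1
        if classify(word, training, scores) == true_label:
            correct[true_label] += 1
    return {c: correct[c] / total[c] for c in total}


if __name__ == "__main__":
    # Toy data over a two-letter alphabet; H = helix, E = strand.
    train = extract_words("AAAAGGGG", "HHHHEEEE", 3)
    scores = toy_scores(3, "AG")
    print(classify("AGA", train, scores))  # E (closest word is "AGG")
    print(per_class_accuracy(extract_words("AAAGGG", "HHHEEE", 3), train, scores))
    # {'H': 1.0, 'E': 1.0}
```

In the toy run, "AGA" is closer to "AGG" than to "AAA" because the placeholder scores penalize a mismatch at the middle position most heavily; learned scores would replace that hand-made weighting. The linear scan in `classify` is only practical for small examples.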


Files in this item

Name: azu_etd_mr_2016_0172_sip1_m.pdf
Size: 559.2Kb
Format: PDF

