Protein Identification Algorithms Developed from Statistical Analysis of MS/MS Fragmentation Patterns
Publisher
The University of Arizona.
Rights
Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Abstract
Tandem mass spectrometry is widely used in proteomic studies because it can identify large numbers of peptides in complex mixtures. In a typical LC-MS/MS experiment, thousands of tandem mass spectra are collected, and peptide identification algorithms are essential for translating them into peptide sequences. Although these spectra contain both m/z and intensity values, most popular protein identification algorithms rely primarily on predicted fragment m/z values to assign peptide sequences to fragmentation spectra. Intensity information is often undervalued because it is harder to predict and to incorporate into algorithms. Nevertheless, using intensity to assist peptide identification is an attractive prospect that could improve match confidence and yield more identifications. In this dissertation, an unsupervised statistical method, K-means clustering, was used to study peptide fragmentation patterns in both CID and ETD data, revealing many unique fragmentation features. For instance, strong c(n-1) ions were observed in ETD, indicating that ETD fragmentation sites are closely related to amino acid residue location. Based on the fragmentation patterns observed through data mining, a peptide identification algorithm that exploits these patterns was developed. The program, named SQID, is the first algorithm in our bioinformatics project. Tests on multiple public datasets showed an increase in the number of identified peptides compared with popular proteomics algorithms such as Sequest and X!Tandem. SQID was further extended to improve cross-linked peptide identification (SQID-XLink) and blind modification identification (SQID-Mod), and both extensions showed significant improvements over existing methods. The SQID algorithm was also successfully applied to a mosquito proteomics project.
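The abstract does not say how spectra were encoded for clustering. As a rough illustration only, assuming each spectrum is reduced to a fixed-length vector of fragment-ion intensities indexed by cleavage position (a hypothetical representation, not necessarily the one used for SQID), K-means via Lloyd's algorithm groups spectra whose normalized intensity profiles look alike. The names `normalize` and `kmeans` and the example profiles below are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def normalize(intensities: Sequence[float]) -> Vector:
    """Scale a (hypothetical) fragment-intensity profile so its largest peak is 1.0."""
    top = max(intensities)
    if top <= 0:
        return [0.0 for _ in intensities]
    return [x / top for x in intensities]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, max_iter: int = 100) -> Tuple[List[int], List[Vector]]:
    """Lloyd's algorithm, seeded with the first k points so results are deterministic."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [list(p) for p in points[:k]]
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each profile joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: no assignment changed
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    # Made-up profiles: intensity of fragments from cleavage at positions 1..4.
    raw = [[10, 20, 30, 100], [100, 30, 20, 10], [12, 18, 35, 90], [95, 25, 22, 8]]
    labels, _ = kmeans([normalize(s) for s in raw], k=2)
    print(labels)  # [0, 1, 0, 1]
```

Normalizing each profile to its base peak makes the clustering compare the shape of the fragmentation pattern rather than absolute signal strength; a real analysis would also need to choose k and handle peptides of different lengths, which this sketch does not attempt.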
We are incorporating new features and algorithms into our software, such as support for more fragmentation methods, more accurate spectrum prediction, and a more user-friendly interface. We hope the SQID project will continue to benefit researchers and help improve data analysis in the proteomics community.
Type
text
Electronic Dissertation
Degree Name
Ph.D.
Degree Level
doctoral
Degree Program
Graduate College
Chemistry