Protein Identification Algorithms Developed from Statistical Analysis of MS/MS Fragmentation Patterns
Advisor: Wysocki, Vicki H.
Publisher: The University of Arizona.
Rights: Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Abstract: Tandem mass spectrometry is widely used in proteomic studies because of its ability to identify large numbers of peptides from complex mixtures. In a typical LC-MS/MS experiment, thousands of tandem mass spectra are collected, and peptide identification algorithms are essential for translating them into peptide sequences. Although these spectra contain both m/z and intensity values, most popular protein identification algorithms rely primarily on predicted fragment m/z values to assign peptide sequences to fragmentation spectra. The intensity information is often undervalued because it is harder to predict and to incorporate into algorithms. Nevertheless, using intensity to assist peptide identification is an attractive prospect that can potentially improve the confidence of matches and generate more identifications. In this dissertation, an unsupervised statistical method, K-means clustering, was used to study peptide fragmentation patterns for both CID and ETD data, and many unique fragmentation features were discovered. For instance, strong c(n-1) ions were observed in ETD, indicating that the fragmentation site in ETD is strongly related to the position of the amino acid residue. Based on the fragmentation patterns observed through data mining, a peptide identification algorithm that makes use of these patterns was developed. The program is named SQID, and it is the first algorithm in our bioinformatics project. Testing on multiple public datasets indicated an improvement in the number of identified peptides compared with popular proteomics algorithms such as Sequest and X!Tandem. SQID was further extended to improve cross-linked peptide identification (SQID-XLink) and blind modification identification (SQID-Mod), and both showed significant improvement compared with existing methods. In this dissertation, the SQID algorithm was also successfully applied to a mosquito proteomics project.
We are incorporating new features and new algorithms into our software, such as additional fragmentation methods, more accurate spectrum prediction, and a more user-friendly interface. We hope the SQID project will continue to benefit researchers and help improve data analysis in the proteomics community.
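As an illustration of the K-means step mentioned in the abstract, the following is a minimal sketch of clustering fragment-ion intensity patterns. It is not the dissertation's implementation: the toy intensity vectors, the normalization to the base peak, and the function names `normalize`, `squared_distance`, and `kmeans` are all assumptions made for this example. Each vector is imagined to hold the intensity observed at four cleavage positions of a peptide.

```python
import random


def normalize(intensities):
    """Scale a vector of fragment-ion intensities so its largest value is 1."""
    peak = max(intensities)
    return [x / peak for x in intensities] if peak > 0 else list(intensities)


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=50, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Start from k distinct input vectors chosen at random.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical intensities at four cleavage positions (not real data).
# The first three spectra are dominated by the last position, the other
# three by the first position.
spectra = [
    [5, 10, 8, 100],
    [2, 12, 6, 90],
    [4, 8, 10, 120],
    [100, 20, 10, 5],
    [80, 15, 12, 4],
    [95, 25, 8, 6],
]

labels, centroids = kmeans([normalize(s) for s in spectra], k=2)
print(labels)
```

On this toy input the two intensity patterns fall into separate clusters, and each centroid summarizes the typical relative intensity at each cleavage position. Real fragmentation data would need a choice of feature encoding and of k that this sketch does not address.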
Degree Program: Graduate College