
dc.contributor.advisor: Barnard, Jacobus (en_US)
dc.contributor.author: Gabbur, Prasad
dc.creator: Gabbur, Prasad (en_US)
dc.date.accessioned: 2011-12-06T14:09:14Z
dc.date.available: 2011-12-06T14:09:14Z
dc.date.issued: 2010 (en_US)
dc.identifier.uri: http://hdl.handle.net/10150/195829
dc.description.abstract: Microarrays emerged in the 1990s as a consequence of the efforts to speed up the process of drug discovery. They revolutionized molecular biological research by enabling monitoring of thousands of genes together. Typical microarray experiments measure the expression levels of a large number of genes on very few tissue samples. The resulting sparsity of data presents major challenges to statistical methods used to perform any kind of analysis on this data. This research posits that phenotypic classification and prediction serve as good objective functions for both optimization and evaluation of microarray data analysis methods. This is because classification measures what is needed for diagnostics and provides quantitative performance measures such as leave-one-out (LOO) or held-out prediction accuracy and confidence. Under the classification framework, various microarray data normalization procedures are evaluated using a class label hypothesis testing framework and also employing Support Vector Machines (SVM) and linear discriminant based classifiers. A novel normalization technique based on minimizing the squared correlation coefficients between expression levels of gene pairs is proposed and evaluated along with the other methods. Our results suggest that most normalization methods helped classification on the datasets considered except the rank method, most likely due to its quantization effects.

Another contribution of this research is in developing machine learning methods for incorporating an independent source of information, in the form of gene annotations, to analyze microarray data. Recently, genes of many organisms have been annotated with terms from a limited vocabulary called Gene Ontologies (GO), describing the genes' roles in various biological processes, molecular functions and their locations within the cell. Novel probabilistic generative models are proposed for clustering genes using both their expression levels and GO tags. These models are similar in essence to the ones used for multimodal data, such as images and words, with learning and inference done in a Bayesian framework. The multimodal generative models are used for phenotypic class prediction. More specifically, the problems of phenotype prediction for static gene expression data and state prediction for time-course data are emphasized. Using GO tags for organisms whose genes have been studied more comprehensively leads to an improvement in prediction. Our methods also have the potential to provide a way to assess the quality of available GO tags for the genes of various model organisms.
dc.language.iso: EN (en_US)
dc.publisher: The University of Arizona. (en_US)
dc.rights: Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author. (en_US)
dc.subject: Bayesian Inference (en_US)
dc.subject: Gene Expression (en_US)
dc.subject: Gene Ontology (en_US)
dc.subject: Machine Learning (en_US)
dc.subject: Microarray (en_US)
dc.subject: Probabilistic Modeling (en_US)
dc.title: Machine Learning Methods for Microarray Data Analysis (en_US)
dc.type: text (en_US)
dc.type: Electronic Dissertation (en_US)
dc.contributor.chair: Barnard, Jacobus (en_US)
dc.identifier.oclc: 659754955 (en_US)
thesis.degree.grantor: University of Arizona (en_US)
thesis.degree.level: doctoral (en_US)
dc.contributor.committeemember: Barnard, Jacobus (en_US)
dc.contributor.committeemember: Rodriguez, Jeffrey (en_US)
dc.contributor.committeemember: Hua, Hong (en_US)
dc.identifier.proquest: 11008 (en_US)
thesis.degree.discipline: Electrical & Computer Engineering (en_US)
thesis.degree.discipline: Graduate College (en_US)
thesis.degree.name: Ph.D. (en_US)
refterms.dateFOA: 2018-08-25T11:35:06Z


Files in this item

Name: azu_etd_11008_sip1_m.pdf
Size: 2.034Mb
Format: PDF
Description: azu_etd_11008_sip1_m.pdf
