
dc.contributor.author: Mount, David
dc.contributor.author: Putnam, Charles
dc.contributor.author: Centouri, Sara
dc.contributor.author: Manziello, Ann
dc.contributor.author: Pandey, Ritu
dc.contributor.author: Garland, Linda
dc.contributor.author: Martinez, Jesse
dc.date.accessioned: 2016-05-20T08:57:11Z
dc.date.available: 2016-05-20T08:57:11Z
dc.date.issued: 2014 [en]
dc.identifier.citation: Mount et al. BMC Medical Genomics 2014, 7:33 http://www.biomedcentral.com/1755-8794/7/33 [en]
dc.identifier.doi: 10.1186/1755-8794-7-33 [en]
dc.identifier.uri: http://hdl.handle.net/10150/610040
dc.description.abstract: BACKGROUND: Numerous microarray-based prognostic gene expression signatures of primary neoplasms have been published but often with little concurrence between studies, thus limiting their clinical utility. We describe a methodology using logistic regression, which circumvents limitations of conventional Kaplan-Meier analysis. We applied this approach to a thrice-analyzed and published squamous cell carcinoma (SQCC) of the lung data set, with the objective of identifying gene expressions predictive of early death versus long survival in early-stage disease. A similar analysis was applied to a data set of triple negative breast carcinoma cases, which present similar clinical challenges. METHODS: Important to our approach is the selection of homogeneous patient groups for comparison. In the lung study, we selected two groups (including only stages I and II), equal in size, of earliest deaths and longest survivors. Genes varying at least four-fold were tested by logistic regression for accuracy of prediction (area under a ROC plot). The gene list was refined by applying two sliding-window analyses and by validations using a leave-one-out approach and model building with validation subsets. In the breast study, a similar logistic regression analysis was used after selecting appropriate cases for comparison. RESULTS: A total of 8594 variable genes were tested for accuracy in predicting earliest deaths versus longest survivors in SQCC. After applying the two sliding-window and the leave-one-out analyses, 24 prognostic genes were identified; most of them were B-cell related. When the same data set of stage I and II cases was analyzed using a conventional Kaplan-Meier (KM) approach, we identified fewer immune-related genes among the most statistically significant hits; when stage III cases were included, most of the prognostic genes were missed. Interestingly, logistic regression analysis of the breast cancer data set identified many immune-related genes predictive of clinical outcome. CONCLUSIONS: Stratification of cases based on clinical data, careful selection of two groups for comparison, and the application of logistic regression analysis substantially improved predictive accuracy in comparison to conventional KM approaches. B-cell-related genes dominated the list of prognostic genes in early-stage SQCC of the lung and triple negative breast cancer.
dc.language.iso: en [en]
dc.publisher: BioMed Central [en]
dc.relation.url: http://www.biomedcentral.com/1755-8794/7/33 [en]
dc.rights: © 2014 Mount et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0). [en]
dc.rights.uri: https://creativecommons.org/licenses/by/2.0/
dc.title: Using logistic regression to improve the prognostic value of microarray gene expression data sets: application to early-stage squamous cell carcinoma of the lung and triple negative breast carcinoma [en]
dc.type: Article [en]
dc.identifier.eissn: 1755-8794 [en]
dc.contributor.department: Bioinformatics Shared Service, Arizona Health Sciences Center, The University of Arizona, Tucson, Arizona 85735, USA [en]
dc.contributor.department: Department of Surgery, Arizona Health Sciences Center, The University of Arizona, Tucson, Arizona 85735, USA [en]
dc.contributor.department: Arizona Comprehensive Cancer Center, The University of Arizona, Tucson, Arizona 85735, USA [en]
dc.contributor.department: Department of Medicine, Arizona Health Sciences Center, The University of Arizona, Tucson, Arizona 85735, USA [en]
dc.contributor.department: Department of Cellular and Molecular Medicine, Arizona Health Sciences Center, The University of Arizona, Tucson, Arizona 85735, USA [en]
dc.identifier.journal: BMC Medical Genomics [en]
dc.description.collectioninformation: This item is part of the UA Faculty Publications collection. For more information about this item or other items in the UA Campus Repository, contact the University of Arizona Libraries at repository@u.library.arizona.edu. [en]
dc.eprint.version: Final published version [en]
refterms.dateFOA: 2018-09-11T10:42:53Z
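
The per-gene screening step summarized in the abstract (keep genes that vary at least four-fold, then score each one by logistic regression and the area under a ROC plot, with leave-one-out validation) can be illustrated with a minimal Python sketch. This is not the authors' code. Everything named here is an assumption made for illustration: the helper functions, the gradient-ascent fitting, the reading of "four-fold" as a log2 range of at least 2 across samples, and the synthetic toy values, which are not from the study. The two sliding-window analyses are not reproduced because the abstract does not describe them.

```python
import math
import random


def fit_logistic(xs, ys, iters=300, lr=0.1, l2=1e-3):
    """Fit p(y=1 | x) = sigmoid(b0 + b1*x) for one gene by gradient ascent."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(iters):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            z = max(-30.0, min(30.0, b0 + b1 * x))  # clamp to avoid overflow
            p = 1.0 / (1.0 + math.exp(-z))
            g0 += y - p
            g1 += (y - p) * x
        b0 += lr * g0 / n
        b1 += lr * (g1 / n - l2 * b1)  # small ridge penalty keeps b1 finite
    return b0, b1


def auc(scores, labels):
    """Area under the ROC curve as the fraction of correctly ordered pairs."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def loo_auc(xs, ys):
    """Score each sample with a model fitted on all the other samples."""
    scores = []
    for i in range(len(xs)):
        b0, b1 = fit_logistic(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        scores.append(b0 + b1 * xs[i])
    return auc(scores, ys)


def varies_four_fold(log2_values):
    """Assumed filter: max/min expression ratio >= 4, i.e. log2 range >= 2."""
    return max(log2_values) - min(log2_values) >= 2.0


# Synthetic toy data (not from the study): 1 = early death, 0 = long survivor.
# Two equal-size groups, mirroring the group design described in the abstract.
rng = random.Random(0)
labels = [1] * 10 + [0] * 10
genes = {
    "gene_informative": [rng.uniform(1.5, 2.5) for _ in range(10)]
    + [rng.uniform(-2.5, -1.5) for _ in range(10)],
    "gene_flat": [rng.uniform(-0.2, 0.2) for _ in range(20)],
}

# Keep only genes passing the variability filter, then score each one.
results = {
    name: loo_auc(values, labels)
    for name, values in genes.items()
    if varies_four_fold(values)
}
print(results)
```

In this toy setup the flat gene is dropped by the variability filter and only the separated gene is scored. Pooling leave-one-out scores into a single AUC is a simplification chosen for brevity; a real analysis would more likely rely on an established statistics package and on the validation subsets mentioned in the abstract.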


Files in this item

Name: 1755-8794-7-33.pdf
Size: 1.138Mb
Format: PDF


Except where otherwise noted, this item's license is described as © 2014 Mount et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0).