
dc.contributor.author: Hao, Ning
dc.contributor.author: Zhang, Hao Helen
dc.date.accessioned: 2018-11-07T19:59:44Z
dc.date.available: 2018-11-07T19:59:44Z
dc.date.issued: 2017
dc.identifier.citation: Hao, Ning; Zhang, Hao Helen. Oracle P-values and variable screening. Electron. J. Statist. 11 (2017), no. 2, 3251--3271. doi:10.1214/17-EJS1284. https://projecteuclid.org/euclid.ejs/1506931546 [en_US]
dc.identifier.issn: 1935-7524
dc.identifier.doi: 10.1214/17-EJS1284
dc.identifier.uri: http://hdl.handle.net/10150/630584
dc.description.abstract: The concept of P-value was proposed by Fisher to measure inconsistency of data with a specified null hypothesis, and it plays a central role in statistical inference. For classical linear regression analysis, it is a standard procedure to calculate P-values for regression coefficients based on least squares estimator (LSE) to determine their significance. However, for high dimensional data when the number of predictors exceeds the sample size, ordinary least squares are no longer proper and there is not a valid definition for P-values based on LSE. It is also challenging to define sensible P-values for other high dimensional regression methods such as penalization and resampling methods. In this paper, we introduce a new concept called oracle P-value to generalize traditional P-values based on LSE to high dimensional sparse regression models. Then we propose several estimation procedures to approximate oracle P-values for real data analysis. We show that the oracle P-value framework is useful for developing new and powerful tools to enhance high dimensional data analysis, including variable ranking, variable selection, and screening procedures with false discovery rate (FDR) control. Numerical examples are then presented to demonstrate performance of the proposed methods. [en_US]
dc.description.sponsorship: National Science Foundations [DBI-1261830, DMS-1309507, DMS-1418172, NSFC-11571009] [en_US]
dc.language.iso: en [en_US]
dc.publisher: INST MATHEMATICAL STATISTICS [en_US]
dc.relation.url: https://projecteuclid.org/euclid.ejs/1506931546 [en_US]
dc.rights: Creative Commons Attribution 4.0 International License. Copyright is held by the author(s) or the publisher. If your intended use exceeds the permitted uses specified by the license, contact the publisher for more information. [en_US]
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.subject: False discovery rate [en_US]
dc.subject: high dimensional data [en_US]
dc.subject: inference [en_US]
dc.subject: P-value [en_US]
dc.subject: variable selection [en_US]
dc.title: Oracle P-values and variable screening [en_US]
dc.type: Article [en_US]
dc.contributor.department: Univ Arizona, Dept Math [en_US]
dc.identifier.journal: ELECTRONIC JOURNAL OF STATISTICS [en_US]
dc.description.note: Open Access Journal. [en_US]
dc.description.collectioninformation: This item from the UA Faculty Publications collection is made available by the University of Arizona with support from the University of Arizona Libraries. If you have questions, please contact us at repository@u.library.arizona.edu. [en_US]
dc.eprint.version: Final published version [en_US]
dc.source.journaltitle: Electronic Journal of Statistics
dc.source.volume: 11
dc.source.issue: 2
dc.source.beginpage: 3251
dc.source.endpage: 3271
refterms.dateFOA: 2018-11-07T19:59:45Z


Files in this item

Name: euclid.ejs.1506931546.pdf
Size: 625.7Kb
Format: PDF
Description: Final Published version


