    A Nearest-Neighbor Nonparametric Multiple Imputation Approach for Incomplete Categorical Data under Missing at Random

    Author
    Zhou, Muhan
    Issue Date
    2019
    Keywords
    Categorical data
    Double Robustness
    Missing data
    Multiple imputation
    Nearest Neighbor
    Advisor
    Hsu, Chiu-Hsieh
    
    Publisher
    The University of Arizona.
    Rights
    Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction, presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
    Embargo
    Dissertation not available (per author’s request)
    Abstract
    Incomplete categorical data is a common problem in medical research. If researchers simply use complete cases for data analysis, the estimation might be biased and/or inefficient due to ignoring the missing values. Under the assumption of missing at random (MAR), i.e., the probability of missingness depends only on the observed data and not on the unobserved data, an increasing number of approaches have been proposed to handle missing data. However, most of the existing missing-data methods for incomplete categorical data either are not robust or are sensitive to extreme missingness probabilities. In my dissertation, I study a nearest-neighbor nonparametric multiple imputation approach (NNMI) that uses two working models to impute values for a missing at random categorical variable, and to estimate the marginal mean as well as the conditional mean under three different study designs.

    In the first paper, I adopt the NNMI for dealing with a categorical outcome with missing values and estimating the proportion of each category. Specifically, multinomial logistic regression or cumulative logistic regression is performed to construct a working model for predicting the incomplete categorical outcome. Logistic regression is performed to fit a working model for predicting the missingness probabilities. The predicted values from the two working models are used as scores for calculating the distance between each missing observation and the non-missing observations. A weighting scheme is used to accommodate contributions from the two working models when generating predictive scores. Each missing value is imputed by randomly selecting one of the non-missing values with the smallest distances from it (its donors). I conduct a simulation study to evaluate the performance of the NNMI method and compare it with several alternative methods. A real-data application is presented using a dataset from the 2013 Behavioral Risk Factor Surveillance System (BRFSS) survey.

    In the second paper, I use the NNMI method to handle a missing covariate in logistic regression. Similarly, two working models are used to predict the incomplete covariate and the missingness probabilities. First, I perform a computation to assess the potential factors related to selecting an optimal number of donors. Second, the performance of the proposed method is compared with several alternative methods. Finally, the NNMI is applied to the 2013 BRFSS survey data to impute an incomplete categorical covariate and estimate the regression coefficients from a logistic regression model.

    In the third paper, the NNMI is extended to handle a missing covariate under a matched case-control study. The estimation is conducted using a conditional logistic regression model. The performance of the NNMI is compared with complete-case analysis and six parametric multiple imputation methods. The objective is to assess whether the NNMI demonstrates a doubly robust property compared with parametric methods. Then the NNMI is applied to impute an incomplete categorical covariate in a nested case-control cohort using the 2013 BRFSS survey data.

    To summarize the three papers, the proposed NNMI is a reasonable approach to dealing with an incomplete categorical outcome with more than two levels for assessing the distribution of the outcome. In terms of the choices for the working models, I suggest a multinomial logistic regression model to predict the missing outcome and a logistic regression model to predict the missingness probability. For imputing an incomplete covariate and estimating logistic regression coefficients, the NNMI demonstrates a doubly robust property and works stably when missingness probabilities are close to 0 or 1. When missing values occur in the covariates under a matched case-control design, the NNMI can be used on multiple incomplete covariates as long as the misspecification is moderate.
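
The donor-selection step described for the first paper can be illustrated with a minimal Python sketch. It is not the dissertation's implementation. It assumes the two working models have already been fitted elsewhere and that each one yields a single standardized score per observation (a multinomial model would normally give several scores per observation). The function names `nnmi_impute` and `pooled_proportions`, the weighted Euclidean form of the distance, and the default weight, donor count, and number of imputations are all hypothetical choices made for illustration.

```python
import random
from math import sqrt


def nnmi_impute(values, outcome_scores, missing_scores,
                w_outcome=0.8, n_donors=3, n_imputations=5, seed=0):
    """Return n_imputations completed copies of `values` (None = missing).

    outcome_scores / missing_scores: one scalar score per observation from
    the outcome and missingness working models (assumed precomputed).
    """
    rng = random.Random(seed)
    w_missing = 1.0 - w_outcome  # assumption: the two weights sum to one
    observed = [i for i, v in enumerate(values) if v is not None]
    missing = [i for i, v in enumerate(values) if v is None]

    completed = []
    for _ in range(n_imputations):
        filled = list(values)
        for i in missing:
            # Weighted distance between observation i and a candidate donor j
            # (assumed form: weighted Euclidean distance on the two scores).
            def distance(j, i=i):
                d_out = outcome_scores[i] - outcome_scores[j]
                d_mis = missing_scores[i] - missing_scores[j]
                return sqrt(w_outcome * d_out ** 2 + w_missing * d_mis ** 2)

            # Donors: the n_donors observed cases closest to observation i.
            donors = sorted(observed, key=distance)[:n_donors]
            # Impute by drawing one donor's observed value at random.
            filled[i] = values[rng.choice(donors)]
        completed.append(filled)
    return completed


def pooled_proportions(completed):
    """Average each category's proportion over the completed datasets."""
    categories = sorted({v for dataset in completed for v in dataset})
    return {
        c: sum(d.count(c) / len(d) for d in completed) / len(completed)
        for c in categories
    }


# Toy data (made up for illustration): two observations are missing.
values = ["a", None, "b", "a", None, "b"]
outcome_scores = [0.10, 0.12, 0.90, 0.15, 0.88, 0.95]
missing_scores = [0.20, 0.25, 0.70, 0.20, 0.75, 0.80]

datasets = nnmi_impute(values, outcome_scores, missing_scores, n_donors=2)
print(pooled_proportions(datasets))
```

Setting `w_outcome` close to 1 makes the donor choice depend mostly on the outcome working model, and setting it close to 0 shifts the emphasis to the missingness working model. The sketch only pools point estimates and leaves out variance estimation across the imputed datasets.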
    Type
    text
    Electronic Dissertation
    Degree Name
    Ph.D.
    Degree Level
    doctoral
    Degree Program
    Graduate College
    Biostatistics
    Degree Grantor
    University of Arizona
    Collections
    Dissertations

