Imputation methods for addressing missing data in short-term monitoring of air pollutants
Name:
imputation_methods_for_address ...
Size:
363.9Kb
Format:
PDF
Description:
Final Accepted Manuscript
Affiliation
Univ Arizona, Mel & Enid Zuckerman Coll Publ Hlth
Univ Arizona, Interdisciplinary Program Appl Math
Issue Date
2020-08-15
Publisher
ELSEVIER
Citation
Hadeed, S. J., O'Rourke, M. K., Burgess, J. L., Harris, R. B., & Canales, R. A. (2020). Imputation methods for addressing missing data in short-term monitoring of air pollutants. Science of The Total Environment, 139140. https://doi.org/10.1016/j.scitotenv.2020.139140
Journal
SCIENCE OF THE TOTAL ENVIRONMENT
Rights
Copyright © 2020 Elsevier B.V. All rights reserved.
Collection Information
This item from the UA Faculty Publications collection is made available by the University of Arizona with support from the University of Arizona Libraries. If you have questions, please contact us at repository@u.library.arizona.edu.
Abstract
Monitoring of environmental contaminants is a critical part of exposure sciences research and public health practice. Missing data are often encountered when performing short-term monitoring (<24 h) of air pollutants with real-time monitors, especially in resource-limited areas. Approaches for handling consecutive periods of missing and incomplete data in this context remain unclear. Our aim is to evaluate existing imputation methods for handling missing data for real-time monitors operating for short durations. In a current field study, real-time PM2.5 monitors were placed outside of 20 households and ran for 24 hours. Missing data were simulated in these households as consecutive periods at four levels of missingness (20%, 40%, 60%, 80%). Univariate (Mean, Median, Last Observation Carried Forward, Kalman Filter, Random, Markov) and multivariate time-series (Predictive Mean Matching, Row Mean Method) methods were used to impute missing concentrations, and performance was evaluated using five error metrics (Absolute Bias, Percent Absolute Error in Means, R² Coefficient of Determination, Root Mean Square Error, Mean Absolute Error). The univariate Markov, random, and mean imputation methods performed best, yielding 24-hour mean concentrations with the lowest error and highest R² values across all levels of missingness. When evaluating error metrics minute-by-minute, Kalman filters, median, and Markov methods performed well at low levels of missingness (20-40%). However, at higher levels of missingness (60-80%), Markov, random, median, and mean imputation performed best on average. Multivariate methods were the worst-performing imputation methods across all levels of missingness. Imputation using univariate methods may provide a reasonable solution to addressing missing data for short-term monitoring of air pollutants, especially in resource-limited areas. Further efforts are needed to evaluate imputation methods that are generalizable across a diverse range of study environments.
Note
24 month embargo; published online: 3 May 2020
ISSN
0048-9697
EISSN
1879-1026
PubMed ID
32402974
Version
Final accepted manuscript
DOI
10.1016/j.scitotenv.2020.139140
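The evaluation design summarized in the abstract (mask a consecutive block of a 24-hour series, impute it, then score the result) can be illustrated with a minimal Python sketch. This is not the authors' code: the synthetic minute-level series, the function names, the block start position, and the choice of three of the simpler univariate methods (mean, median, last observation carried forward) are all assumptions made for illustration, and the numbers it prints say nothing about the study's results.

```python
import math
import random
import statistics


def mask_consecutive(series, fraction, start):
    """Copy `series` and blank one consecutive block covering `fraction` of it."""
    n_missing = int(round(len(series) * fraction))
    if start + n_missing > len(series):
        raise ValueError("missing block runs past the end of the series")
    masked = list(series)
    masked[start:start + n_missing] = [None] * n_missing
    return masked


def impute_mean(masked):
    """Fill every gap with the mean of the observed values."""
    fill = statistics.mean(x for x in masked if x is not None)
    return [fill if x is None else x for x in masked]


def impute_median(masked):
    """Fill every gap with the median of the observed values."""
    fill = statistics.median(x for x in masked if x is not None)
    return [fill if x is None else x for x in masked]


def impute_locf(masked):
    """Last observation carried forward; a gap at the very start stays None."""
    filled, last = [], None
    for x in masked:
        if x is not None:
            last = x
        filled.append(last)
    return filled


def gap_errors(truth, imputed, masked):
    """Return (RMSE, MAE) computed only over the positions that were masked."""
    diffs = [p - t for t, p, m in zip(truth, imputed, masked) if m is None]
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    mae = sum(abs(d) for d in diffs) / len(diffs)
    return rmse, mae


def abs_bias_of_mean(truth, imputed):
    """Absolute difference between the imputed and true whole-series means."""
    return abs(statistics.mean(imputed) - statistics.mean(truth))


# Hypothetical stand-in for one household: 1440 one-minute readings with a
# smooth daily cycle plus noise. Real monitor data would replace this.
rng = random.Random(0)
truth = [max(0.0, 12 + 6 * math.sin(2 * math.pi * t / 1440) + rng.gauss(0, 1.5))
         for t in range(1440)]

methods = {"mean": impute_mean, "median": impute_median, "locf": impute_locf}
for fraction in (0.2, 0.4, 0.6, 0.8):
    # start=120 is arbitrary; it keeps observed data before the gap for LOCF.
    masked = mask_consecutive(truth, fraction, start=120)
    for name, impute in methods.items():
        imputed = impute(masked)
        rmse, mae = gap_errors(truth, imputed, masked)
        bias = abs_bias_of_mean(truth, imputed)
        print(f"{fraction:.0%} {name:<6} RMSE={rmse:.2f} MAE={mae:.2f} bias={bias:.2f}")
```

Scoring the gap positions separately from the whole-series mean mirrors the abstract's distinction between minute-by-minute error and error in the 24-hour mean concentration.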
Related articles
- Selection of statistical technique for imputation of single site-univariate and multisite-multivariate methods for particulate pollutants time series data with long gaps and high missing percentage.
- Authors: K P, Shakya KS, Kumar P
- Issue date: 2023 Jun
- A novel scaling methodology to reduce the biases associated with missing data from commercial activity monitors.
- Authors: O'Driscoll R, Turicchi J, Duarte C, Michalowska J, Larsen SC, Palmeira AL, Heitmann BL, Horgan GW, Stubbs RJ
- Issue date: 2020
- Spatial imputation for air pollutants data sets via low rank matrix completion algorithm.
- Authors: Liu X, Wang X, Zou L, Xia J, Pang W
- Issue date: 2020 Jun
- Dealing with missing delirium assessments in prospective clinical studies of the critically ill: a simulation study and reanalysis of two delirium studies.
- Authors: Raman R, Chen W, Harhay MO, Thompson JL, Ely EW, Pandharipande PP, Patel MB
- Issue date: 2021 May 6
- The performance of prognostic models depended on the choice of missing value imputation algorithm: a simulation study.
- Authors: Deforth M, Heinze G, Held U
- Issue date: 2024 Dec