Combined / Layered Normalization Effectively Removes Systematic Errors in Small Untargeted Lipidomics Studies
Author
Wang, Qiuming
Issue Date
2023
Keywords
high-dimensional data
Kullback–Leibler divergence
LC/MS
normalization
systematic errors removal
untargeted lipidomics
Advisor
Hallmark, Brian
Snider, Justin
Publisher
The University of Arizona.
Rights
Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction, presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Abstract
Untargeted lipidomics is a powerful approach for studying the lipidomes of biological samples and determining how lipid profiles change under different conditions. Liquid chromatography-mass spectrometry (LC-MS) is an essential analytical tool whose advances over the past ten years have driven the development of untargeted lipidomics. However, systematic errors, such as batch effects, temporal drift, and lipid concentration variation, pose a significant challenge in LC-MS-based lipidomics analysis, and data normalization to account for these errors is an essential processing step before downstream statistical analysis and visualization. While a number of normalization methods have been presented in the literature, recent developments have focused on computational approaches for large clinical and epidemiological studies, and most of these methods have not been evaluated or compared in studies with small sample sizes. To examine this, we designed a small experiment (n = 50 samples) with biological and technical duplicates (pooled quality control samples) to mimic the systematic errors that typically occur in untargeted lipidomics studies. Internal standards, principal component analysis (PCA), and Kullback-Leibler (KL) divergence-based scores were used to evaluate normalization performance. We found that Total Intensity Normalization (TI) and Probabilistic Quotient Normalization (PQN) successfully removed drift and concentration variation, while Bridge Sample Normalization (BRDG) and Median Run Normalization (MED) eliminated the batch effect. We then developed combined and layered normalization strategies based on the type of systematic error that TI, PQN, BRDG, and MED each remove. All combined and layered normalizations effectively improved overall error removal; in particular, MED-based combined/layered normalizations achieved the best performance with the smallest change in unique biological information among all investigated methods. PQN&MED is a robust normalization strategy for small-sample-size lipidomics data affected by batch effects, drift, and lipid concentration variation. In conclusion, it is necessary to detect the specific systematic errors present in the data before deciding which normalization methods to apply. Although reducing data variance due to systematic errors is critical, it is also important to quantify how much biological information is altered by normalization. We therefore recommend running multiple biological and technical duplicates in small studies with multiple batches to evaluate the selected normalization methods; a normalization approach validated by this procedure can significantly reduce errors and enhance reproducibility in limited-size untargeted lipidomics studies.
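The thesis evaluates layered strategies such as PQN followed by MED and scores their performance with KL divergence. As a rough, non-authoritative sketch of how such a pipeline could look, the Python snippet below implements a generic PQN step, a per-batch median scaling step, and a symmetrized KL-divergence similarity score. The function names, the exact scaling formulas, and the simulated data are illustrative assumptions, not the implementation used in the thesis.

```python
import numpy as np

def pqn_normalize(X, reference=None):
    """Probabilistic Quotient Normalization (illustrative).

    X: samples x lipid-features intensity matrix. Each sample is divided by the
    median ratio (quotient) of its features to a reference profile (here, the
    per-feature median across samples).
    """
    if reference is None:
        reference = np.median(X, axis=0)      # reference profile across samples
    quotients = X / reference                 # feature-wise ratios per sample
    factors = np.median(quotients, axis=1)    # one dilution factor per sample
    return X / factors[:, None]

def median_run_normalize(X, batch_labels):
    """Per-batch median scaling (illustrative stand-in for MED): rescale each
    batch so its per-feature medians match the overall per-feature medians."""
    X_out = X.astype(float).copy()
    overall_median = np.median(X, axis=0)
    for b in np.unique(batch_labels):
        idx = batch_labels == b
        batch_median = np.median(X[idx], axis=0)
        X_out[idx] = X[idx] * (overall_median / batch_median)
    return X_out

def kl_divergence_score(P, Q, eps=1e-12):
    """Symmetrized KL divergence between two intensity profiles, each
    renormalized to sum to 1 (lower = more similar)."""
    p = P / P.sum() + eps
    q = Q / Q.sum() + eps
    return 0.5 * (np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))

# Layered "PQN & MED" strategy on simulated data: remove concentration
# variation first (PQN), then batch effects (MED).
rng = np.random.default_rng(0)
X = rng.lognormal(mean=5, sigma=1, size=(50, 200))   # 50 samples x 200 lipid features
batches = np.repeat([0, 1], 25)                       # two hypothetical batches
X_norm = median_run_normalize(pqn_normalize(X), batches)

# Example check: KL divergence between technical duplicates should shrink.
print(kl_divergence_score(X[0], X[1]), kl_divergence_score(X_norm[0], X_norm[1]))
```

In this layered ordering, sample-level concentration variation is removed first (PQN) and batch-level shifts second (MED), mirroring the abstract's recommendation to identify which systematic errors are present before choosing the normalization to apply.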
Type
Electronic Thesis
text
Degree Name
M.S.
Degree Level
masters
Degree Program
Graduate College
Nutritional Sciences