Interface Between Plant Science and Data Science: Extracting Insights From Phenomics Data
Author
Gonzalez, Emmanuel Miguel
Issue Date
2024
Keywords
computation
data science
distributed computing
high-throughput phenotyping
machine learning
phenomics
Advisor
Pauli, William D.
Publisher
The University of Arizona.
Rights
Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction, presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Abstract
Phenomics is the study of the observable traits, or phenotypes, of organisms in relation to their genetic and environmental factors. Phenomics is essential for understanding how plants respond to stresses such as drought, heat, salinity, and pests, and for identifying the genes and molecular mechanisms involved in stress tolerance. However, phenotyping large numbers of plants in the field is challenging because both the phenotypes and the environmental conditions are complex and variable. There is therefore a need for efficient and accurate methods for collecting, processing, and analyzing high-throughput phenomics data. This dissertation presents three projects that address challenges across phenomics data types: (i) processing pipelines for large-scale, multimodal field phenomics data of lettuce and sorghum, (ii) machine learning models for detecting and quantifying biotic stress in images of field-grown sorghum plants, and (iii) prediction of gene expression in cotton from leaf-level hyperspectral reflectance data and RNA-seq.

The first project develops processing pipelines for extracting phenotypic traits from field phenomics data of lettuce and sorghum. The data were collected using a ground-based platform equipped with multiple sensors, including red-green-blue (RGB) and thermal two-dimensional (2D) cameras, photosystem II (PSII) chlorophyll fluorescence 2D imagers, and three-dimensional (3D) laser scanners. The pipelines involve several stages: plant detection, segmentation, and trait extraction. The extracted traits include height, volume, bounding area, canopy temperature, the maximum potential quantum efficiency of PSII (Fv/Fm), and shape features derived from topological data analysis (TDA). The results demonstrate that the pipelines are robust and accurate across phenotyping platforms, including the UArizona Field Scanalyzer and drones.
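As a minimal sketch of one trait from the list above: Fv/Fm is conventionally computed from a dark-adapted minimal fluorescence image (F0) and a maximal fluorescence image taken after a saturating pulse (Fm), as Fv/Fm = (Fm − F0)/Fm, where Fv = Fm − F0 is the variable fluorescence. The function names, the boolean plant mask (a hypothetical stand-in for the segmentation step), and the choice to summarize each plant by its mean pixel value are assumptions for illustration, not the dissertation's actual pipeline.

```python
import numpy as np


def fvfm_image(f0, fm, mask):
    """Per-pixel Fv/Fm = (Fm - F0) / Fm.

    f0: dark-adapted minimal fluorescence image (F0)
    fm: maximal fluorescence image after a saturating pulse (Fm)
    mask: boolean plant mask (hypothetical stand-in for the segmentation step)
    Pixels outside the mask, or with Fm <= 0, are returned as NaN.
    """
    f0 = np.asarray(f0, dtype=float)
    fm = np.asarray(fm, dtype=float)
    valid = np.asarray(mask, dtype=bool) & (fm > 0)
    ratio = np.full(fm.shape, np.nan)
    ratio[valid] = (fm[valid] - f0[valid]) / fm[valid]  # Fv = Fm - F0
    return ratio


def plant_fvfm(f0, fm, mask):
    """Summarize one plant by the mean Fv/Fm of its valid pixels (assumed summary statistic)."""
    return float(np.nanmean(fvfm_image(f0, fm, mask)))


if __name__ == "__main__":
    f0 = np.array([[200.0, 210.0], [0.0, 50.0]])
    fm = np.array([[1000.0, 1050.0], [0.0, 60.0]])
    mask = np.array([[True, True], [False, False]])  # bottom row is background
    print(round(plant_fvfm(f0, fm, mask), 3))  # → 0.8
```

Masking before dividing keeps background pixels (soil, where Fm is near zero) from producing spurious ratios; a real pipeline might also filter by a fluorescence threshold or use a median instead of a mean.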
These pipelines can be used to investigate phenotypic variation and stress responses in lettuce and sorghum.

The second project applies neural networks (NNs) to detect and quantify charcoal rot of sorghum (CRS), a disease caused by the fungus Macrophomina phaseolina (Tassi) Goid. Two types of neural networks, classification and segmentation networks, were developed, trained, and tested to detect CRS in sorghum plants. Among the models tested, EfficientNet-B3 performed best for image classification, while a fully convolutional network (FCN) excelled at image segmentation. A key achievement of these models is their ability to distinguish abiotic stress, such as drought stress, from biotic stress, in this case CRS. This distinction is important because it allows for more accurate diagnosis and treatment of the disease. The project also investigated how image patch size affects model performance and found that larger patches improved both efficiency and accuracy, suggesting that patch size is a significant factor in successfully applying neural networks to disease detection and quantification.

The third project investigates the feasibility of predicting gene expression in cotton from hyperspectral reflectance and RNA-seq data. Hyperspectral reflectance data capture the spectral signatures of plants across a wide range of wavelengths, which carry information about the plants' biochemical and physiological status. RNA-seq measures the expression levels of thousands of genes simultaneously. The project uses partial least squares regression (PLSR) to predict RNA-seq gene expression from hyperspectral reflectance data.
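To make the PLSR step concrete, here is a minimal sketch under stated assumptions: one single-response model (PLS1) per gene, reflectance spectra as the rows of X (plants × wavelengths), and one gene's expression values as y. It is written from scratch with the NIPALS algorithm so each step is visible; the helper names `fit_pls1` and `predict_pls1` are hypothetical, and the dissertation's preprocessing, number of components, and validation scheme are not reproduced here. In practice, scikit-learn's `PLSRegression` (in `sklearn.cross_decomposition`) is a tested implementation of the same model family.

```python
import numpy as np


def fit_pls1(X, y, n_components):
    """Fit single-response PLS regression (PLS1) with the NIPALS algorithm.

    X: (n_samples, n_wavelengths) reflectance spectra
    y: (n_samples,) expression values for ONE gene (assumed per-gene models)
    Returns (x_mean, y_mean, coef) such that y_hat = (X - x_mean) @ coef + y_mean.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean  # centered residuals, deflated after each component
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)       # weight: direction of maximal covariance with y
        t = Xr @ w                   # latent scores
        tt = t @ t
        p = Xr.T @ t / tt            # X loadings
        q = (yr @ t) / tt            # y loading
        Xr = Xr - np.outer(t, p)     # remove this component from X
        yr = yr - q * t              # and from y
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    coef = W @ np.linalg.solve(P.T @ W, Q)  # B = W (P^T W)^-1 q
    return x_mean, y_mean, coef


def predict_pls1(model, X):
    """Predict expression for new spectra with a model returned by fit_pls1."""
    x_mean, y_mean, coef = model
    return (np.asarray(X, dtype=float) - x_mean) @ coef + y_mean
```

Usage would look like `model = fit_pls1(spectra_train, gene_train, n_components=10)` followed by `predict_pls1(model, spectra_test)`, with the variable names and component count purely illustrative. PLSR suits this setting because adjacent hyperspectral bands are highly collinear: it compresses hundreds of correlated wavelengths into a few latent components chosen for their covariance with expression, and the number of components would normally be tuned by cross-validation.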
The results show that hyperspectral reflectance data can accurately predict the expression of certain genes, particularly those associated with photosynthesis, stress response, secondary metabolite synthesis, and plant-pathogen interactions. The project demonstrates the potential of hyperspectral reflectance data as a proxy for gene expression in cotton.
Type
Electronic Dissertation
text
Degree Name
Ph.D.
Degree Level
doctoral
Degree Program
Graduate College
Plant Science