PhytoOracle: Scalable, modular phenomics data processing pipelines
Author
Gonzalez, E.M.
Zarei, A.
Hendler, N.
Simmons, T.
Zarei, A.
Demieville, J.
Strand, R.
Rozzi, B.
Calleja, S.
Ellingson, H.
Cosi, M.
Davey, S.
Lavelle, D.O.
Truco, M.J.
Swetnam, T.L.
Merchant, N.
Michelmore, R.W.
Lyons, E.
Pauli, D.
Affiliation
School of Plant Sciences, University of Arizona
Department of Computer Science, University of Arizona
Data Science Institute, University of Arizona, Tucson
BIO5 Institute, University of Arizona
Department of Cellular and Molecular Medicine, University of Arizona
School of Natural Resources and the Environment, University of Arizona
Issue Date
2023-03-05
Keywords
data management
distributed computing
high performance computing
image analysis
morphological phenotyping
phenomics
physiological phenotyping
point cloud analysis
Publisher
Frontiers Media S.A.
Citation
Gonzalez EM, Zarei A, Hendler N, Simmons T, Zarei A, Demieville J, Strand R, Rozzi B, Calleja S, Ellingson H, Cosi M, Davey S, Lavelle DO, Truco MJ, Swetnam TL, Merchant N, Michelmore RW, Lyons E and Pauli D (2023) PhytoOracle: Scalable, modular phenomics data processing pipelines. Front. Plant Sci. 14:1112973. doi: 10.3389/fpls.2023.1112973
Journal
Frontiers in Plant Science
Rights
© 2023 Gonzalez, Zarei, Hendler, Simmons, Zarei, Demieville, Strand, Rozzi, Calleja, Ellingson, Cosi, Davey, Lavelle, Truco, Swetnam, Merchant, Michelmore, Lyons and Pauli. This is an open-access article distributed under the terms of the Creative Commons Attribution License.
Collection Information
This item from the UA Faculty Publications collection is made available by the University of Arizona with support from the University of Arizona Libraries. If you have questions, please contact us at repository@u.library.arizona.edu.
Abstract
As phenomics data volume and dimensionality increase due to advancements in sensor technology, there is an urgent need to develop and implement scalable data processing pipelines. Current phenomics data processing pipelines lack modularity, extensibility, and processing distribution across sensor modalities and phenotyping platforms. To address these challenges, we developed PhytoOracle (PO), a suite of modular, scalable pipelines for processing large volumes of field phenomics RGB, thermal, PSII chlorophyll fluorescence 2D images, and 3D point clouds. PhytoOracle aims to (i) improve data processing efficiency; (ii) provide an extensible, reproducible computing framework; and (iii) enable data fusion of multi-modal phenomics data. PhytoOracle integrates open-source distributed computing frameworks for parallel processing on high-performance computing, cloud, and local computing environments. Each pipeline component is available as a standalone container, providing transferability, extensibility, and reproducibility. The PO pipeline extracts and associates individual plant traits across sensor modalities and collection time points, representing a unique multi-system approach to addressing the genotype-phenotype gap. To date, PO supports lettuce and sorghum phenotypic trait extraction, with a goal of widening the range of supported species in the future. At the maximum number of cores tested in this study (1,024 cores), PO processing times were: 235 minutes for 9,270 RGB images (140.7 GB), 235 minutes for 9,270 thermal images (5.4 GB), and 13 minutes for 39,678 PSII images (86.2 GB). These processing times represent end-to-end processing, from raw data to fully processed numerical phenotypic trait data. Repeatability values of 0.39-0.95 (bounding area), 0.81-0.95 (axis-aligned bounding volume), 0.79-0.94 (oriented bounding volume), 0.83-0.95 (plant height), and 0.81-0.95 (number of points) were observed in Field Scanalyzer data. 
We also show the ability of PO to process drone data with a repeatability of 0.55-0.95 (bounding area).
Note
Open access journal
ISSN
1664-462X
Version
Final Published Version
DOI
10.3389/fpls.2023.1112973