Precision Weed and Crop Classification in the Early Growing Season using Color, Texture, and Shape
Publisher
The University of Arizona.
Rights
Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction, presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Abstract
Operational precision weed treatment in crops requires accurate classification of plants as either desired or undesired. This dissertation presents findings on how to accurately and efficiently distinguish weeds from crops using shape, color, and texture, all of which can be extracted from RGB images. Nine binary classification approaches were examined: Decision Tree, Extra Trees, Gradient Boosting, K Nearest Neighbors (KNN), Linear Discriminant Analysis (LDA), Logistic Regression, Multi-Layer Perceptron (MLP), Random Forest, and Support Vector Machine (SVM). Additionally, use of a unary classification approach was considered. Two sets of crop images were used in this work: broccoli and cantaloupe. An Area Under the Curve (AUC), a measure of accuracy, of 0.92 was obtained with a Random Forest classifier using a combination of shape (roundness, convexity), color (saturation mean), and texture (HOG mean) attributes. While the concept of the shape of an object is familiar to many, texture attributes are not, especially when expressing a three-dimensional concept from a two-dimensional image. The texture attributes are computed for the objects in the image in non-RGB color spaces. Specifically, they are drawn from channels in the Hue, Saturation, and Intensity (HSI), YCbCr, and CIELab (a color space defined by the International Commission on Illumination, CIE) color spaces, as well as the more typical grayscale conversion. Using only texture attributes resulted in an AUC of 0.90 (Random Forest), indicating that texture alone can be used in prediction with reasonably good performance. Using only color produced a model with an AUC of 0.87 (Random Forest), and using only shape resulted in an AUC of 0.86 (Gradient Boosting). These values were derived from images of the vegetation acquired over several weeks early in the development cycle.
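To make the AUC figures above concrete, the following is a minimal sketch of how such a score can be computed, assuming AUC here means the area under the ROC curve, as is conventional. Under that reading, the AUC equals the fraction of (positive, negative) pairs that the classifier ranks in the correct order. The function name `roc_auc`, the choice of weed as the positive class, and the six example scores are illustrative assumptions and are not taken from the dissertation.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the pairwise-ranking formulation.

    labels: iterable of 0/1 (1 = positive class; assumed here to be "weed")
    scores: classifier scores, where higher means "more likely positive"
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")

    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores for six plants (1 = weed, 0 = crop); not data from the study.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
auc = roc_auc(labels, scores)  # 8 of the 9 pairs are ordered correctly
```

A score of 1.0 would mean every weed outranks every crop plant, while 0.5 corresponds to random ordering, which gives a scale for reading the values between 0.86 and 0.92 reported above.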
Higher AUC values are obtained when using images from a single day: an AUC of 0.97 was obtained using Extra Trees, MLP, and Logistic Regression classification on images acquired four weeks into the broccoli development cycle. Some of the increased performance can be explained by the factors that change most across the development cycle, as a single day is a snapshot of features such as color or shape that may show a trend over the season. A Dickey-Fuller test (α = 0.05), a statistical test that determines whether a time series is stationary by testing for the presence of a unit root, shows that the hue of vegetation is stable over the development cycle, but some shape attributes, such as the radial variance, are not. Additionally, two other challenges are evaluated and addressed: image segmentation and class imbalance. While these are not central to this study, they are nevertheless key to the work, as they affect performance and accuracy. For image segmentation, a set of 11 commonly used indices (Excessive Red, Excessive Green, Excessive Green − Excessive Red, Color Index of Vegetation, Mixed Excess Green, Vegetative, Normalized Green-Red Difference, Triangular Greenness, Normalized Difference, Combined Indices 1, and Combined Indices 2) and five newly proposed indices defined in this document (based on the HSI, HSV, YCbCr, YIQ, and CIELab color spaces) were evaluated. The HSI-based index developed for this study showed the lowest error count. Class imbalance correction was also evaluated, using the oversampling algorithms Synthetic Minority Oversampling Technique (SMOTE), Borderline, K-Means, Adaptive Synthetic (ADASYN), and SVM, as well as the combined oversampling/undersampling algorithms SMOTE+Edited Nearest Neighbor (ENN) and SMOTE+TOMEK. For the oversampling approaches, Decision Tree benefited the most and Random Forest the least.
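The core idea behind the SMOTE family of oversamplers mentioned above can be sketched as follows: each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its k nearest minority-class neighbors. This is a simplified illustration of that interpolation step only, not the implementation used in the dissertation. The function name `smote_like_oversample`, the two-column feature layout, and the example values are hypothetical.

```python
import numpy as np


def smote_like_oversample(X_min, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by SMOTE-style interpolation."""
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)
    rng = np.random.default_rng(seed)

    # Pairwise Euclidean distances between minority samples.
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbor

    # Indices of the k nearest minority neighbors of each sample.
    neighbors = np.argsort(dist, axis=1)[:, :k]

    # Pick a base sample and one of its neighbors for each synthetic point.
    base = rng.integers(0, n, size=n_new)
    picked = neighbors[base, rng.integers(0, k, size=n_new)]

    # Interpolate: base + gap * (neighbor - base), with gap drawn from [0, 1).
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[picked] - X_min[base])


# Hypothetical minority-class feature vectors (e.g. roundness, convexity).
X_min = np.array([[0.20, 0.80], [0.30, 0.70], [0.25, 0.90], [0.40, 0.85]])
synth = smote_like_oversample(X_min, n_new=6, k=2)
```

Because every synthetic point is a convex combination of two real minority samples, the new points stay within the region the minority class already occupies. The Borderline, K-Means, ADASYN, and SVM variants differ mainly in which base samples they choose to interpolate from.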
For the combined approach, Decision Tree once again benefited the most; however, this technique led to much lower AUC scores for Logistic Regression (SMOTE+ENN) and Random Forest (SMOTE+TOMEK). The results showed that calculating texture features in non-RGB color spaces, such as Hue, Saturation, and Value (HSV), YIQ, and CIELab, yields superior classification accuracy compared to traditional grayscale-based approaches. By leveraging color-separated texture analysis, particularly using the Grey-Level Co-occurrence Matrix (GLCM) and related descriptors, the study highlights improved differentiation between crops and weeds in early developmental stages. Results indicate that texture patterns extracted from hue and chromaticity channels provide richer discriminative information than grayscale intensity alone, leading to higher precision and recall for most classification models. This was not always the case, as some classification approaches experienced declines in precision (MLP showed a Δ of -11.39%) and in recall (LDA showed a Δ of -9.21%). Generally, our findings suggest that integrating texture analysis from alternative color spaces enhances weed/crop classification performance, offering a more robust solution for early-season weed treatment automation. This work explores a weed/crop classification framework tailored for precision agriculture applications. The ultimate goal is to develop a system that can be optimized and seamlessly integrated into a ground-based treatment process, enabling more efficient weed management. Drawing on extensive experimental results, this dissertation presents findings that support the feasibility of achieving that integrated approach, but the real-time throughput needed to achieve a given result must also be considered.
Type
text
Electronic Dissertation
Degree Name
Ph.D.
Degree Level
doctoral
Degree Program
Graduate College
Biosystems Analytics & Technology.
