• Generalized Performance Measures for Evaluation of Object Detection

      Rodriguez, Jeffrey J.; Philip, Rohit Chacko; Bilgin, Ali; Tharp, Hal S. (The University of Arizona, 2020)
      Classical detection theory has long used traditional measures such as precision, recall, F measure, and G measure to evaluate the quality of detection results. Such evaluation can be used to compare the performance of competing detection algorithms or to tune algorithm parameters on training data. The analysis can be done at the pixel level or at the object level. Conventional performance measures that quantify detection accuracy against known ground truth are effective when applied at the pixel level, or at the object level with simple detection outcomes. In many cases, however, object-level detection produces hybrid detections, such as a single ground truth object split into multiple detected objects (i.e., split detections), multiple ground truth objects merged into a single detected object (i.e., merged truths), and combinations thereof. In such cases, conventional performance measures are ineffective. A new generalized framework for evaluating object detection algorithms is proposed. This framework introduces two new precision measures and two new recall measures, resulting in four new F and G measures, all of which reduce to their classical detection theory counterparts in cases with simple detection outcomes (no split detections or merged truths). The new concept of shared positives is developed, and the shared positive (SP) curve is proposed as a performance evaluation tool. The performance analysis of the new generalized framework includes evaluation of eight object detection algorithms. Two of the new generalized F measures perform better than the classical F measure during parameter tuning and algorithm performance comparison, while a third offers comparable performance. One of the new generalized F measures can serve as a replacement for the classical F measure to perform object detection evaluation more accurately.
In addition, we develop three high-throughput zebrafish ototoxicity assays: a) an anatomical assay that uses feature extraction techniques and machine learning to automatically quantify damage to zebrafish neuromasts; b) a behavioral assay that uses object detection and orientation computation to automatically measure the rheotaxis index; and c) a novel tracking assay that uses optical flow to track zebrafish in video data and provide a comprehensive analysis of zebrafish swimming behavior. The behavioral assay serves as a test case to demonstrate the effectiveness of the new generalized F measures for tuning zebrafish detection algorithms.
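
As a point of reference for the classical measures named in the abstract, the following is a minimal Python sketch, not the dissertation's method. It computes precision, recall, F measure, and G measure from true positive, false positive, and false negative counts, taking the G measure to be the geometric mean of precision and recall (an assumption; the abstract does not define it). It also shows one way that split detections and merged truths might be flagged from a list of overlapping truth/detection pairs. The function names, the input format, and the overlap criterion are all hypothetical, and the generalized measures themselves are not reproduced because the abstract does not state them.

```python
import math
from collections import defaultdict


def classical_measures(tp, fp, fn):
    """Classical precision, recall, F measure, and G measure from counts.

    Assumes G is the geometric mean of precision and recall.
    Returns 0.0 for any measure whose denominator would be zero.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    g_measure = math.sqrt(precision * recall)
    return precision, recall, f_measure, g_measure


def hybrid_outcomes(overlaps):
    """Flag hybrid outcomes from (truth_id, detection_id) overlap pairs.

    A truth overlapping more than one detection is treated as a split
    detection; a detection overlapping more than one truth is treated as
    a merged truth. How overlap is decided is left to the caller
    (hypothetical input format).
    """
    detections_per_truth = defaultdict(set)
    truths_per_detection = defaultdict(set)
    for truth_id, detection_id in overlaps:
        detections_per_truth[truth_id].add(detection_id)
        truths_per_detection[detection_id].add(truth_id)
    split_truths = {t for t, d in detections_per_truth.items() if len(d) > 1}
    merged_detections = {d for d, t in truths_per_detection.items() if len(t) > 1}
    return split_truths, merged_detections


if __name__ == "__main__":
    print(classical_measures(tp=8, fp=2, fn=2))
    pairs = [("t1", "d1"), ("t1", "d2"), ("t2", "d3"), ("t3", "d3"), ("t4", "d4")]
    print(hybrid_outcomes(pairs))
```

In this illustration, truth t1 is split across two detections and detection d3 merges two truths. Counting true positives is ambiguous in exactly these situations, which is the case the abstract describes as poorly handled by the classical measures.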