An explainable and efficient deep learning framework for video anomaly detection
Name:
Explain_VAD_v53_cam.pdf
Size:
1.293 MB
Format:
PDF
Description:
Final Accepted Manuscript
Affiliation
NSF Center for Cloud and Autonomic Computing, The University of Arizona
Issue Date
2021-11-23
Keywords
Abnormal event detection
Anomaly video analysis
Context mining
Deep features
Interpretability
Security
Video surveillance
Publisher
Springer Science and Business Media LLC
Citation
Wu, C., Shao, S., Tunc, C., Satam, P., & Hariri, S. (2021). An explainable and efficient deep learning framework for video anomaly detection. Cluster Computing.
Journal
Cluster Computing
Rights
© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2021.
Collection Information
This item from the UA Faculty Publications collection is made available by the University of Arizona with support from the University of Arizona Libraries. If you have questions, please contact us at repository@u.library.arizona.edu.
Abstract
Deep learning-based video anomaly detection methods have drawn significant attention in the past few years due to their superior performance. However, almost all the leading methods rely on large-scale training datasets with long training times, which makes them impractical for fast deployment in many real-world video analysis tasks. Moreover, the leading methods offer no interpretability: when an anomaly detection model is treated as a black box, its uninterpretable feature representations hide the decision-making process. Interpretability is crucial for anomaly detection, since the appropriate response to an anomaly in a video depends on its severity and nature. To tackle these problems, this paper proposes an efficient and explainable deep learning framework for video anomaly detection. The proposed framework uses pre-trained deep models to extract high-level concept and context features for training a denoising autoencoder (DAE), which requires little training time (within 10 s on the UCSD Pedestrian datasets) while achieving detection performance comparable to the leading methods. Furthermore, the framework presents the first use of combining an autoencoder with SHapley Additive exPlanations (SHAP) for model interpretability in video anomaly detection, and it can explain each anomaly detection result in surveillance videos. In the experiments, we evaluate the proposed framework's effectiveness and efficiency while also explaining the anomalies behind the autoencoder's predictions. On the UCSD Pedestrian datasets, the DAE achieved 85.9% AUC with a training time of 5 s on Ped1 and 92.4% AUC with a training time of 2.9 s on Ped2.
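The pipeline described in the abstract (pre-trained features, a denoising autoencoder scored by reconstruction error, and SHAP explanations of that score) can be illustrated with a minimal sketch. This is not the authors' implementation: the random vectors below stand in for features that would come from a pre-trained deep model, the DAE architecture and hyperparameters are illustrative, and SHAP's KernelExplainer is one possible model-agnostic way to attribute the anomaly score to individual features.

```python
import numpy as np
import torch
import torch.nn as nn
import shap

# Stand-in data: in the paper these would be high-level concept/context
# features extracted by pre-trained deep models, not random vectors.
rng = np.random.default_rng(0)
train_feats = rng.normal(size=(500, 128)).astype(np.float32)
test_feats = rng.normal(size=(10, 128)).astype(np.float32)

class DAE(nn.Module):
    """Small denoising autoencoder over fixed-length feature vectors
    (illustrative sizes, not the paper's exact architecture)."""
    def __init__(self, dim=128, hidden=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU())
        self.decoder = nn.Linear(hidden, dim)

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = DAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Denoising training: corrupt the input with Gaussian noise and
# reconstruct the clean input. Training a model this small is fast,
# consistent with the second-scale training times reported above.
x = torch.from_numpy(train_feats)
for _ in range(100):
    noisy = x + 0.1 * torch.randn_like(x)
    opt.zero_grad()
    loss = loss_fn(model(noisy), x)
    loss.backward()
    opt.step()

def anomaly_score(feats: np.ndarray) -> np.ndarray:
    """Per-sample reconstruction error; a high error suggests the sample
    deviates from the normal training distribution, i.e., an anomaly."""
    with torch.no_grad():
        t = torch.from_numpy(feats.astype(np.float32))
        return ((model(t) - t) ** 2).mean(dim=1).numpy()

# SHAP on top of the autoencoder's score: KernelExplainer treats
# anomaly_score as a black box and attributes the score of a detection
# to individual input features.
background = train_feats[:50]
explainer = shap.KernelExplainer(anomaly_score, background)
shap_values = explainer.shap_values(test_feats[:1], nsamples=200)
print(shap_values.shape)  # per-feature contributions to the anomaly score
```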
Note
12 month embargo; published: 23 November 2021
ISSN
1386-7857
EISSN
1573-7543
Version
Final accepted manuscript
Sponsors
Air Force Office of Scientific Research
DOI
10.1007/s10586-021-03439-5