Algorithms for Scalability and Security in Adversarial Environments
Keywords: Adversarial Machine Learning
Automatic Speech Recognition
Deep Neural Networks
Publisher: The University of Arizona.
Rights: Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction, or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Abstract: Scalability and security are central to modern machine learning and data science pipelines. The scalability issue arises because many real-world applications must process datasets of unprecedented scale; this ever-increasing data volume renders many classical machine learning algorithms impractical in the face of big data. Meanwhile, the security issue stems from the common assumption that no adversary exists who seeks to subvert the classifier’s objective; machine learning pipelines therefore exhibit vulnerabilities in adversarial environments. In this thesis, we investigate the scalability and security of machine learning with a focus on Feature Selection (FS) and Automatic Speech Recognition (ASR). FS is a critical preprocessing stage that helps avoid the “curse of dimensionality” and overfitting by identifying a feature subset that is both relevant and non-redundant. Over the past few decades, FS research has been driven by the exploration of “big data” and the extensive development of high-performance computing. Nevertheless, the implementation of scalable FS remains an under-explored topic. Moreover, although the research community has made extensive efforts to secure classifiers and develop countermeasures against adversaries, only a few contributions have investigated the behavior of FS in an adversarial environment. Given that machine learning pipelines increasingly rely on FS to combat the “curse of dimensionality” and overfitting, insecure FS can be the “Achilles’ heel” of data pipelines. In this thesis, we address the scalability and security of information-theoretic filter FS algorithms.
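For intuition, information-theoretic filter FS of the kind discussed above typically scores each candidate feature by its mutual information with the label, penalized by its redundancy with already-selected features, and adds the best-scoring feature one at a time. The following is a minimal sketch of such a greedy forward selector over discrete features (an mRMR-style criterion); it is not the specific algorithm developed in this thesis, and all function names are illustrative.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """I(X;Y) in bits for two discrete sequences of equal length."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        pj = c / n
        mi += pj * math.log2(pj * n * n / (px[x] * py[y]))
    return mi

def greedy_forward_fs(features, target, k):
    """Greedy forward selection: at each step pick the feature that
    maximizes relevance I(X; target) minus its mean redundancy
    (mutual information) with the already-selected features."""
    selected = []
    remaining = list(range(len(features)))
    while remaining and len(selected) < k:
        def score(i):
            rel = mutual_information(features[i], target)
            red = (sum(mutual_information(features[i], features[j])
                       for j in selected) / len(selected)) if selected else 0.0
            return rel - red
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

Because every candidate is rescored against the growing selected set, the greedy loop is inherently sequential, which is exactly the bottleneck a semi-parallel optimization paradigm would target.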
In our contributions, we first revisit greedy forward optimization in information-theoretic filter FS and propose a semi-parallel optimization paradigm that yields a feature subset equivalent to that of greedy forward optimization in a fraction of the time. We next expose weaknesses of information-theoretic filter FS algorithms by designing a generic FS poisoning algorithm, and we demonstrate the transferability of the proposed poisoning algorithm across seven information-theoretic FS algorithms. The remainder of this thesis examines security issues in Deep Neural Network (DNN) based ASR applications. DNNs have recently achieved remarkable success in numerous real-world applications; however, recent contributions have shown that they can be easily fooled by adversarial inputs that appear legitimate. Over the past decade, the majority of adversarial machine learning research has focused on the image domain, while far fewer works address audio; both novel audio attack algorithms and adversarial audio detection methods thus remain under-explored. We therefore revisit the structure of the LSTMs used in ASR and propose a new audio attack algorithm that evades state-of-the-art temporal dependency-based detection by explicitly controlling the temporal dependency of the generated adversarial audio. Finally, we leverage DNN quantization techniques and propose a novel adversarial audio detection method that incorporates the DNN’s activation quantization error.
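As a toy illustration of the final idea, a quantization-error-based detector might uniformly quantize a layer's activations and flag inputs whose quantization error exceeds a threshold calibrated on benign data. The sketch below shows only how such a signal could be computed under simplifying assumptions (uniform quantization of scalar activations in a fixed range); it is not the detection method developed in this thesis, and all names, ranges, and parameters are hypothetical.

```python
def quantize(x, num_bits=4, x_min=-1.0, x_max=1.0):
    """Uniformly quantize a scalar activation to 2**num_bits levels
    over [x_min, x_max], clipping out-of-range values."""
    levels = (1 << num_bits) - 1
    x = min(max(x, x_min), x_max)
    step = (x_max - x_min) / levels
    return x_min + round((x - x_min) / step) * step

def activation_quantization_error(activations, num_bits=4):
    """Mean squared error between activations and their quantized values."""
    errs = [(a - quantize(a, num_bits)) ** 2 for a in activations]
    return sum(errs) / len(errs)

def flag_adversarial(activations, threshold, num_bits=4):
    """Hypothetical detector: flag the input if its activation
    quantization error exceeds a threshold calibrated on benign inputs."""
    return activation_quantization_error(activations, num_bits) > threshold
```

The intuition is that benign inputs produce activations whose quantization error stays within a narrow, predictable band, while adversarial perturbations can push activations into regions where that error deviates noticeably.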
Degree Program: Graduate College
Electrical & Computer Engineering