    • Automated bug localization in JIT compilers

      Lim, HeuiChan; Debray, Saumya; Department of Computer Science, University of Arizona (ACM, 2021-04-07)
      Many widely-deployed modern programming systems use just-in-time (JIT) compilers to improve performance. The size and complexity of JIT-based systems, combined with the dynamic nature of JIT-compiler optimizations, make it challenging to locate and fix JIT compiler bugs quickly. At the same time, JIT compiler bugs can result in exploitable security vulnerabilities, making rapid bug localization important. Existing work on automated bug localization focuses on static code, i.e., code that is not generated at runtime, and so cannot handle bugs in JIT compilers that generate incorrect code during optimization. This paper describes an approach to automated bug localization in JIT compilers, down to the level of distinct optimization phases, starting with a single initial Proof-of-Concept (PoC) input that demonstrates the bug. Experiments using a prototype implementation of our ideas on Google's V8 JavaScript interpreter and TurboFan JIT compiler demonstrate that it can successfully identify buggy optimization phases. © 2021 ACM.
    • Cheap and Good? Simple and Effective Data Augmentation for Low Resource Machine Reading

      Van, Hoang; Yadav, Vikas; Surdeanu, Mihai; University of Arizona (ACM, 2021-07-11)
      We propose a simple and effective strategy for data augmentation for low-resource machine reading comprehension (MRC). Our approach first pretrains the answer extraction components of an MRC system on the augmented data that contains approximate context of the correct answers, before training it on the exact answer spans. The approximate context helps the QA method components in narrowing the location of the answers. We demonstrate that our simple strategy substantially improves both document retrieval and answer extraction performance by providing larger context of the answers and additional training data. In particular, our method significantly improves the performance of the BERT-based retriever (15.12%) and answer extractor (4.33% F1) on TechQA, a complex, low-resource MRC task. Further, our data augmentation strategy yields significant improvements of up to 3.9% exact match (EM) and 2.7% F1 for answer extraction on PolicyQA, another practical but moderate-sized QA dataset that also contains long answer spans.
    • Predicting protein secondary structure by an ensemble through feature-based accuracy estimation

      Krieger, Spencer; Kececioglu, John; University of Arizona, Computer Science (ACM, 2020-09-21)
      Protein secondary structure prediction is a fundamental task in computational biology, basic to many bioinformatics workflows, with a diverse collection of tools currently available. An approach from machine learning with the potential to capitalize on such a collection is ensemble prediction, which runs multiple predictors and combines their predictions into one, output by the ensemble. We conduct a thorough study of seven different approaches to ensemble secondary structure prediction, several of which are novel, and show we can indeed obtain an ensemble method that significantly exceeds the accuracy of individual state-of-the-art tools. The best approaches build on a recent technique known as feature-based accuracy estimation, which estimates the unknown true accuracy of a prediction, here using features of both the prediction output and the internal state of the prediction method. In particular, a hybrid approach to ensemble prediction that leverages accuracy estimation is now the most accurate method currently available: on average over standard CASP and PDB benchmarks, it exceeds the state-of-the-art Q3 accuracy for 3-state prediction by nearly 4%, and exceeds the Q8 accuracy for 8-state prediction by more than 8%. A preliminary implementation of our approach to ensemble protein secondary structure prediction, in a new tool we call Ssylla, is available free for non-commercial use at ssylla.cs.arizona.edu. © 2020 ACM.
    • Representing and reasoning about dynamic code

      Bartels, Jesse; Stephens, Jon; Debray, Saumya; University of Arizona, Department of Computer Science (ACM, 2021-01-27)
      Dynamic code, i.e., code that is created or modified at runtime, is ubiquitous in today's world. The behavior of dynamic code can depend on the logic of the dynamic code generator in subtle and nonobvious ways, e.g., JIT compiler bugs can lead to exploitable vulnerabilities in the resulting JIT-compiled code. Existing approaches to program analysis do not provide adequate support for reasoning about such behavioral relationships. This paper takes a first step in addressing this problem by describing a program representation and a new notion of dependency that allows us to reason about dependency and information flow relationships between the dynamic code generator and the generated dynamic code. Experimental results show that analyses based on these concepts are able to capture properties of dynamic code that cannot be identified using traditional program analyses. © 2020 ACM.
    • Ripple Effect: Communicating Water Quality Data through Sonic Vibrations

      B. Kaufmann, Dorsey; Hamidi, Nima; Palawat, Kunal; Ramirez-Andreotta, Monica; School of Art, University of Arizona; Department of Environmental Science, University of Arizona (ACM, 2021-06-22)
      Pollution in real time can be incredibly powerful, but is difficult to communicate. Persistent deterioration of land, air, and water is largely invisible to the eye and camera lens. What if water itself could visualize its quality and perform the level of contamination? Ripple Effect is an environmental art installation that reveals water contamination through sonic vibrations and light. Using software technology, water contamination levels are translated into sound waves. The installation consists of speakers that play 'data sound tracks', which vibrate water held in attached trays. Participants see and hear the water vibrate based on contaminant concentrations. This paper describes the concept, data-to-sound process, implementation, and participant evaluation surrounding the installation of Ripple Effect in communities neighboring resource extraction and other industrial activity. While there are many existing artworks that visualize environmental quality, Ripple Effect is novel in its use of local water quality data and interactive technology that allows the primary medium, water, to communicate directly with the participant. © 2021 Owner/Author.
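      The abstract above does not say how contaminant concentrations become sound, so the following is only an illustrative sketch of one possible data-to-sound mapping, not the installation's actual method. It assumes a linear mapping from concentration to tone frequency; the function names, the 40 to 120 Hz range, and the sample rate are all hypothetical choices made for this example.

```python
import math


def concentration_to_frequency(conc, conc_max, f_low=40.0, f_high=120.0):
    """Linearly map a concentration in [0, conc_max] to a frequency in Hz.

    Assumption for illustration: higher contamination gives a higher tone.
    The frequency range is a hypothetical choice, not taken from the paper.
    """
    ratio = min(max(conc / conc_max, 0.0), 1.0)  # clamp to [0, 1]
    return f_low + ratio * (f_high - f_low)


def sine_samples(freq_hz, duration_s=1.0, sample_rate=8000):
    """Generate a pure sine tone as a list of floats in [-1, 1]."""
    n = int(duration_s * sample_rate)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


if __name__ == "__main__":
    # Hypothetical reading: 5 units measured against a 10-unit ceiling.
    freq = concentration_to_frequency(5.0, 10.0)
    tone = sine_samples(freq, duration_s=0.5)
    print(f"{freq:.1f} Hz, {len(tone)} samples")
```

      A real pipeline would also need to write the samples to an audio file or device and would likely use a perceptually tuned mapping; both are left out to keep the sketch short.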