• Automated bug localization in JIT compilers

      Lim, HeuiChan; Debray, Saumya; Department of Computer Science, University of Arizona (ACM, 2021-04-07)
      Many widely-deployed modern programming systems use just-in-time (JIT) compilers to improve performance. The size and complexity of JIT-based systems, combined with the dynamic nature of JIT-compiler optimizations, make it challenging to locate and fix JIT compiler bugs quickly. At the same time, JIT compiler bugs can result in exploitable security vulnerabilities, making rapid bug localization important. Existing work on automated bug localization focuses on static code, i.e., code that is not generated at runtime, and so cannot handle bugs in JIT compilers that generate incorrect code during optimization. This paper describes an approach to automated bug localization in JIT compilers, down to the level of distinct optimization phases, starting with a single initial proof-of-concept (PoC) input that demonstrates the bug. Experiments using a prototype implementation of our ideas on Google's V8 JavaScript interpreter and TurboFan JIT compiler demonstrate that it can successfully identify buggy optimization phases. © 2021 ACM.
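      As a rough illustration of what "localizing down to an optimization phase" means, one can imagine enabling phases one at a time and re-running the PoC until the output goes wrong. This is a toy sketch with hypothetical names, not the paper's actual technique (which works from a single PoC execution):

```python
def localize_buggy_phase(phases, run_poc):
    """Return the first phase whose inclusion makes the PoC misbehave.

    `phases` is an ordered list of optimization-phase names;
    `run_poc(enabled)` runs the PoC input with only the `enabled`
    phases active and returns True when the output is correct.
    (Hypothetical interface, for illustration only.)
    """
    enabled = []
    for phase in phases:
        enabled.append(phase)
        if not run_poc(enabled):
            return phase  # adding this phase broke the output
    return None  # bug not reproduced by any phase prefix

# Toy harness: pretend "escape-analysis" is the buggy phase.
PHASES = ["typer", "inlining", "escape-analysis", "scheduling"]
buggy = localize_buggy_phase(PHASES, lambda on: "escape-analysis" not in on)
```

      In a real JIT such as TurboFan, the `run_poc` step would involve recompiling and re-executing the PoC, which is why avoiding repeated runs is valuable.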
    • Cheap and Good? Simple and Effective Data Augmentation for Low Resource Machine Reading

      Van, Hoang; Yadav, Vikas; Surdeanu, Mihai; University of Arizona (ACM, 2021-07-11)
      We propose a simple and effective strategy for data augmentation for low-resource machine reading comprehension (MRC). Our approach first pretrains the answer extraction components of an MRC system on augmented data that contains the approximate context of the correct answers, before training it on the exact answer spans. The approximate context helps the QA components narrow down the location of the answers. We demonstrate that our simple strategy substantially improves both document retrieval and answer extraction performance by providing larger context for the answers and additional training data. In particular, our method significantly improves the performance of a BERT-based retriever (15.12%) and answer extractor (4.33% F1) on TechQA, a complex, low-resource MRC task. Further, our data augmentation strategy yields significant improvements of up to 3.9% exact match (EM) and 2.7% F1 for answer extraction on PolicyQA, another practical but moderately sized QA dataset that also contains long answer spans.
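      A minimal sketch of the "approximate context" idea: instead of training only on the exact gold span, expand the span into a surrounding window and pretrain on that. The function below is a hypothetical stand-in for the augmentation step, using character offsets:

```python
def approximate_context(passage, answer_start, answer_end, window=100):
    """Expand an exact answer span into an approximate context window.

    Returns (start, end) character offsets covering up to `window`
    extra characters on each side of the gold span, clipped to the
    passage boundaries. A toy stand-in for the augmentation step.
    """
    start = max(0, answer_start - window)
    end = min(len(passage), answer_end + window)
    return start, end

passage = "x" * 500
s, e = approximate_context(passage, 200, 220, window=50)
```

      Pretraining on such coarser windows gives the extractor weaker but more plentiful supervision before fine-tuning on the exact spans.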
    • Drawing Graphs on the Sphere

      Perry, Scott; Yin, Mason Sun; Gray, Kathryn; Kobourov, Stephen; University of Arizona (ACM, 2020-10-02)
      Graphs are most often visualized in the two dimensional Euclidean plane, but spherical space offers several advantages when visualizing graphs. First, some graphs such as skeletons of three dimensional polytopes (tetrahedron, cube, icosahedron) have spherical realizations that capture their 3D structure, which cannot be visualized as well in the Euclidean plane. Second, the sphere makes possible a natural "focus + context" visualization with more detail in the center of the view and less detail away from the center. Finally, whereas layouts in the Euclidean plane implicitly define notions of "central" and "peripheral" nodes, this issue is reduced on the sphere, where the layout can be centered at any node of interest. We first consider a projection-reprojection method that relies on transformations often seen in cartography and describe the implementation of this method in the GMap visualization system. This approach allows many different types of 2D graph visualizations, such as node-link diagrams, LineSets, BubbleSets and MapSets, to be converted into spherical web browser visualizations. Next we consider an approach based on spherical multidimensional scaling, which performs graph layout directly on the sphere. This approach supports node-link diagrams and GMap-style visualizations, rendered in the web browser using WebGL.
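      One cartographic transformation of the kind a projection-reprojection pipeline could use is the inverse stereographic projection, which lifts 2D layout coordinates onto the unit sphere. A minimal sketch (the actual GMap pipeline is not shown here):

```python
import math

def inverse_stereographic(x, y):
    """Map a plane point (x, y) to a unit-sphere point (X, Y, Z).

    Standard inverse stereographic projection from the north pole:
    the plane z = 0 maps onto the sphere minus the pole itself,
    with the plane's origin landing on the south pole.
    """
    d = 1.0 + x * x + y * y
    return (2 * x / d, 2 * y / d, (x * x + y * y - 1) / d)

# The origin of the 2D layout maps to the south pole of the sphere.
X, Y, Z = inverse_stereographic(0.0, 0.0)
```

      Because the projection is conformal, local angles in the 2D drawing are preserved on the sphere, which helps node-link diagrams stay readable after reprojection.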
    • Predicting protein secondary structure by an ensemble through feature-based accuracy estimation

      Krieger, Spencer; Kececioglu, John; University of Arizona, Computer Science (ACM, 2020-09-21)
      Protein secondary structure prediction is a fundamental task in computational biology, basic to many bioinformatics workflows, with a diverse collection of tools currently available. An approach from machine learning with the potential to capitalize on such a collection is ensemble prediction, which runs multiple predictors and combines their predictions into one, output by the ensemble. We conduct a thorough study of seven different approaches to ensemble secondary structure prediction, several of which are novel, and show we can indeed obtain an ensemble method that significantly exceeds the accuracy of individual state-of-the-art tools. The best approaches build on a recent technique known as feature-based accuracy estimation, which estimates the unknown true accuracy of a prediction, here using features of both the prediction output and the internal state of the prediction method. In particular, a hybrid approach to ensemble prediction that leverages accuracy estimation is now the most accurate method currently available: on average over standard CASP and PDB benchmarks, it exceeds the state-of-the-art Q3 accuracy for 3-state prediction by nearly 4%, and exceeds the Q8 accuracy for 8-state prediction by more than 8%. A preliminary implementation of our approach to ensemble protein secondary structure prediction, in a new tool we call Ssylla, is available free for non-commercial use at ssylla.cs.arizona.edu. © 2020 ACM.
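      The simplest way accuracy estimation can drive an ensemble is as a selector: run every tool, estimate each prediction's accuracy from its features, and output the prediction with the highest estimate. This toy sketch shows only that selector idea, not the paper's hybrid method:

```python
def select_by_estimated_accuracy(predictions, estimate):
    """Pick the prediction whose estimated accuracy is highest.

    `predictions` maps a tool name to its secondary-structure
    string (e.g. 'H' = helix, 'E' = strand, 'C' = coil);
    `estimate(name, pred)` returns an accuracy estimate in [0, 1].
    A toy selector ensemble, not the paper's hybrid approach.
    """
    return max(predictions, key=lambda name: estimate(name, predictions[name]))

# Hypothetical tools and accuracy estimates, for illustration only.
preds = {"toolA": "HHHEEC", "toolB": "HHHEEE"}
scores = {"toolA": 0.81, "toolB": 0.74}
best = select_by_estimated_accuracy(preds, lambda n, p: scores[n])
```

      The key point is that the selector never needs the true structure at prediction time; it relies entirely on the estimated accuracies.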
    • Recognition and Recall of Geographic Data In Cartograms

      Nusrat, Sabrina; Alam, Jawaherul; Kobourov, Stephen; University of Arizona (ACM, 2020-10-02)
      We investigate the memorability of two types of cartograms, both in terms of recognition of the visualization and recall of the data. A cartogram, or a value-by-area map, is a representation of a map in which geographic regions are modified to reflect a given statistic, such as population or income. Of the many different types of cartograms, the contiguous and Dorling types are among the most popular and most effective. With this in mind, we evaluate the memorability of these two cartogram types with a human-subjects study, using task-based experimental data and cartogram visualization tasks based on Bertin's map reading levels. In particular, our results indicate that Dorling cartograms are associated with better recall of general patterns and trends. This, together with additional significant differences between the two most popular cartogram types, has implications for the design and use of cartograms, in the context of memorability.
    • Representing and reasoning about dynamic code

      Bartels, Jesse; Stephens, Jon; Debray, Saumya; University of Arizona, Department of Computer Science (ACM, 2021-01-27)
      Dynamic code, i.e., code that is created or modified at runtime, is ubiquitous in today's world. The behavior of dynamic code can depend on the logic of the dynamic code generator in subtle and nonobvious ways, e.g., JIT compiler bugs can lead to exploitable vulnerabilities in the resulting JIT-compiled code. Existing approaches to program analysis do not provide adequate support for reasoning about such behavioral relationships. This paper takes a first step in addressing this problem by describing a program representation and a new notion of dependency that allows us to reason about dependency and information flow relationships between the dynamic code generator and the generated dynamic code. Experimental results show that analyses based on these concepts are able to capture properties of dynamic code that cannot be identified using traditional program analyses. © 2020 ACM.
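      A tiny example of the phenomenon the paper targets: when code is generated at runtime, a bug in the generator's logic surfaces only in the behavior of the generated code. The sketch below is a deliberately contrived Python illustration, not the paper's program representation:

```python
def gen_adder(offset):
    """A toy runtime code generator.

    The generator contains a deliberate bug -- emitting '-' where
    '+' was intended -- so the generated function's behavior
    depends on the generator's logic, a relationship that analyzing
    the generated code alone cannot explain.
    """
    src = f"def adder(x):\n    return x - {offset}\n"  # bug: should be '+'
    namespace = {}
    exec(src, namespace)  # dynamic code: created and installed at runtime
    return namespace["adder"]

f = gen_adder(3)
```

      Reasoning about why `f` misbehaves requires a dependency notion that crosses from the generated `adder` back into `gen_adder`, which is the kind of relationship the paper's representation is designed to capture.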
    • Research on the Forecast of the Spread of COVID-19

      Guo, Lihao; Yang, Yuxin; James E Rogers College of Law, The University of Arizona (ACM, 2021-07-20)
      With COVID-19 spreading, existing machine learning frameworks can be adopted to help control the epidemic by predicting the spread of the virus before the large-scale application of vaccines. Based on a spatiotemporal graph neural network and mobility data, this paper attempts to offer a novel prediction by building a high-resolution graph with characteristics such as willingness to wear masks, daily infections, and daily deaths. Unlike a pure time-series prediction model, the method learns from a multivariate spatiotemporal graph: nodes represent regions with daily confirmed cases and deaths, and edges represent inter-regional contacts based on mobility. Simultaneously, the transmission model incorporates a time margin as a characteristic of temporal change. This paper builds the COVID-19 model using STGNNs and tries to predict and verify the virus's infection. The model achieves an absolute Pearson correlation of 0.9735, short of the expected value of 0.998. The predicted values on the first and second days are close to the real situation, while the values gradually deviate from the actual situation after the second day. This still shows that the graph neural network exploits rich temporal and spatial information, enabling the model to learn complex dynamics. In the future, the model can be improved by tuning hyper-parameters such as the number of convolution modules, by constructing graphs suited to smaller units such as institutions, buildings, and houses, and by assigning more features to each node. This experiment demonstrates the powerful combination of deep learning and graph neural networks for studying the spread and evolution of COVID-19.
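      The core of such a model is message passing over a region graph whose edges are weighted by mobility. A minimal sketch of one propagation step, with hypothetical numbers and no learned parameters (a real STGNN layer would learn these weights):

```python
def propagate(cases, mobility):
    """One message-passing step over a region graph.

    `cases[i]` is region i's current case count; `mobility[i][j]`
    weights the flow from region j into region i. Each region's
    next value mixes its own count with mobility-weighted neighbor
    counts -- a toy stand-in for one spatiotemporal-GNN layer.
    """
    n = len(cases)
    return [
        cases[i] + sum(mobility[i][j] * cases[j] for j in range(n) if j != i)
        for i in range(n)
    ]

# Two regions; 20% of region 1's cases feed region 0, 25% vice versa.
cases = [100.0, 10.0]
mobility = [[0.0, 0.2], [0.25, 0.0]]
nxt = propagate(cases, mobility)
```

      Stacking such steps over successive days is what lets the model couple spatial contact structure with temporal dynamics.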
    • Ripple Effect: Communicating Water Quality Data through Sonic Vibrations

      B. Kaufmann, Dorsey; Hamidi, Nima; Palawat, Kunal; Ramirez-Andreotta, Monica; School of Art, University of Arizona; Department of Environmental Science, University of Arizona (ACM, 2021-06-22)
      Pollution in real time can be incredibly powerful, but is difficult to communicate. Persistent deterioration of land, air, and water are largely invisible to the eye and camera lens. What if water itself could visualize its quality and perform the level of contamination? Ripple Effect is an environmental art installation that reveals water contamination through sonic vibrations and light. Using software technology, water contamination levels are translated into sound waves. The installation consists of speakers that play gdata sound tracks', which vibrate water held in attached trays. Participants see and hear the water vibrate based on contaminant concentrations. This paper describes the concept, data-To-sound process, implementation, and participant evaluation surrounding the installation of Ripple Effect in communities neighboring resource extraction and other industrial activity. While there are many existing artworks that visualize environmental quality, Ripple Effect is novel in its use of local water quality data and interactive technology that allows the primary medium, water, to communicate directly with the participant. © 2021 Owner/Author.