Automated Localization of Dynamic Code Generation Bugs in Just-in-Time Compilers
Publisher
The University of Arizona.
Rights
Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction, presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Abstract
This dissertation presents a new approach to the automatic localization of dynamic code generation bugs in Just-in-Time (JIT) compilers. JIT compilers are widely used to improve the performance of interpreted code. However, incorrect code generated by a JIT compiler can pose significant security risks or raise reliability concerns. Addressing such bugs quickly is therefore essential to mitigate potential security problems and improve code quality. Existing bug localization approaches for ordinary software or traditional compilers often fall short when applied to JIT compilers. These approaches overlook essential features of JIT compilers, including their size and complexity, dynamic code generation, and the absence of debugging information. The core approach proposed in this dissertation is to model the execution behavior of the JIT compiler explicitly. These models are system-independent, meaning that the approach can be used to construct models for different JIT compilers, such as Google’s TurboFan and Mozilla’s IonMonkey. The approach focuses on modeling two key JIT compiler behaviors: the optimization of the optimizer’s intermediate representation (IR) and the manipulation of the back-end representation for code generation. By carefully examining the modeled representations, the approach aims to identify the sections of those representations that are manipulated incorrectly. Then, by analyzing the memory accesses within the models, the approach identifies the buggy locations (i.e., functions) in the JIT compiler source code. The approach is based on two key insights: (1) the constructed model should be an abstract representation of concrete JIT compiler representations, i.e., the model should hold information that is common to multiple JIT compiler representations, and (2) the difference between a model constructed from a buggy execution and a model constructed from a non-buggy execution should contain information about the bug.
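As a rough illustration of insight (2), the sketch below compares a model built from a passing execution with one built from a failing execution and ranks functions by how many differing nodes they wrote to. It is a minimal sketch under stated assumptions, not the dissertation's implementation: the `Node` record, the dictionary-based model, the counting heuristic, and all function names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical abstract IR node: an opcode plus the functions that wrote to it."""
    opcode: str
    writers: frozenset


def differing_nodes(passing, failing):
    """Return ids of nodes that are missing from one model or differ in content."""
    all_ids = set(passing) | set(failing)
    return {i for i in all_ids if passing.get(i) != failing.get(i)}


def rank_functions(passing, failing):
    """Rank functions by the number of differing nodes they wrote to (assumed heuristic)."""
    counts = Counter()
    for node_id in differing_nodes(passing, failing):
        writers = set()
        for model in (passing, failing):
            node = model.get(node_id)
            if node is not None:
                writers |= node.writers
        counts.update(writers)  # each function counted once per differing node
    return counts.most_common()


# Hypothetical models; function names are placeholders, not real JIT compiler functions.
passing = {
    1: Node("LoadField", frozenset({"build_graph"})),
    2: Node("CheckBounds", frozenset({"build_graph", "lower_types"})),
}
failing = {
    1: Node("LoadField", frozenset({"build_graph"})),
    2: Node("Nop", frozenset({"build_graph", "eliminate_checks"})),
    3: Node("Deopt", frozenset({"eliminate_checks"})),
}

ranking = rank_functions(passing, failing)
print(ranking[0])  # the most suspicious function under this heuristic
```

In this toy data, nodes 2 and 3 differ between the two models, so the function that wrote to both of them ranks first. A real model would be derived from recorded memory accesses rather than hand-written dictionaries.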
Another critical technique proposed in this work to improve bug localization accuracy is automated test program generation, as the characteristics of the input test programs can significantly impact bug localization performance. Two key insights motivated the approach: (1) the generated test programs should contain both passing inputs (which do not trigger the bug) and failing inputs (which trigger the bug), and (2) the passing inputs should be as similar as possible to the initial seed input, while the failing inputs should be as different as possible from it. Experimental results using a prototype implementation on two widely used JavaScript JIT compilers, namely Google’s V8 TurboFan and Mozilla’s IonMonkey, demonstrate that the proposed approach achieves higher accuracy in identifying suspicious functions related to dynamic code generation bugs than existing approaches.
Type
Electronic Dissertation
text
Degree Name
Ph.D.
Degree Level
doctoral
Degree Program
Graduate College
Computer Science