Publisher
The University of Arizona.
Rights
Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction, presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Abstract
Future high-performance computing (HPC) systems will face unique problems, including high power consumption and severe network contention. Both power and the network are shared resources; while individual jobs can optimize their use of these resources, we will realize greater benefits if we optimize them across all running jobs. Accordingly, this dissertation presents inter-job optimization strategies to limit power consumption and to mitigate network contention. One way to reduce HPC power consumption is to enforce a fixed power limit for running jobs. However, HPC applications do not consume constant power over their lifetimes. Thus, applications that are assigned a fixed power bound may be forced to slow down during high-power computation phases, yet may not consume their full power allocation during low-power I/O phases. This dissertation explores algorithms that leverage application characteristics (phase frequency, duration, and power needs) to shift unused power from applications in I/O phases to applications in computation phases, thus improving system-wide performance. We design novel techniques, including explicit staggering of applications, to improve power shifting. Compared with execution without power shifting, our algorithms can improve average performance by up to 8% or improve the performance of a single, high-priority application by up to 32%. We also investigate the use of Quality of Service (QoS) mechanisms to reduce the negative impact of network contention. QoS allows users to manage resource sharing between network flows and to provide bandwidth guarantees to specific flows. Our results show that applying QoS at the job level significantly reduces the impact of contention on high-priority jobs, but it degrades the performance of other jobs and reduces overall throughput. However, applying QoS at the process level improves the performance of specific jobs by up to 40%, and in some cases it completely eliminates the impact of contention.
Process-level QoS achieves these improvements with limited negative impact on other jobs; any job that experiences a performance loss typically degrades by less than 5%, often much less. The inter-job optimizations presented in this dissertation improve power and network management on HPC systems. Current and future systems can employ these techniques to enhance their performance and efficiency.
Type
text
Electronic Dissertation
Degree Name
Ph.D.
Degree Level
doctoral
Degree Program
Graduate College
Computer Science
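As a rough illustration of the power-shifting idea summarized in the abstract, the Python sketch below splits a system-wide power bound evenly across running jobs and then redistributes the watts that low-demand (for example, I/O-phase) jobs leave unused to compute-phase jobs. It is a minimal sketch under assumed inputs, not the dissertation's algorithm: the `Job` fields, the two phase labels, the equal static share, unique job names, and the even "water-filling" redistribution are all hypothetical simplifications, and the sketch ignores the phase frequency, phase duration, and staggering considerations the dissertation describes.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str        # assumed unique per job
    phase: str       # hypothetical phase label: "compute" or "io"
    demand_w: float  # watts the job would draw in its current phase (assumed known)


def shift_power(jobs: list[Job], bound_w: float) -> dict[str, float]:
    """Split a system-wide power bound across jobs, then shift unused watts
    from low-demand (e.g. I/O-phase) jobs to compute-phase jobs.

    Simplified illustration only, not the dissertation's algorithm."""
    if not jobs:
        return {}
    # Static baseline: every job gets an equal share of the bound,
    # but never more than it currently demands.
    share = bound_w / len(jobs)
    alloc = {j.name: min(share, j.demand_w) for j in jobs}
    # Watts left unused by jobs drawing less than their share.
    surplus = bound_w - sum(alloc.values())
    # Only compute-phase jobs that want more than they have are eligible.
    hungry = [j for j in jobs if j.phase == "compute" and j.demand_w > alloc[j.name]]
    # Water-filling: split the surplus evenly, cap each job at its demand,
    # and repeat with the leftover until the surplus or eligible jobs run out.
    while surplus > 1e-9 and hungry:
        extra = surplus / len(hungry)
        still_hungry = []
        for j in hungry:
            give = min(extra, j.demand_w - alloc[j.name])
            alloc[j.name] += give
            surplus -= give
            if j.demand_w - alloc[j.name] > 1e-9:
                still_hungry.append(j)
        hungry = still_hungry
    return alloc
```

With a 400 W bound, two compute-phase jobs demanding 200 W and 130 W, and two I/O-phase jobs demanding 40 W and 60 W, the compute jobs rise from a 100 W static share to 170 W and 130 W while the total stays within the bound. Splitting the surplus evenly is only one policy; a priority-aware variant could instead route the surplus to a single high-priority job, which is the kind of trade-off the abstract reports between average and single-application performance.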