PyTorch and CEDR: Enabling Deployment of Machine Learning Models on Heterogeneous Computing Systems
Affiliation: Electrical and Computer Engineering Department, The University of Arizona
Issue Date: 2023-12-04
Publisher: IEEE
Citation: H. U. Suluhan, S. Gener, A. Fusco, H. F. Ugurdag and A. Akoglu, "PyTorch and CEDR: Enabling Deployment of Machine Learning Models on Heterogeneous Computing Systems," 2023 20th ACS/IEEE International Conference on Computer Systems and Applications (AICCSA), Giza, Egypt, 2023, pp. 1-8, doi: 10.1109/AICCSA59173.2023.10479315.
Journal: Proceedings of IEEE/ACS International Conference on Computer Systems and Applications, AICCSA
Rights: © 2023 IEEE.
Collection Information: This item from the UA Faculty Publications collection is made available by the University of Arizona with support from the University of Arizona Libraries. If you have questions, please contact us at repository@u.library.arizona.edu.

Abstract:
The PyTorch programming interface enables efficient deployment of machine learning models, leveraging the parallelism offered by GPU architectures. In this study, we present the integration of the PyTorch framework with a compiler and runtime ecosystem. Our aim is to demonstrate the ability to deploy PyTorch-based models on FPGA-based SoC platforms, without requiring users to possess prior FPGA-based design experience. The proposed PyTorch model transformation approach expands the range of hardware architectures that PyTorch developers can target, enabling them to take advantage of the energy-efficient execution provided by heterogeneous computing systems. Our experiments involve compiling and executing real-life applications on heterogeneous SoC configurations emulated on the Xilinx Zynq Ultrascale+ ZCU102 system. We showcase our ability to deploy three distinct PyTorch applications, encompassing object detection, visual geometry group (VGG), and speech classification, using the integrated compiler and runtime system without loss of model accuracy. Furthermore, we extend our analysis by evaluating dynamically arriving workload scenarios, consisting of a mix of PyTorch models and non-PyTorch-based applications. Through these experiments, we vary the hardware composition and scheduling heuristics. Our findings indicate that when PyTorch-based applications coexist with unrelated applications, our integrated scheduler fairly dispatches tasks to the FPGA platform's accelerator and CPU cores, without compromising the target throughput for each application.

Note: Immediate access
Version: Final accepted manuscript
Sponsors: Defense Advanced Research Projects Agency
DOI: 10.1109/AICCSA59173.2023.10479315
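The abstract describes a runtime scheduler that dispatches dynamically arriving tasks across an FPGA accelerator and CPU cores. As an illustration only, the sketch below shows one common heuristic of this kind, earliest-finish-time dispatch; the task names, per-resource execution-time estimates, and the heuristic itself are assumptions for demonstration, not CEDR's actual cost model or scheduling implementation.

```python
# Hypothetical per-resource execution-time estimates (seconds).
# These numbers are illustrative, not measured CEDR values.
COST = {
    "fft":  {"cpu": 0.004, "fpga_accel": 0.003},
    "conv": {"cpu": 0.020, "fpga_accel": 0.002},
    "gemm": {"cpu": 0.015, "fpga_accel": 0.003},
}

def kind_of(resource):
    """Map a resource name to its cost-table column."""
    return "fpga_accel" if resource.startswith("fpga") else "cpu"

def eft_schedule(tasks, resources):
    """Greedy earliest-finish-time dispatch: each arriving task is
    placed on the resource (CPU core or FPGA accelerator) whose
    current backlog plus the task's estimated runtime ends soonest."""
    free_at = {r: 0.0 for r in resources}   # when each resource is next idle
    placement = []
    for task in tasks:
        best = min(resources,
                   key=lambda r: free_at[r] + COST[task][kind_of(r)])
        finish = free_at[best] + COST[task][kind_of(best)]
        free_at[best] = finish
        placement.append((task, best, finish))
    return placement

plan = eft_schedule(["conv", "conv", "fft", "fft"],
                    ["cpu0", "cpu1", "fpga_accel0"])
for task, res, finish in plan:
    print(f"{task} -> {res} (done at {finish * 1e3:.1f} ms)")
```

With these made-up costs, the convolutions land on the accelerator while the FFTs spill onto the CPU cores once the accelerator's backlog makes a CPU the faster finisher, which is the kind of mixed CPU/accelerator dispatch behavior the abstract reports.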