Sparsity-Aware Hardware-Software Co-design of Spiking Neural Network Accelerators
Publisher
The University of Arizona.
Rights
Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction, presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Abstract
Spiking Neural Networks (SNNs) are bio-inspired, event-driven alternatives to Artificial Neural Networks (ANNs), offering the potential for energy-efficient artificial intelligence (AI) on resource-constrained edge devices. Sparsity, meaning that only a small fraction of neurons is active at any given time, is a fundamental principle that plays a crucial role in achieving energy and computational efficiency in SNNs. However, current studies that exploit sparsity often lack a thorough design space analysis and overlook the workload imbalances caused by irregular sparsity patterns across different tasks. Furthermore, existing Sparse Matrix-Dense Matrix multiplication (SpMM) kernels, widely used in ANNs, are sub-optimal for SNNs because they do not exploit the SNNs' multiplication-free property: with binary spikes, each multiply-accumulate reduces to a conditional addition. Specialized hardware acceleration techniques are therefore needed. This dissertation makes two major contributions. The first contribution explores the factors that shape sparsity in SNNs and their impact on performance. We systematically investigate the effects of key training hyperparameters, such as the Surrogate Gradient (SG) function, neuronal parameters, and low-bit quantization, across diverse workloads. Our findings reveal new insights into how these factors contribute to sparsity and provide guidance for optimizing SNN efficiency. We demonstrate a novel approach to SNN hardware-software co-design: by selecting surrogate functions that inherently induce lower firing rates, we can significantly reduce energy consumption without sacrificing accuracy. The second contribution is a set of novel hardware designs that explicitly exploit sparsity in SNN accelerators, aiming to maximize energy efficiency and computational throughput. We propose a sparse design that employs a priority encoder for spike train compression and implements output-product (OP) tiling to achieve balanced workload parallelization.
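The multiplication-free property and spike train compression described above can be illustrated with a minimal Python sketch. It assumes binary (0/1) spikes, a dense weight matrix, and plain Python lists rather than any hardware model. The helper names `compress_spikes` and `spike_matmul` are hypothetical, and the bit-scanning loop only approximates in software what a priority encoder does in hardware; this is not the dissertation's actual design.

```python
from typing import List


def compress_spikes(spike_row: List[int]) -> List[int]:
    """Return the indices of active (1) spikes in ascending order.

    Software stand-in for a priority encoder (hypothetical helper):
    repeatedly report the position of the lowest set bit, then clear it.
    """
    bits = 0
    for i, s in enumerate(spike_row):
        if s:
            bits |= 1 << i
    indices = []
    while bits:
        lowest = bits & -bits                    # isolate the lowest set bit
        indices.append(lowest.bit_length() - 1)  # its position = spike index
        bits ^= lowest                           # clear it and continue
    return indices


def spike_matmul(spikes: List[List[int]],
                 weights: List[List[float]]) -> List[List[float]]:
    """Compute spikes @ weights using additions only (hypothetical helper).

    spikes:  M x K binary matrix (e.g. one row per time step)
    weights: K x N dense weight matrix
    """
    n = len(weights[0])
    out = []
    for row in spikes:
        acc = [0.0] * n
        for k in compress_spikes(row):  # zero spikes are skipped entirely
            wk = weights[k]
            for j in range(n):
                acc[j] += wk[j]         # a spike of 1 adds the weight row; no multiply
        out.append(acc)
    return out


if __name__ == "__main__":
    spikes = [[1, 0, 1], [0, 0, 0], [0, 1, 0]]
    weights = [[1, 2], [3, 4], [5, 6]]
    print(compress_spikes([0, 1, 1, 0, 1]))  # [1, 2, 4]
    print(spike_matmul(spikes, weights))     # [[6.0, 8.0], [0.0, 0.0], [3.0, 4.0]]
```

Because the inner loop runs once per active spike, the work in this sketch scales with the firing rate, which is why lower firing rates translate into fewer operations. Rows with very different spike counts also take very different amounts of time, which hints at the workload imbalance that balanced tiling is meant to address.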
Building on this sparse design, we further propose a hybrid hardware design that seamlessly integrates the sparse core with a dense core. The hybrid approach is particularly well suited to direct-coded SNNs, which process input samples directly without an explicit encoding scheme and therefore exhibit varying degrees of sparsity across layers. We rigorously evaluate our work through extensive simulations and FPGA-based prototyping, demonstrating that the proposed approach can achieve significant efficiency gains over state-of-the-art SNN accelerators without compromising accuracy.
Type
Electronic Dissertation
text
Degree Name
Ph.D.
Degree Level
doctoral
Degree Program
Graduate College
Electrical & Computer Engineering