Sparsity-Aware Pre-processing for Systolic Array Dataflow Acceleration

Tadikonda Venkata Sai Chaitanya1, Bhargav D V2, Madhav Rao2
1International Institute of Information Technology-Bangalore, 2International Institute of Information Technology-Bangalore


Abstract

Structured sparsity is key to reducing the computational cost of neural networks. However, conventional systolic array (SA) architectures fail to exploit sparse weights, performing redundant multiply-accumulate operations on zero elements. This paper proposes a novel pre-processing block that reorders the rows and columns of the sparse weight matrix prior to systolic array execution. This reordering enhances data locality and computational alignment, significantly reducing the number of clock cycles required for matrix multiplication and enabling more efficient utilization of the processing elements (PEs). A 4 × 4 output-stationary systolic array is designed to validate the approach on several neural networks, including LeNet-5, VGG-11, and ResNet-50, across varying sparsity levels, demonstrating substantial cycle savings and performance gains. Hardware efficiency is evaluated using the ASAP 7nm Regular Vt PDK: integrating the pre-processing block increases silicon footprint by 18% and static power by 36.22%, yet yields significant dynamic power savings of 50.18%. The sparsity-reordering strategy achieves overall performance gains of 9% to 34% for the ResNet-50 model and 15% to 28% for the VGG-11 model. This lightweight optimization provides a practical path towards sparsity-aware acceleration in systolic-array-based architectures.
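
To make the reordering idea concrete, below is a minimal Python sketch assuming one simple heuristic: sorting rows and columns by descending non-zero count so that non-zero elements cluster toward one corner of the matrix. The function name reorder_sparse_weights and the sorting heuristic are illustrative assumptions; the paper's actual reordering algorithm and its hardware realization may differ.

```python
import numpy as np

def reorder_sparse_weights(W):
    """Sort rows and columns of W by descending non-zero count.

    Returns the permuted matrix together with the row/column permutations,
    which are needed to undo the reordering on the output side of the
    systolic array.
    """
    row_perm = np.argsort(-(W != 0).sum(axis=1), kind="stable")
    col_perm = np.argsort(-(W != 0).sum(axis=0), kind="stable")
    return W[row_perm][:, col_perm], row_perm, col_perm

# Toy 4x4 weight matrix at ~60% sparsity, matching the 4x4 array size.
W = np.array([[0, 2, 0, 0],
              [1, 0, 3, 0],
              [0, 0, 0, 0],
              [4, 0, 5, 6]])
W_sorted, rp, cp = reorder_sparse_weights(W)
print(W_sorted)  # dense rows/columns migrate to the top-left corner
```

After such a permutation, zero-dominated rows and columns sit in contiguous regions, so a scheduler can skip or compress them rather than streaming zeros through every PE; the stored permutations allow the output tile to be mapped back to its original ordering.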