A SIMD Dynamic Fixed Point Processing Engine for DNN Accelerators

Gopal Raut1, Pranose Edavoor2, David Selvakumar3, Ritambhara Thakur1
1CDAC Bangalore, 2Centre for Development of Advanced Computing, 3C-DAC, Bangalore


Abstract

For DNN accelerators, we introduce a novel convolution-layer Processing Engine (PE) that combines multi-precision, dynamic-fraction fixed-point Multiply-Accumulate (MAC) and Activation Function (AF) units (8/16 bits with shared resources). The PE functions as a Single Instruction, Multiple Data (SIMD) engine that switches precision dynamically between 8-bit and 16-bit computations, supporting multi-precision (N = 8 or 16) operations on dynamic-fraction fixed-point data. A Pareto analysis was performed to determine the optimal number of parallel CORDIC stages for the multi-precision, dynamic-fraction fixed-point SIMD AF unit. The proposed approach incurs minimal accuracy loss compared to reference float32 implementations: less than 1% for LeNet on MNIST, less than 1% for AlexNet on CIFAR-10, and less than 2% for VGG16 on CIFAR-10. Experimental results on the Virtex UltraScale+ VCU118 evaluation kit show that the SIMD multi-precision dynamic-fraction MAC and CORDIC AF units with shared resources are substantially more resource-efficient than designs without resource sharing: LUT usage is reduced by 51.83% for the MAC units and by 43.50% for the AF units relative to a SIMD MAC + AF engine realized with separate 8-bit and 16-bit MAC and AF units. This multi-precision, shared-resource design for 4x8-bit and 1x16-bit processing optimizes resource utilization, improves versatility and throughput, and enables efficient computation across diverse DNN applications, models, and layers with a single SIMD-enabled PE.
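To make the precision-switching idea concrete, the following is a minimal behavioral sketch (not the authors' RTL) of a MAC that operates either as one 16-bit dynamic fixed-point unit or as four packed 8-bit lanes, in the spirit of the 1x16 / 4x8 SIMD modes described above. The operand packing, the fractional widths shown in the comments, and the 32-bit accumulators are illustrative assumptions, not details taken from the paper.

```c
/* Behavioral sketch of a precision-switchable dynamic fixed-point MAC.
 * Mode A: one 16-bit MAC; Mode B: four independent 8-bit MACs packed
 * into 32-bit words. Fraction positions are carried alongside the data,
 * so the same integer datapath serves any Qm.n format (assumed scheme). */
#include <stdint.h>
#include <stdio.h>

/* One 16-bit MAC: the product's fraction width is frac_a + frac_w. */
static int32_t mac16(int32_t acc, int16_t a, int16_t w)
{
    return acc + (int32_t)a * (int32_t)w;
}

/* Four 8-bit MACs on byte-packed operands (lane i occupies byte i). */
static void mac8x4(int32_t acc[4], uint32_t a_pack, uint32_t w_pack)
{
    for (int i = 0; i < 4; i++) {
        int8_t a = (int8_t)(a_pack >> (8 * i));
        int8_t w = (int8_t)(w_pack >> (8 * i));
        acc[i] += (int32_t)a * (int32_t)w;
    }
}

int main(void)
{
    /* 16-bit mode: 1.5 * 0.75 with 8 fractional bits per operand (Q8). */
    int32_t acc = mac16(0, 384 /* 1.5 in Q8 */, 192 /* 0.75 in Q8 */);
    printf("16-bit MAC: %f\n", acc / 65536.0);  /* product has 16 frac bits */

    /* 8-bit SIMD mode: four lanes, 4 fractional bits per operand (Q4). */
    int32_t acc4[4] = {0};
    mac8x4(acc4, 0x10203040u, 0x01020304u);
    for (int i = 0; i < 4; i++)
        printf("lane %d: %f\n", i, acc4[i] / 256.0); /* 8 frac bits */
    return 0;
}
```

In a shared-resource hardware realization, the four 8-bit lane products and the single 16-bit product would reuse the same multiplier and accumulator fabric selected by a mode signal; the sketch above only models the arithmetic behavior of the two modes.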