RASH: Reliable Deep Learning Acceleration using Sparsity-based Hardware

Shamik Kundu1, Arnab Raha2, Deepak Mathaikutty2, Kanad Basu1
1University of Texas at Dallas, 2Intel Corporation


Abstract

With the proliferation of Deep Neural Networks (DNNs), specialized hardware accelerators have gained increasing attention for the efficient execution of state-of-the-art networks. DNNs typically comprise a series of convolution layers, each of which performs a convolution operation between an input activation tensor and a weight (or filter kernel) tensor. Both the activation and weight tensors exhibit substantial sparsity; zero-valued operands do not affect the output of the multiply-and-accumulate (MAC) operations that constitute the convolution. In this paper, we exploit this sparsity to accelerate inference execution, achieving higher speedup/throughput as well as lower energy consumption in the architecture. To this end, we present a novel sparsity-based acceleration logic that leverages both-sided fine-grained sparsity in the activation and weight tensors to skip ineffectual computations, thereby implementing an efficient convolution engine in a hardware accelerator. However, aggressive technology scaling in recent years has made these circuits highly vulnerable to faults arising from aging, latent defects, single-event upsets, and other causes. As demonstrated in this paper, such circuit-level faults, when manifested in the sparsity logic, can result in graceless degradation in classification accuracy, as well as control failure in the sparse DNN accelerator in mission mode. To circumvent this, we propose RASH, a Reliable deep learning Acceleration framework using Sparsity-based Hardware, which augments the proposed sparsity acceleration logic with a novel in-field fault detection and dynamic mitigation technique. When evaluated on state-of-the-art network-dataset configurations, our proposed RASH framework recovers up to 100% of the degraded inference accuracy at the cost of only 1.26% area overhead and 5.37% power overhead in resource-limited settings, which can be further reduced with the dynamic fault mitigation strategy at the edge.
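To make the zero-skipping idea concrete, the following minimal Python sketch (our illustration, not the paper's hardware design; the function and variable names are hypothetical) shows why both-sided sparsity can be exploited: a MAC whose activation or weight operand is zero contributes nothing to the accumulated result, so it can be skipped without changing the output.

```python
def sparse_mac(activations, weights):
    """Accumulate a dot product, skipping ineffectual (zero-operand) MACs.

    In a both-sided sparse engine, a MAC is dispatched only when the
    activation AND the weight at a given position are both nonzero;
    every other position cannot change the output and is skipped.
    """
    acc = 0
    skipped = 0
    for a, w in zip(activations, weights):
        if a == 0 or w == 0:
            skipped += 1      # ineffectual computation: no cycle spent
            continue
        acc += a * w          # effectual MAC
    return acc, skipped

# Example with sparse operands on both sides:
acts = [3, 0, 5, 0, 2, 0, 1, 0]
wts  = [0, 4, 2, 0, 1, 7, 0, 6]
result, skipped = sparse_mac(acts, wts)
print(result, skipped)  # 12 6 -> only 2 of the 8 MACs are effectual
```

Note that this value-comparison loop is only for exposition; sparse accelerators in general track nonzero positions with compressed encodings (e.g., sparsity bitmaps) so that ineffectual MACs are never issued in the first place.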