eDRAM-OESP: A novel performance efficient in-embedded-DRAM-compute design for on-edge signal processing application

Mayank Kabra1, Prashanth H C2, Kedar Deshpande1, Madhav Rao3
1Student, 2IIIT-Bangalore, 3International Institute of Information Technology-Bangalore


Abstract

In-Memory-Computing (IMC) architectures place arithmetic and logic functionality around the memory arrays to use the memory bandwidth effectively and avoid frequent data movement to the processor. As a result, IMC architectures deliver high throughput and significant energy savings, primarily because less data moves between the memory and the computing core. Embedded DRAM (eDRAM), built from 1-transistor, 1-capacitor (1T1C) bit cells with an accompanying logic block, enables computing with power savings and high performance, which suits embedded computing engines. This work proposes a novel in-eDRAM-compute design that employs 1T1C eDRAM cells with bit-serial computation and targets 3x throughput efficiency by arranging the operand bits in an interleaved manner. The interleaved eDRAM architecture reads the corresponding bits of multiple operands from the memory cells at the same time and writes the computed result back in the same activate window, thereby saving multiple precharge and activate cycles. Additionally, the interleaved architecture allows a continuously arriving digitized signal to be pipelined and processed. The computing block, a 1-bit adder with a multiplexer unit, is optimized for different hardware metrics such as delay, power, and power-delay product (PDP), so that the design can be adapted to given specifications. The eDRAM-based computing design is evaluated for a 1-bit adder and further characterized for 8-bit and 16-bit adders, multipliers, and 1-D convolution of varying filter sizes. The proposed design improves computing time by 31\% for 16-bit addition and 30.6\% for 8-bit addition over the existing state-of-the-art work. The bit-serial in-eDRAM-compute design achieved its best performance of 2.5~ms of computing time and 120~nJ of energy for a 1-D convolution operation.
The in-eDRAM-compute design is a step towards designing embedded memory with convolutional neural network~(CNN) compute capability for customized real-time edge inferencing applications.
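
The bit-serial, interleaved scheme summarized above can be illustrated with a small software model. The sketch below is only a behavioral illustration under stated assumptions: the row layout `[a0, b0, a1, b1, ...]` and the function names `interleave`, `full_adder`, and `bit_serial_add` are hypothetical, chosen for this example, and are not taken from the proposed hardware design. It shows how one 1-bit full adder, fed one pair of corresponding operand bits per step with the carry held between steps, produces a multi-bit sum.

```python
def interleave(a, b, width):
    """Assumed layout: store operand bits LSB-first as [a0, b0, a1, b1, ...]."""
    row = []
    for i in range(width):
        row.append((a >> i) & 1)  # bit i of operand a
        row.append((b >> i) & 1)  # bit i of operand b
    return row


def full_adder(x, y, cin):
    """1-bit full adder: returns (sum bit, carry out)."""
    s = x ^ y ^ cin
    cout = (x & y) | (cin & (x ^ y))
    return s, cout


def bit_serial_add(row, width):
    """Add two interleaved operands one bit position per step.

    Each step reads the adjacent bit pair (a_i, b_i) from the row,
    mimicking a single access that fetches corresponding bits of both
    operands, and reuses the same 1-bit adder with a stored carry.
    """
    carry = 0
    result = 0
    for i in range(width):
        x, y = row[2 * i], row[2 * i + 1]
        s, carry = full_adder(x, y, carry)
        result |= s << i
    return result, carry


if __name__ == "__main__":
    row = interleave(200, 100, 8)
    total, carry_out = bit_serial_add(row, 8)
    print(total, carry_out)  # 8-bit sum and the final carry
```

For the 8-bit operands 200 and 100, the model returns the low 8 bits of the sum together with the final carry, so the full result is `total + (carry_out << 8)`. The model captures only the arithmetic ordering and says nothing about timing, precharge and activate cycles, or energy.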