Integrating Emerging Memories for Analog DNN Accelerators

An Chen
IBM


Abstract

As CMOS scaling approaches its fundamental limits, numerous emerging devices have been explored for better performance, higher efficiency, and new functionalities. Rather than replacing CMOS transistors, many of these devices are better suited to new computing paradigms beyond Boolean logic and von Neumann architectures. For example, in-memory computing reduces data movement between computing and memory units and exploits the intrinsic parallelism of memory arrays. Neural-inspired computing implements cognitive and intelligent functions through a wide range of approaches, e.g., deep neural networks (DNNs), spiking neural networks (SNNs), hyperdimensional computing, probabilistic networks, and dynamical systems. Although some novel computing paradigms can be implemented in CMOS technologies, more efficient solutions may come from emerging materials and devices that enable native implementations of these paradigms. At the same time, circuits and accelerators based on emerging technologies often need to be integrated into the CMOS platform to create fully functional systems.

In recent years, DNNs have surpassed human performance in various AI applications, e.g., image classification and language processing. General-purpose CPUs/GPUs and special-purpose digital accelerators provide the current and near-term DNN hardware solutions. In the longer term, DNN accelerators based on analog memories offer opportunities for significantly higher performance and energy efficiency. Among emerging non-volatile memory (NVM) devices, phase change memory (PCM) has the advantages of maturity and the availability of large-scale arrays. PCM-based analog DNN accelerators have been demonstrated at advanced technology nodes (e.g., 14 nm) with millions of devices, and have achieved iso-accuracy on increasingly large network models. The success of these accelerators results from integrating highly efficient analog cores for multiply-accumulate (MAC) operations with advanced CMOS circuitry that implements auxiliary digital functions through creative design. This talk will discuss the progress that we have achieved on PCM-based DNN inference accelerators, the challenges of PCM materials and devices, and promising solutions in technology and design.