Heterogeneous Compute-in-Memory Fabrics for Efficient, Scalable Edge Inference and Learning

Luqi Zheng, Zeshu Wang, Shuting Du, Mufeng Chen, Amir Massah Bavani, Haitong Li
Purdue University


Abstract

Edge deployment of LLMs and neuro-symbolic AI is increasingly constrained by memory bottlenecks and power walls, demanding co-designed hardware solutions for energy-efficient, scalable inference and learning. We advocate treating memory heterogeneity and 'CMOS+X' integration jointly as a first-class design principle for next-generation cognitive AI hardware. This paper reviews our recent work on heterogeneous compute-in-memory (CIM) fabrics: (i) CENTAUR, a 40 nm floating-point RRAM–eDRAM fusion CIM chip; (ii) an analog multi-level-cell (MLC) eDRAM–RRAM CIM architecture co-designed with zeroth-order fine-tuning; (iii) a monolithic 3D co-design methodology with emerging oxide-semiconductor field-effect transistors (OSFETs); and (iv) 3D-CIMlet, an open-source modeling framework for exploring heterogeneous CIM chiplets with 2.5D/3D integration.