Edge deployment of LLMs and neuro-symbolic AI is increasingly constrained by the memory bottleneck and the power wall, demanding co-designed hardware for energy-efficient, scalable inference and learning. We advocate treating memory heterogeneity and 'CMOS+X' integration together as a first-class design principle for next-generation cognitive AI hardware. This paper reviews our recent work on heterogeneous compute-in-memory (CIM) fabrics: (i) CENTAUR, a 40 nm floating-point RRAM–eDRAM fusion CIM chip; (ii) an analog multi-level-cell (MLC) eDRAM–RRAM CIM architecture co-designed with zeroth-order fine-tuning; (iii) a monolithic 3D co-design methodology with emerging oxide-semiconductor transistors (OSFETs); and (iv) 3D-CIMlet, an open-source modeling framework for exploring heterogeneous CIM chiplets with 2.5D/3D integration.