Cache Register Sharing Structure for Channel-level Near-memory Processing in NAND Flash Memory

HyunWoo Kim1, Seungwon Baek1, Minyoung Jung2, Jaehong Song1, Hyodong Kim1, Junhyeon Kim1, Seongju Kim1, Taigon Song1, Jongbeom Kim1, Hyundong Lee3, Yunjeong Go3
1Kyungpook National University (KNU), 2Kyungpook National University (KNU), 3Kyungpook National University


The vast amount of data used for artificial intelligence causes a bottleneck between the processor and memory. To tackle this issue, processing-in-memory (PIM), a technology that embeds a processing unit in the memory, has been proposed. However, SRAM- and DRAM-based PIM suffer from limited capacity. Thus, we propose a NAND flash PIM scheme that shares the cache register. Our scheme reduces the read latency and runtime by 22.8% and 43.7%, respectively, compared to the conventional memory system. The power-performance-area (PPA) was reduced by 17.2% by reducing the number of cycles. Our NAND PIM is well suited to large-scale computation tasks.