Numerous CNN accelerators, called neural processing units (NPUs), have recently been proposed and developed to accelerate CNN computation with a customized chip. To minimize the volume of DRAM accesses, NPUs commonly integrate a large on-chip memory and try to maximize the reuse of data fetched from off-chip DRAM. While extensive research has been conducted to minimize the impact of off-chip DRAM access on performance in NPU design, little attention has been paid to a detailed analysis of how off-chip DRAM access overhead affects NPU performance. In this paper, using a cycle-accurate system simulation environment, we analyze the effects of off-chip DRAM access latency on NPU performance and how the off-chip SRAM changes the DRAM access latency.