The crosspoint array with resistive synaptic devices is a promising hardware solution for accelerating neuromorphic computing on a chip. This paper first explores the design challenges of such an architecture across multiple layers, from devices to circuits to systems. At the device level, non-ideal properties such as variability, non-linearity, and array parasitics need to be carefully managed to avoid performance degradation. At the circuit level, the integration of CMOS peripheral circuits and the scalability of the array demand innovations to achieve area efficiency and low latency. At the system level, fundamental challenges exist in memory bandwidth and interconnection for such a computation model. To overcome these barriers, this paper discusses various mitigation strategies used in practice. Furthermore, it proposes a hierarchical simulator, MNSIM, to systematically evaluate the performance of a neuromorphic system and to assess the trade-offs among area, energy, speed, and computing accuracy.