Near-Memory Computing (NMC) addresses the von Neumann bottleneck between memory and processing by placing computation close to where data is stored. This approach opens a vast design space, whose exploration requires both a detailed view of the architecture of the NMC processing units (PUs) and a broad perspective of the entire system that integrates them. To tackle this challenge, this paper presents a novel cross-layer methodology that relies on synthesizable hardware models of the NMC PUs, encapsulated as components for event-based full-system simulation. Our approach enables accurate assessment of the resource requirements and efficiency of near-memory units while exploring them \emph{in context}, i.e., when integrated into a system executing complex workloads. The resulting framework thus facilitates the systematic design space exploration of NMC solutions, and it includes a complete software stack that supports the implementation and benchmarking of NMC-accelerated applications. We illustrate its effectiveness by providing area, energy, and performance metrics for Machine Learning (ML) inference benchmarks executed across the host CPU and the PUs using different DRAM standards. Our cross-layer approach highlights the potential of NMC acceleration, demonstrating system-wide speedups of 17.4x on average.