Exploring Shared Memory and Cache to Improve GPU Performance and Energy Efficiency

Hao Wen and Wei Zhang
Virginia Commonwealth University


Abstract

Graphics Processing Units (GPUs) use multiple multi-threaded SIMD cores to exploit data parallelism and boost performance. State-of-the-art GPUs provide configurable shared memory and cache to improve performance for applications with different access patterns. Unlike CPU programs, GPU programs usually exhibit different memory access patterns, and their performance may not depend heavily on cache access latencies. On the other hand, the shared memory capacity and other execution resources may become limiting factors for parallelism, which can significantly affect performance. In this paper, we evaluate the impact of different shared memory and cache configurations on both performance and energy consumption, providing useful insights for GPU programmers to use the configurable shared memory and cache more effectively.
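As a concrete illustration of the configurable shared memory and cache mentioned above (this sketch is not taken from the paper; the kernel and all identifiers are hypothetical), a CUDA program can hint the preferred shared-memory/L1 split for a kernel through the standard CUDA runtime call cudaFuncSetCacheConfig, as in the minimal example below.

```cuda
// Minimal sketch, assuming a device with a configurable shared-memory/L1 split.
// The kernel simply stages data through shared memory before scaling it.
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: stage input in shared memory, then scale it.
__global__ void scaleWithSharedMemory(const float* in, float* out, int n, float factor)
{
    extern __shared__ float tile[];            // dynamically sized shared memory
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {
        tile[threadIdx.x] = in[idx];           // stage through shared memory
        __syncthreads();
        out[idx] = tile[threadIdx.x] * factor;
    }
}

int main()
{
    const int n = 1 << 20;
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    float *d_in, *d_out;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));

    // Ask the runtime to favor shared memory over L1 cache for this kernel.
    // Other options: cudaFuncCachePreferL1, cudaFuncCachePreferEqual,
    // cudaFuncCachePreferNone. The hardware may treat this as a hint only.
    cudaFuncSetCacheConfig(scaleWithSharedMemory, cudaFuncCachePreferShared);

    scaleWithSharedMemory<<<blocks, threads, threads * sizeof(float)>>>(d_in, d_out, n, 2.0f);
    cudaDeviceSynchronize();

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

Whether preferring shared memory or L1 cache pays off depends on the kernel's access pattern and occupancy, which is precisely the trade-off the paper evaluates.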