The growing parallelism in most of today’s applications has led to an increased demand for parallel computing in processors. General-purpose graphics processing units (GPGPUs) have been used extensively to provide the necessary computation for highly parallel applications. GPGPUs generate huge volumes of network traffic between memory controllers (MCs) and cores. As a result, the network-on-chip (NoC) fabric can become a performance bottleneck, especially for memory-intensive applications. Traditional mesh-based NoC topologies suffer from high network latency, which leads to congestion at the MCs and increases application execution time. To overcome this challenge, we propose a novel memory-aware circuit overlay NoC that exploits traffic characteristics in GPGPUs to eliminate router arbitration at each hop. Flits sent on the fast overlay circuits reach their destinations in just 3 cycles (at 1 GHz). Our experimental results show that the proposed approach reduces latency by 20-55% and application execution time by 20-70%, and saves up to 10% in power consumption and 10-65% in overall energy consumption, compared to the state of the art.