State-of-the-art General-Purpose computing on Graphics Processing Unit (GPGPU) is facing severe power challenge due to the increasing number of cores placed on a chip with decreasing feature size. In order to hide the long latency operations, GPGPU employs the fine-grained multi-threading among numerous active threads, leading to the sizeable register files with massive power consumption. Exploring the optimal power savings in register files becomes the critical and first step towards the energy-efficient GPGPU design. The conventional method to reduce dynamic power consumption is the supply voltage scaling, and the inter-bank tunneling FETs (TFETs) are the promising candidates compared to CMOS for low voltage operations regarding to both leakage and performance. However, always executing at the low voltage (so that low frequency) will result in significant performance degradation. In this study, we propose the hybrid CMOS-TFET based register files. To optimize the register power consumption, we allocate TFET-based registers to threads whose execution progress can be delayed to some degree to avoid the memory contentions with other threads, and the CMOS-based registers are still used for threads requiring normal execution speed. Our experimental results show that the proposed technique achieves 30% energy (including both dynamic and leakage) reduction in register files with little performance degradation compared to the baseline case equipped with naive power optimization technique.