Clock networks dissipate a significant fraction of a chip's total power budget. In contrast to most prior work, which addresses clock power optimization through clock routing or buffer sizing, we propose a novel register clustering methodology for reducing clock tree power. To validate the methodology, we also present a fast three-stage clock tree synthesis (CTS) approach based on register clustering. Compared with two state-of-the-art low-power CTS works, Contango2.0 and the CTS of Purdue University, our three-stage CTS approach achieves 1.30x and 1.07x lower power consumption and 2.01x and 1.52x smaller skew, respectively. Furthermore, its runtime is 17.36x and 8.16x shorter than that of Contango2.0 and the Purdue University CTS, respectively.