A Low-overhead Dilithium-NTT Architecture using Accelerated K-RED Modular Reduction Unit

Harsh Gupta1, Aryan Goyal1, Paranjay Dhadwal1, Jugal Gandhi2, Diksha Shekhawat3, M. Santosh4, Jai Gopal Pandey4
1Birla Institute of Technology and Science (BITS) Pilani Goa Campus, 2AcSIR at CSIR-CEERI, 3AcSIR, CSIR-CEERI, 4CSIR-Central Electronics Engineering Research Institute (CEERI), Pilani, Rajasthan, India


Abstract

The increasing threat posed by quantum computers to classical cryptographic systems necessitates the adoption of post-quantum cryptographic (PQC) algorithms. CRYSTALS-Dilithium, a lattice-based digital signature scheme standardized by NIST, relies heavily on efficient polynomial multiplication using the number-theoretic transform (NTT). This work presents an FPGA implementation of the Dilithium NTT that optimizes both the modular multiplication unit and the memory access scheme. The modular multiplier combines data stored in LUTs with an accelerated K-RED algorithm, achieving high performance with low resource overhead (111 LUTs, 137 FFs, and 2 DSPs at a frequency of 613 MHz). The proposed NTT architecture employs a dual ping-pong memory access scheme that eliminates the need for BRAMs by using LUTs for intermediate data storage. Implemented on the Xilinx Zynq UltraScale+ ZCU104 and Artix-7 AC701 FPGA prototyping platforms, the design achieves a 14% improvement in the area-time product compared to state-of-the-art solutions, together with 12% lower LUT usage and a 6% higher operating frequency. These results demonstrate a scalable and resource-efficient approach for deploying PQC primitives in constrained hardware environments.
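
As background for the K-RED reduction named in the abstract, the following minimal Python sketch illustrates the basic idea. It assumes the standard Dilithium modulus q = 8380417 = 1023 · 2^13 + 1 and shows only the textbook K-RED step; it is not the accelerated variant or the hardware datapath proposed in this work. The function names (`k_red`, `k_red_2x`, `mod_mul`) and the pre-scaling of one operand are illustrative assumptions.

```python
# Textbook K-RED for a modulus of the form Q = K * 2^M + 1 (illustrative sketch).
Q = 8380417        # Dilithium modulus, 2^23 - 2^13 + 1
K, M = 1023, 13    # Q = K * 2^M + 1, so K * 2^M is congruent to -1 mod Q

def k_red(c: int) -> int:
    """One K-RED step: returns a value congruent to K * c (mod Q)."""
    c_low = c & ((1 << M) - 1)   # low M bits of c
    c_high = c >> M              # remaining high bits of c
    # c = c_high * 2^M + c_low, hence K*c = K*c_low + c_high*(K*2^M)
    # and K*2^M = -1 (mod Q) gives K*c = K*c_low - c_high (mod Q).
    return K * c_low - c_high

def k_red_2x(c: int) -> int:
    """Two chained K-RED steps: congruent to K^2 * c (mod Q)."""
    return k_red(k_red(c))

# Inverse of K^2 mod Q via Fermat's little theorem (Q is prime).
K_INV2 = pow(K * K % Q, Q - 2, Q)

def mod_mul(a: int, b: int) -> int:
    """a * b mod Q, cancelling the K^2 factor by pre-scaling b.
    In an NTT, b would typically be a twiddle factor scaled offline
    (an assumption of this sketch)."""
    b_scaled = b * K_INV2 % Q
    return k_red_2x(a * b_scaled) % Q   # final % Q maps into [0, Q)
```

Each K-RED step needs only a multiplication by the small constant K (itself 2^10 - 1, so a shift and a subtraction suffice) and one subtraction, which is why the method suits low-area modular multipliers. The extra factor of K per step must be compensated, here by scaling one operand with K^-2 mod q.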