A Low-overhead Dilithium-NTT Architecture using Accelerated K-RED Modular Reduction Unit

Harsh Gupta1, Aryan Goyal1, Paranjay Dhadwal1, Jugal Gandhi2, Diksha Shekhawat3, Jai Gopal Pandey4
1Birla Institute of Technology and Science (BITS) Pilani Goa Campus, 2AcSIR at CSIR-CEERI, 3AcSIR, CSIR-CEERI, 4CSIR-Central Electronics Engineering Research Institute (CEERI), Pilani, Rajasthan, India


Abstract

The increasing threat posed by quantum computers to classical cryptographic systems necessitates the adoption of post-quantum cryptographic (PQC) algorithms. CRYSTALS-Dilithium, a lattice-based digital signature scheme standardized by NIST, relies heavily on efficient polynomial multiplication using the number theoretic transform (NTT). This work presents an FPGA implementation of Dilithium-NTT that optimizes both the modular multiplication unit and the memory access scheme. The modular multiplier leverages precomputed LUTs and an accelerated K-RED algorithm, achieving high performance with low resource overhead (111 LUTs, 137 FFs, and 2 DSPs at a frequency of 613 MHz). The proposed NTT architecture employs a dual ping-pong memory access scheme that eliminates the need for BRAMs by using LUTs for intermediate data storage. Implemented on Xilinx Zynq UltraScale+ ZCU104 and Artix-7 AC701 evaluation platforms, the design achieves a 14% improvement in the area-time product compared to state-of-the-art solutions, with 12% lower LUT usage and a 6% higher operating frequency. These results demonstrate a scalable and resource-efficient approach for deploying PQC primitives in constrained hardware environments.
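
For orientation, the basic K-RED reduction on which the modular multiplier builds can be sketched in software as follows. This is a minimal illustration of the standard K-RED idea applied to the Dilithium modulus q = 8380417 = 1023 · 2^13 + 1, not the accelerated, LUT-assisted hardware unit proposed in this work. The function names (`k_red`, `mod_mul`), the pre-scaling of one operand by k^-2, and the final `% Q` correction are assumptions made for the example.

```python
# Dilithium modulus written in K-RED form: Q = K * 2**M + 1
Q = 8380417
K, M = 1023, 13

def k_red(c: int) -> int:
    """Return a value congruent to K * c (mod Q), not fully reduced.

    Split c = c_high * 2**M + c_low. Because K * 2**M = Q - 1 = -1 (mod Q),
    K * c = K * c_low - c_high (mod Q).
    """
    c_low = c & ((1 << M) - 1)   # lowest M bits
    c_high = c >> M              # remaining upper bits
    return K * c_low - c_high

# Each k_red call multiplies the result by K, so two calls introduce K**2.
# Pre-scaling one operand (e.g. a twiddle factor) by K**-2 cancels this.
K_INV2 = pow(K, -2, Q)           # modular inverse of K**2 (Python 3.8+)

def mod_mul(a: int, b_scaled: int) -> int:
    """Compute a * b mod Q, where b_scaled = b * K**-2 mod Q (assumed precomputed)."""
    t = k_red(k_red(a * b_scaled))
    return t % Q                 # software shortcut for the final correction step

if __name__ == "__main__":
    a, b = 1234567, 7654321
    b_scaled = (b * K_INV2) % Q
    assert mod_mul(a, b_scaled) == (a * b) % Q
```

In hardware, the multiplication by K = 2^10 - 1 reduces to a shift and a subtraction, which is one reason this reduction style maps well to low-area designs; the final `% Q` above would be replaced by a small conditional correction.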