Bhasha-Rupantarika: An Algorithm-Hardware Co-design Approach for Multilingual Neural Machine Translation

Mukul Lokhande1, Tanushree Dewangan1, Sharik Mansoori2, Tejas Chaudhari3, Akarsh J.1, Damayanti Lokhande4, Adam Teman5, Santosh Vishvakarma6
1Indian Institute of Technology Indore, 2Undergraduate, 3Indian Institute of Technology Indore, 4Independent, 5Bar-Ilan University, 6Indian Institute of Technology Indore


Abstract

This paper introduces Bhasha-Rupantarika, a lightweight and efficient multilingual translation system designed through algorithm-hardware co-design for resource-constrained settings. We investigate model deployment at reduced precision (FP8, INT8, INT4, and FP4); at FP4, experiments show a 4.1× reduction in model size and a 4.2× speedup in inference, corresponding to a throughput of 66 tokens/s (a 4.8× improvement). These results, in line with expectations, underscore the importance of ultra-low-precision quantization for real-time deployment on IoT devices using FPGA accelerators. Our evaluation covers bidirectional translation between Indian and international languages, demonstrating adaptability in low-resource linguistic contexts. The FPGA deployment uses 1.96× fewer lookup tables (LUTs) and 1.65× fewer flip-flops (FFs) while delivering 2.2× higher throughput than OPU and 4.6× higher than HPTA. Overall, the system couples quantization-aware translation with hardware efficiency, offering a viable path to deployable multilingual AI. The complete code and dataset are publicly available for reproducibility, enabling rapid integration and further development by researchers.
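
To make the quantization step concrete, the sketch below shows symmetric per-tensor INT4 weight quantization of the general kind evaluated here. It is a minimal NumPy illustration under assumed conventions (the function names quantize_int4 and dequantize are ours, not from the released codebase), not the paper's actual pipeline.

```python
import numpy as np

def quantize_int4(w: np.ndarray):
    """Symmetric per-tensor quantization to the signed 4-bit range [-8, 7]."""
    scale = np.max(np.abs(w)) / 7.0        # largest magnitude maps to +/-7
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate FP32 weights for accuracy checks."""
    return q.astype(np.float32) * scale

# Toy check: the quantization error on a random weight matrix stays
# bounded by about half a quantization step (scale / 2).
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)
q, s = quantize_int4(w)
print("max abs error:", np.abs(w - dequantize(q, s)).max(), "step/2:", s / 2)
```

Packing two such 4-bit values per byte yields roughly a 4× storage saving over an FP16 baseline, consistent with the 4.1× figure reported above (the FP16 baseline is our assumption, not stated in the abstract).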