Semantic-Guided Test Generation using Fine-Tuned LLMs for Validation of Hardware Accelerators

Emma Andrews¹, Aruna Jayasena², Prabhat Mishra¹
¹University of Florida, ²University of Tennessee


Abstract

The increasing complexity and heterogeneity of programmable hardware accelerators, such as Graphics Processing Units (GPUs) and Tensor Processing Units (TPUs), pose a significant challenge for automated test generation and functional validation. Traditional validation techniques often struggle to scale with architectural diversity and cannot effectively exploit the semantic relationships between instructions and data. Validation using large language models (LLMs) is a promising avenue for generating assembly programs (test vectors) for processor verification since LLMs are trained on diverse general-purpose processor designs. Unfortunately, LLMs are unsuitable for validation of programmable hardware accelerators due to the lack of training data for such implementations. In this paper, we propose an automated framework that fine-tunes LLMs to generate semantically correct test cases directed toward improved design coverage while monitoring the functional correctness of the outputs. The generated test cases are evaluated by a compiler for correctness before they are used for validation of hardware accelerators. We provide a mechanism for the LLM to observe the design coverage achieved on the implementation by previously generated test patterns. Extensive experimental evaluation demonstrates that our framework achieves a 33% improvement in design coverage compared to state-of-the-art test generation, with the added advantage of monitoring the functional correctness of the design. Our framework has identified several functional bugs in the open-source tiny-gpu implementation.
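
The abstract describes a generate-compile-cover feedback loop. The following Python sketch illustrates one plausible shape of such a loop; it is not the authors' implementation, and all names (generate, compiles, run_coverage, coverage_guided_loop) are hypothetical placeholders standing in for the fine-tuned LLM, the compiler check, and the coverage monitor.

```python
"""Minimal sketch of a coverage-guided test-generation loop (hypothetical API)."""

from typing import Callable, List, Set, Tuple


def coverage_guided_loop(
    generate: Callable[[str], str],           # fine-tuned LLM: prompt -> assembly test
    compiles: Callable[[str], bool],          # compiler check for semantic correctness
    run_coverage: Callable[[str], Set[str]],  # simulate a test, return covered points
    iterations: int = 50,
) -> Tuple[List[str], Set[str]]:
    """Generate tests, keep only compilable ones, and feed observed coverage
    back into the next prompt so the model targets uncovered design points."""
    covered: Set[str] = set()
    accepted: List[str] = []
    prompt = "Generate an assembly test program for the accelerator."
    for _ in range(iterations):
        test = generate(prompt)
        if not compiles(test):  # reject semantically invalid programs up front
            continue
        new_points = run_coverage(test) - covered
        if new_points:  # keep only tests that improve design coverage
            accepted.append(test)
            covered |= new_points
        # Report current coverage back to the model: this is the feedback
        # mechanism the abstract refers to (prompt format is an assumption).
        prompt = (
            "Generate an assembly test exercising design points not yet "
            f"covered. Points covered so far: {sorted(covered)}"
        )
    return accepted, covered
```

In this sketch the accept/reject decision is driven purely by incremental coverage; a real framework would also log compiler and simulation results to monitor functional correctness, as the paper describes.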