VAE-Enabled Design Space Exploration for Heterogeneous Approximate Matrix Multiplication Accelerators

Niranjan Gopal, Nishith Akula, Madhav Rao
International Institute of Information Technology Bangalore


Abstract

Matrix multiplication underpins compute-intensive workloads from image processing to deep learning. This work presents a novel hardware accelerator that combines Strassen's algorithm, systolic arrays, and approximate computing to improve area and power efficiency. At the leaf level of Strassen's recursion, conventional exact multipliers (EMs) are selectively replaced with approximate multipliers (AMs), while time-shared EMs preserve accuracy on critical paths. The accelerator employs multiple systolic arrays with many processing elements (PEs), where each PE selects from ten candidate AMs, creating a vast discrete design space. To navigate this complexity, we propose a Variational Autoencoder (VAE) that compresses PE configurations into a compact latent space. A multi-objective evolutionary algorithm (MOEA) then optimizes hardware metrics (area, power, delay) alongside application-level metrics (SSIM, PSNR, classification accuracy). Our design maintains numerical stability across recursive layers, preserving Strassen's asymptotic advantage over conventional multiplication. MOEA-driven exploration identifies designs that reduce hardware footprint by 33% and improve the power-delay product by 21%, with only a 0.9% loss in CNN accuracy. The VAE enables rapid exploration by decoding latent codes into full configurations, generating optimized Pareto frontiers. This framework demonstrates effective hardware-application co-optimization, delivering efficient matrix multiplication for approximate computing workloads.