Ralts: Robust Aggregation for Enhancing Graph Neural Network Resilience on Bit-flip Errors

Wencheng Zou and Nan Wu
George Washington University


Abstract

Graph neural networks (GNNs) have been widely applied in safety-critical applications, such as financial and medical networks, in which compromised predictions may have severe consequences. While existing research on GNN robustness has primarily focused on software-level threats, hardware-induced faults and errors remain largely underexplored. As hardware systems progress toward advanced technology nodes to meet demands for high performance and energy efficiency, they become increasingly susceptible to transient faults, which can cause bit flips and silent data corruption, a prominent issue observed by major technology companies such as Google and Meta. In response, we propose Ralts, a generalizable and lightweight solution to bolster GNN resilience against bit-flip errors. Specifically, Ralts exploits various graph similarity metrics to filter out outliers and recover compromised graph topology, and incorporates these protective techniques directly into aggregation functions to support any message-passing GNN. Evaluation results demonstrate that Ralts effectively enhances GNN robustness across a wide range of GNN models, graph datasets, error patterns, and both dense and sparse architectures, and that it scales well to denser and larger graphs. Ralts is also optimized to deliver execution efficiency comparable to that of the built-in aggregation functions in PyTorch Geometric.
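
To make the idea of embedding protection directly into aggregation concrete, the sketch below shows one hypothetical way a similarity-based outlier filter could be folded into a PyTorch Geometric message-passing layer. It is a minimal illustration under assumptions, not the paper's implementation: the class name `RobustMeanConv`, the use of cosine similarity to the per-node mean message as the similarity metric, and the `sim_threshold` parameter are all placeholders chosen for exposition.

```python
# Illustrative sketch only (assumed design, not Ralts itself): a message-passing
# layer whose aggregation drops neighbor messages that look like outliers,
# judged by cosine similarity to the target node's mean incoming message.
import torch
import torch.nn.functional as F
from torch_geometric.nn import MessagePassing


class RobustMeanConv(MessagePassing):
    def __init__(self, in_channels, out_channels, sim_threshold=0.0):
        super().__init__(aggr=None)  # we supply our own aggregate() below
        self.lin = torch.nn.Linear(in_channels, out_channels)
        self.sim_threshold = sim_threshold  # messages less similar than this are discarded

    def forward(self, x, edge_index):
        x = self.lin(x)
        return self.propagate(edge_index, x=x)

    def message(self, x_j):
        # Plain neighbor features serve as messages.
        return x_j

    def aggregate(self, inputs, index, ptr=None, dim_size=None):
        if dim_size is None:
            dim_size = int(index.max()) + 1
        # 1) Reference statistic: per-target-node mean of incoming messages.
        idx = index.unsqueeze(-1).expand_as(inputs)
        ref = inputs.new_zeros((dim_size, inputs.size(-1)))
        ref = ref.scatter_reduce(0, idx, inputs, reduce='mean', include_self=False)
        # 2) Cosine similarity of each message to its target's reference.
        sim = F.cosine_similarity(inputs, ref[index], dim=-1)
        # 3) Mask messages whose similarity falls below the threshold
        #    (e.g., values corrupted by a bit flip), then average the survivors.
        keep = (sim >= self.sim_threshold).to(inputs.dtype).unsqueeze(-1)
        num = torch.zeros_like(ref).scatter_add_(0, idx, inputs * keep)
        den = inputs.new_zeros((dim_size, 1)).scatter_add_(
            0, index.unsqueeze(-1), keep).clamp(min=1.0)
        return num / den


# Hypothetical usage: x has shape [num_nodes, 16], edge_index has shape [2, num_edges].
# conv = RobustMeanConv(16, 32, sim_threshold=0.2)
# out = conv(x, edge_index)
```

Because the filtering lives entirely inside `aggregate()`, a layer like this can replace the standard mean aggregation in any message-passing GNN without changing the rest of the model, which mirrors the abstract's claim that the protective techniques are incorporated directly into aggregation functions.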