Exploring Performance-Power Tradeoffs in Providing Reliability for NoC-Based MPSoCs

Hui Zhao, Mahmut Kandemir, Mary Jane Irwin
Pennsylvania State University


Abstract

Performance and power consumption are major challenges for Network-on-Chip (NoC) designers. The situation is exacerbated when error control techniques are employed to provide reliability, since such techniques can lead to extra power consumption and execution cycles. In many systems today, error-correcting codes (ECC) are used for error detection; once an error is detected, a recovery scheme is invoked to correct it. In this paper, we focus on tuning error recovery schemes to explore performance, power, and reliability tradeoffs. Previous reliability work targeting NoCs proposed two retransmission techniques to recover from errors: End-to-End retransmission and Hop-by-Hop retransmission. End-to-End retransmission checks for errors only at the destination, which saves power but can incur longer recovery delays. In comparison, Hop-by-Hop retransmission checks for errors at every router and achieves better performance at the expense of increased power overhead. We propose a novel retransmission scheme that employs feedback control theory to dynamically choose when to check for errors, based on the performance requirements of the applications. Our scheme ensures that applications meet their performance QoS targets while saving power. Our experimental evaluation shows that, if a 10% slack in delay is allowed, our scheme can save as much as 80% of the power consumed by the underlying error control scheme.
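
To make the feedback idea concrete, the following is a minimal sketch and not the controller evaluated in this paper. It assumes a simple proportional rule that adjusts how many hops a packet travels between error checks, so that an interval of 1 corresponds to Hop-by-Hop checking and an interval equal to the path length corresponds to End-to-End checking. The function name, the `gain` value, and the delay units are all hypothetical choices made for illustration.

```python
def next_check_interval(interval, measured_delay, target_delay, max_interval, gain=4.0):
    """Return the number of hops between error checks for the next period.

    Hypothetical illustration: interval == 1 behaves like Hop-by-Hop checking,
    interval == max_interval (the path length) behaves like End-to-End checking.
    """
    # Positive error means slack: the measured delay is below the QoS target.
    error = (target_delay - measured_delay) / target_delay
    # Proportional step (assumed control law): spend slack on checking less
    # often to save power, and check more often when the delay target is missed.
    proposed = interval + gain * error
    # Clamp to the valid range [1, max_interval].
    return max(1, min(max_interval, round(proposed)))


if __name__ == "__main__":
    # Delay well under target: check less often.
    print(next_check_interval(2, measured_delay=50, target_delay=100, max_interval=8))
    # Delay over target: fall back toward Hop-by-Hop checking.
    print(next_check_interval(2, measured_delay=150, target_delay=100, max_interval=8))
```

A proportional rule is used here only because it is the simplest feedback law to read; any controller that maps the delay error to a checking interval would fit the same interface.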