Optimal Reliability-Constrained Overdrive Frequency Selection in Multicore Systems

Andrew B. Kahng and Siddhartha Nath
University of California, San Diego


Abstract

In leading-edge process technologies, reliability is a first-class constraint for both IC design and system operation. For multicore systems, reliability affects task scheduling decisions because it constrains both performance and throughput. Previous work on reliability-constrained task scheduling has one of two basic limitations: it either cannot guarantee lifetime (e.g., that the chip can deliver useful performance over 10 years), or cannot guarantee lower bounds on “acceptable performance” or “acceptable throughput” over the entire chip lifetime. In this work, we formulate and solve a new maximum-value, reliability-constrained overdrive frequencies (MVRCOF) problem that guarantees prescribed lower bounds on “acceptable performance” and “acceptable throughput” in multicore systems, without exceeding the prescribed lifetime budget of any core. Our formulation maximizes the value of the overdrive frequencies for each number of active cores. We develop a solver for the MVRCOF problem and present optimal and heuristic solutions that determine the execution time of each core in each combination of simultaneously active cores, such that cores wear out in a balanced manner over the chip lifetime. These solutions deliver maximum value within the specified chip lifetime, and can be used in reliability-constrained scheduling policies. Our heuristic method can be 3.3% worse than our optimal method, but can converge up to 10× faster. Further, our solutions improve the objective function value by 2.2% to 17.4% compared to existing reliability-constrained task scheduling policies that provide lifetime guarantees.