GPU-Accelerated Sparse Matrix-Vector Multiplication For Fast Transient Thermal Analysis

Kai He1, Tan Yu1, Sheldon Tan1, Hai Wang2, He Tang2
1University of California, Riverside, 2UESTC


In this paper, we propose a new fast parallel sparse matrix-vector multiplication (SpMV) algorithm for GPU platforms and demonstrate its application to fast transient thermal analysis based on an explicit integration scheme. The new algorithm, called segSpMV, is based on the compressed sparse row (CSR) format and can be applied to a wide range of computational applications with both structured and unstructured matrices. The SpMV operation has a very low computation-to-communication ratio and is therefore bandwidth-limited. The new SpMV algorithm aims to reduce memory accesses by partitioning the rows, whose nonzero patterns are irregular in general, into a number of fixed-length segments. As a result, both the multiplication and summation phases benefit from coalesced memory access, and both can be completed in a single kernel launch. For large segment lengths, the summation phase can be further improved by using GPU reduction techniques. The resulting SpMV method consistently outperforms all published algorithms on a set of public matrix benchmarks. We demonstrate its advantage in fast transient thermal analysis of integrated systems based on an explicit finite difference method, in which only SpMV operations are required. The resulting finite-difference-based thermal analysis method achieves an order-of-magnitude speedup over CPU-based implicit methods, with linear time scalability.
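The segmentation idea can be illustrated with a minimal sequential sketch in Python. This is not the authors' GPU kernel: it only emulates, on the CPU, the two phases named above (multiplication within fixed-length segments, then summation of the per-segment partial sums into each row). The function names `csr_to_segments`, `seg_spmv`, `csr_spmv`, and `explicit_step`, the zero-padding of the last segment of a row, and the update form `T_next = A*T + b` are all assumptions made for illustration; the abstract does not specify them.

```python
def csr_to_segments(row_ptr, col_idx, vals, seg_len):
    """Split every CSR row into fixed-length segments (hypothetical layout).

    The last segment of a row is zero-padded so that all segments have
    exactly seg_len entries; padded entries carry value 0.0 and column 0.
    """
    seg_cols, seg_vals, seg_row = [], [], []
    for row in range(len(row_ptr) - 1):
        start, end = row_ptr[row], row_ptr[row + 1]
        for s in range(start, end, seg_len):
            stop = min(s + seg_len, end)
            cols = list(col_idx[s:stop])
            data = list(vals[s:stop])
            pad = seg_len - len(cols)
            seg_cols.append(cols + [0] * pad)
            seg_vals.append(data + [0.0] * pad)
            seg_row.append(row)  # remember which row owns this segment
    return seg_cols, seg_vals, seg_row


def seg_spmv(seg_cols, seg_vals, seg_row, x, n_rows):
    """Compute y = A*x from the segmented layout."""
    # Multiplication phase: one partial sum per fixed-length segment.
    partial = [
        sum(v * x[c] for v, c in zip(vs, cs))
        for vs, cs in zip(seg_vals, seg_cols)
    ]
    # Summation phase: accumulate segment partial sums into their rows.
    y = [0.0] * n_rows
    for p, r in zip(partial, seg_row):
        y[r] += p
    return y


def csr_spmv(row_ptr, col_idx, vals, x):
    """Plain row-by-row CSR SpMV, used here only as a reference."""
    return [
        sum(vals[k] * x[col_idx[k]] for k in range(row_ptr[r], row_ptr[r + 1]))
        for r in range(len(row_ptr) - 1)
    ]


def explicit_step(segments, T, b):
    """One explicit time step, assuming the update T_next = A*T + b."""
    seg_cols, seg_vals, seg_row = segments
    AT = seg_spmv(seg_cols, seg_vals, seg_row, T, len(T))
    return [a + bi for a, bi in zip(AT, b)]


# Small tridiagonal example matrix in CSR form (illustrative data only).
row_ptr = [0, 2, 5, 7]
col_idx = [0, 1, 0, 1, 2, 1, 2]
vals = [2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0]
x = [1.0, 2.0, 3.0]

segments = csr_to_segments(row_ptr, col_idx, vals, seg_len=2)
y_seg = seg_spmv(*segments, x, n_rows=3)
y_ref = csr_spmv(row_ptr, col_idx, vals, x)
```

In this sketch `y_seg` and `y_ref` agree, which is the only property it is meant to show. On a GPU, fixed-length segments would let consecutive threads read consecutive entries, which is where the coalescing benefit described above would come from; the Python version does not model memory behavior at all.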