Roofline Performance Analysis of DNN Architectures on CPU and GPU Systems

Prashanth H C¹ and Madhav Rao²
¹IIIT-Bangalore, ²International Institute of Information Technology-Bangalore


Abstract

Prior characterization of Deep Neural Network (DNN) architectures on CPUs and GPUs is valuable for understanding the resource and timing constraints involved in their large-scale adoption and deployment. This paper profiles 43 popular DNN workloads on two segments each of CPUs and GPUs, covering workstation and mobile classes. The roofline performance analysis model aids in selecting optimal architectural configurations for real-time, resource-constrained designs with respect to the arithmetic intensity of each model run and the throughput extracted from the system. The selected DNNs, ranging from MobileNet to VGGNet, exhibit a wide range of characteristics in terms of the number of MAC operations per inference, data movement, and parameter size. An average throughput improvement of around 10× was observed for DNNs running on 48 threads compared with single-threaded execution on the workstation CPU, whereas changes in batch size yielded no direct throughput improvement. On the workstation GPU, the DNN models showed a 98× throughput improvement for a batch size of 16 over a batch size of 1, while remaining insensitive to changes in thread count.
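As a point of reference for the analysis described above, the standard roofline bound relates a workload's arithmetic intensity (operations per byte of data moved) to the attainable performance on a given machine. The sketch below is a minimal illustration of that relation; the peak compute, bandwidth, and workload figures are hypothetical placeholders, not measurements reported in the paper.

```python
# Minimal sketch of the roofline bound used in this style of analysis.
# All numeric values below are hypothetical, not figures from the paper.

def arithmetic_intensity(flops, bytes_moved):
    """Arithmetic intensity (AI): operations performed per byte of memory traffic."""
    return flops / bytes_moved

def roofline_bound(ai, peak_gflops, peak_gbps):
    """Attainable performance (GFLOP/s) = min(compute roof, AI x memory-bandwidth roof)."""
    return min(peak_gflops, ai * peak_gbps)

# Hypothetical DNN inference: 1.2 GFLOPs of work, 60 MB of data movement,
# on a machine with 500 GFLOP/s peak compute and 50 GB/s memory bandwidth.
ai = arithmetic_intensity(1.2e9, 60e6)                     # 20 FLOPs/byte
print(roofline_bound(ai, peak_gflops=500, peak_gbps=50))   # 500 -> compute-bound
```

A model whose AI falls to the left of the machine's ridge point (peak compute divided by peak bandwidth) is memory-bound, which is the distinction the roofline plots in the paper are used to draw across the profiled DNNs.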