ATC'21 | Habitat: A Runtime-Based Computational Performance Predictor for Deep Neural Network Training

Abstract

Habitat is a Python library that predicts a DNN training workload's execution time on a target GPU by using runtime measurements taken on a GPU that the user already has.
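Below is a minimal usage sketch of that workflow, assuming the API exposed by the open-source release. The class and method names (`OperationTracker`, `Device`, `get_tracked_trace`, `to_device`, `run_time_ms`) follow the usage pattern shown in the project's documentation and should be verified against the released code; the model and GPUs are arbitrary examples.

```python
import habitat
import torch
import torchvision.models as models

# Define a model and a sample input on the GPU we already have.
model = models.resnet50().cuda()
image = torch.rand(8, 3, 224, 224).cuda()

# Track the operations executed on the origin GPU (assumed here to be an RTX 2080 Ti).
tracker = habitat.OperationTracker(device=habitat.Device.RTX2080Ti)
with tracker.track():
    out = model(image)

# Scale the tracked trace to a different (target) GPU, e.g. a V100.
trace = tracker.get_tracked_trace()
pred = trace.to_device(habitat.Device.V100)
print("Predicted run time on V100:", pred.run_time_ms, "ms")
```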

Current Approaches and Their Limitations

The approaches to DL performance analysis today include:

  1. Directly measuring the training job on the GPU
  2. Consulting published benchmarks

However, these approaches have limitations:

  1. You need to have the target GPU in the first place
  2. They are not as helpful for a custom DNN on a specific GPU

Another approach is to use heuristics, which assume that a DNN training workload exhausts all the computational resources on a GPU; this is not true in general.

Observations

Hence, Habitat is based on the following observations:

  1. Repetitive computation
  2. Common building blocks among DNNs
  3. Runtime information is available on a GPU the user already has

Building on these observations, Habitat makes predictions by scaling the execution time of each operation in a training iteration from one GPU to another, using either (i) wave scaling or (ii) pre-trained MLPs.

Habitat’s Approaches

Wave Scaling

Wave scaling is based on the fact that a GPU kernel is partitioned into groups of threads called thread blocks, which typically execute in concurrent groups, resulting in waves of execution. The key idea behind wave scaling is to compute the number of thread block waves in a kernel and scale the per-wave execution time using ratios between the origin and destination GPUs. The ratios take into account the number of thread blocks, the GPUs' memory bandwidths, and their clock frequencies. If the operation is compute-bound, the clock frequency ratio is weighted more heavily; otherwise, the memory bandwidth ratio is weighted more heavily.
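To make the scaling step concrete, here is a minimal illustrative sketch in Python. It is not Habitat's exact formula: the paper derives the compute/memory weighting from measured kernel metrics, whereas here it is passed in directly as `gamma`, and all function and parameter names are hypothetical.

```python
import math

def scale_kernel_time(
    time_origin_ms: float,         # measured kernel time on the origin GPU
    num_blocks: int,               # thread blocks launched by the kernel
    blocks_per_wave_origin: int,   # blocks that can run concurrently on the origin GPU
    blocks_per_wave_dest: int,     # blocks that can run concurrently on the destination GPU
    clock_origin_mhz: float,
    clock_dest_mhz: float,
    bandwidth_origin_gbps: float,
    bandwidth_dest_gbps: float,
    gamma: float,                  # in [0, 1]; 1 = fully compute-bound, 0 = fully memory-bound
) -> float:
    """Scale a kernel's measured time from the origin GPU to the destination GPU."""
    # Number of thread block "waves" on each GPU.
    waves_origin = math.ceil(num_blocks / blocks_per_wave_origin)
    waves_dest = math.ceil(num_blocks / blocks_per_wave_dest)

    # Average time of one wave on the origin GPU.
    wave_time_origin = time_origin_ms / waves_origin

    # Compute-bound kernels follow the clock frequency ratio;
    # memory-bound kernels follow the memory bandwidth ratio.
    compute_scale = clock_origin_mhz / clock_dest_mhz
    memory_scale = bandwidth_origin_gbps / bandwidth_dest_gbps
    wave_time_dest = wave_time_origin * (gamma * compute_scale + (1 - gamma) * memory_scale)

    # Total predicted time: scaled wave time times the number of waves
    # needed on the destination GPU.
    return wave_time_dest * waves_dest
```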

Pre-trained MLPs

However, some operations are implemented using proprietary kernel libraries (e.g., cuDNN and cuBLAS) that leverage GPU architecture-specific kernels, so wave scaling is not suitable for them. Therefore, Habitat uses pre-trained MLPs to predict the execution time of these operations. Each operation-specific MLP takes as input:

  1. layer dimensions
  2. the memory capacity and bandwidth on the target GPU
  3. the number of SMs on the target GPU
  4. the peak FLOPS of the target GPU

The output of the MLP is the predicted execution time of the operation on the target GPU.
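As an illustration of this design, below is a minimal PyTorch sketch of such an operation-specific MLP. The hidden-layer sizes, feature ordering, and example values are assumptions for illustration only, not the architecture or features used in the paper.

```python
import torch
import torch.nn as nn

class OperationMLP(nn.Module):
    """Maps layer dimensions plus target-GPU specs to a predicted run time (ms)."""

    def __init__(self, num_layer_features: int):
        super().__init__()
        # Inputs: layer dimensions + 4 target-GPU features
        # (memory capacity, memory bandwidth, number of SMs, peak FLOPS).
        in_features = num_layer_features + 4
        self.net = nn.Sequential(
            nn.Linear(in_features, 128),
            nn.ReLU(),
            nn.Linear(128, 128),
            nn.ReLU(),
            nn.Linear(128, 1),  # predicted execution time
        )

    def forward(self, layer_dims: torch.Tensor, gpu_specs: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([layer_dims, gpu_specs], dim=-1))

# Example: predict the run time of a conv2d-like operation on a hypothetical target GPU.
layer_dims = torch.tensor([[32.0, 64.0, 128.0, 56.0, 56.0, 3.0, 1.0]])  # batch, C_in, C_out, H, W, kernel, stride
gpu_specs = torch.tensor([[16.0, 900.0, 80.0, 15.7]])  # memory (GB), bandwidth (GB/s), SMs, peak TFLOPS
model = OperationMLP(num_layer_features=layer_dims.shape[-1])
predicted_ms = model(layer_dims, gpu_specs)
```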

Innovation

The authors propose Habitat, which uses a GPU that the user already has to predict DNN training performance on other GPUs; predicting performance across devices in this way is a new scenario.