This paper shows that a reasonably accurate prediction of co-location fairness can be obtained from two inputs: the execution statistics of each workload running standalone, and the fairness observed when that workload is co-run with three representative microbenchmarks.
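A minimal sketch of this kind of prediction, assuming a simple linear-regression model over the two groups of inputs; the feature set, model choice, and synthetic data below are illustrative assumptions, not the paper's actual method.

```python
# Hypothetical sketch: predict co-location fairness from standalone execution
# statistics plus fairness measured against three reference microbenchmarks.
# Features, model, and data are assumptions for illustration only.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Per-workload features: [standalone throughput, standalone memory bandwidth,
#                         fairness vs. microbenchmark A, B, C]
X = rng.uniform(0.0, 1.0, size=(200, 5))
# Target: fairness (normalized progress) when co-located with a real workload.
y = 0.3 * X[:, 2] + 0.3 * X[:, 3] + 0.2 * X[:, 4] - 0.1 * X[:, 0] + 0.5

model = LinearRegression().fit(X[:150], y[:150])
pred = model.predict(X[150:])
print("mean absolute error:", np.mean(np.abs(pred - y[150:])))
```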
This paper introduces AntMan, a system that accommodates the fluctuating resource demands of deep learning training jobs.
This paper introduces Gandiva, a new cluster scheduling framework that exploits domain-specific knowledge to improve the latency and efficiency of training deep learning models on a GPU cluster.
This paper designs multiple experiments to explore the rules governing GPU kernel-level scheduling.
The authors propose BRP-NAS, an efficient hardware-aware NAS enabled by an accurate performance (latency and accuracy) predictor based on a graph convolutional network (GCN). BRP-NAS uses binary relations between models and an iterative data selection strategy to improve sample selection. In addition, they release LatBench, a latency dataset of NAS-Bench-201 models measured on a broad range of devices.
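A hedged sketch of the binary-relation idea: rather than regressing absolute accuracy or latency, the predictor scores two candidate architectures and predicts which one is better. The graph encoder below is a single hand-rolled graph-convolution step over an adjacency matrix and one-hot operation features; the layer sizes and toy inputs are assumptions, not BRP-NAS's actual configuration.

```python
import torch
import torch.nn as nn

class GCNEncoder(nn.Module):
    def __init__(self, num_ops, hidden=32):
        super().__init__()
        self.lin = nn.Linear(num_ops, hidden)
        self.out = nn.Linear(hidden, 1)

    def forward(self, adj, feats):
        # adj: (N, N) normalized adjacency; feats: (N, num_ops) one-hot ops.
        h = torch.relu(adj @ self.lin(feats))   # one graph-convolution step
        g = h.mean(dim=0)                        # mean-pool node embeddings
        return self.out(g)                       # scalar score for the graph

class BinaryRelationPredictor(nn.Module):
    def __init__(self, num_ops):
        super().__init__()
        self.encoder = GCNEncoder(num_ops)

    def forward(self, graph_a, graph_b):
        # Probability that architecture A outperforms architecture B.
        score_a = self.encoder(*graph_a)
        score_b = self.encoder(*graph_b)
        return torch.sigmoid(score_a - score_b)

# Toy usage with random 4-node cells and 5 operation types (made-up inputs).
adj = torch.eye(4)
feats_a = torch.nn.functional.one_hot(torch.tensor([0, 1, 2, 3]), 5).float()
feats_b = torch.nn.functional.one_hot(torch.tensor([1, 1, 4, 3]), 5).float()
model = BinaryRelationPredictor(num_ops=5)
print(model((adj, feats_a), (adj, feats_b)))  # P(A better than B)
```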
This paper introduces InferLine, a system for provisioning and managing each stage of prediction pipelines to meet end-to-end tail latency constraints while minimizing cost.
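A hedged sketch of the provisioning idea: give each pipeline stage just enough replicas that the estimated end-to-end latency meets the SLO, scaling the current bottleneck stage first so cost stays low. The latency and cost models here are placeholder assumptions for illustration, not InferLine's actual profiling-based planner.

```python
from dataclasses import dataclass

@dataclass
class Stage:
    name: str
    service_time_ms: float   # profiled per-query service time (assumed)
    cost_per_replica: float
    replicas: int = 1

def stage_latency(stage, arrival_rate_qps):
    # Crude placeholder: latency blows up as utilization approaches 1.
    capacity = stage.replicas * 1000.0 / stage.service_time_ms
    util = min(arrival_rate_qps / capacity, 0.99)
    return stage.service_time_ms / (1.0 - util)

def provision(stages, arrival_rate_qps, slo_ms):
    # Greedily add replicas to the stage contributing the most latency
    # until the end-to-end estimate fits within the latency constraint.
    while sum(stage_latency(s, arrival_rate_qps) for s in stages) > slo_ms:
        bottleneck = max(stages, key=lambda s: stage_latency(s, arrival_rate_qps))
        bottleneck.replicas += 1
    return stages

pipeline = [Stage("preprocess", 2.0, 0.1), Stage("model", 20.0, 1.0),
            Stage("postprocess", 1.0, 0.1)]
provision(pipeline, arrival_rate_qps=100.0, slo_ms=80.0)
for s in pipeline:
    print(s.name, s.replicas)
print("total cost:", sum(s.replicas * s.cost_per_replica for s in pipeline))
```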