In this paper, the authors introduce a new work scheduler that improves both work efficiency and parallelism for Single-Source Shortest Path (SSSP) search.
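For context, the work-efficient sequential baseline that parallel SSSP schedulers are typically measured against is Dijkstra's algorithm; the sketch below is a hedged illustration of that baseline (not the paper's scheduler), showing the priority-queue discipline whose strict ordering limits parallelism:

```python
# Hedged illustration (not the paper's scheduler): sequential Dijkstra is the
# work-efficient SSSP baseline. Parallel schedulers relax this strict
# priority order to expose parallelism, at the cost of some wasted work.
import heapq

def dijkstra(graph, src):
    """graph: {u: [(v, w), ...]}; returns dict of shortest distances from src."""
    dist = {src: 0}
    pq = [(0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue                      # stale queue entry, skip
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist

# Toy example graph (hypothetical, for illustration only)
g = {0: [(1, 4), (2, 1)], 2: [(1, 2)], 1: [(3, 5)]}
print(dijkstra(g, 0))   # → {0: 0, 1: 3, 2: 1, 3: 8}
```

Each vertex is settled exactly once in distance order, which is what makes the algorithm work-efficient but inherently serial.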
This paper shows that, by combining the execution statistics of standalone workloads with fairness measurements taken when those workloads are co-run with three representative microbenchmarks, one can obtain reasonably accurate performance predictions.
In this paper, the authors introduce AntMan, a system that accommodates the fluctuating resource demands of deep-learning training jobs.
This paper introduces Gandiva, a new cluster-scheduling framework that exploits domain-specific knowledge to improve the latency and efficiency of training deep-learning models on a GPU cluster.
This paper sets up multiple experiments to uncover the rules governing GPU kernel-level scheduling.
The authors propose BRP-NAS, an efficient hardware-aware NAS enabled by an accurate performance (latency and accuracy) predictor based on a graph convolutional network (GCN). BRP-NAS uses binary relations between models and an iterative data-selection strategy to improve sample selection. In addition, the authors release LatBench, a latency dataset of NAS-Bench-201 models measured on a broad range of devices.
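The binary-relation idea can be sketched as follows. This is a minimal NumPy illustration under my own assumptions (toy GCN layer, random weights, hypothetical cell encodings), not the authors' implementation: instead of regressing absolute latency, the predictor scores two candidate architectures and is trained on which of the pair is faster.

```python
# Minimal sketch (assumption: NOT the BRP-NAS code) of a binary-relation
# latency predictor: a toy GCN scores each cell graph, and a logistic over
# the score gap gives P(model A is slower than model B).
import numpy as np

rng = np.random.default_rng(0)

def gcn_score(adj, feats, w):
    """One toy GCN layer + mean pooling: mean(ReLU(D^-1 (A+I) X W))."""
    a_hat = adj + np.eye(adj.shape[0])              # add self-loops
    deg = a_hat.sum(axis=1, keepdims=True)
    h = np.maximum((a_hat / deg) @ feats @ w, 0.0)  # normalized propagation + ReLU
    return h.mean()                                 # scalar "latency score"

# Two hypothetical cells: same 4-node DAG topology, different operator features.
adj = np.array([[0, 1, 1, 0],
                [0, 0, 0, 1],
                [0, 0, 0, 1],
                [0, 0, 0, 0]], dtype=float)
feats_a = rng.normal(size=(4, 3))   # made-up op embeddings for model A
feats_b = rng.normal(size=(4, 3))   # made-up op embeddings for model B
w = rng.normal(size=(3, 2))         # untrained GCN weights

gap = gcn_score(adj, feats_a, w) - gcn_score(adj, feats_b, w)
p_a_slower = 1.0 / (1.0 + np.exp(-gap))
print(f"P(A slower than B) = {p_a_slower:.3f}")
```

In training, the weights would be fit with a cross-entropy loss over many such measured pairs, and the learned ranking then guides which architectures are worth benchmarking next.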