This study introduces hierarchical unified virtual memory (HUVM), a novel approach that gives GPUs the illusion of a virtual memory space by incorporating the temporarily idle memory of neighbor GPUs, accessed over NVLink.
This work presents NVBit, a fast, dynamic, and portable binary instrumentation framework.
In this paper, the authors introduce a new work scheduler that improves both work efficiency and parallelism for single-source shortest path (SSSP) search.
This paper shows that a reasonably accurate prediction can be obtained from two inputs: the execution statistics of standalone workloads, and the fairness of execution when these workloads are executed with three representative microbenchmarks.
In this paper, the authors introduce AntMan, a system that accommodates the fluctuating resource demands of deep learning training jobs.
This paper introduces Gandiva, a new cluster scheduling framework that uses domain-specific knowledge to improve the latency and efficiency of training deep learning models in a GPU cluster.