ATC'22 | Memory Harvesting in Multi-GPU Systems with Hierarchical Unified Virtual Memory

Weile Luo published on 2023-07-01 included in Paper Notes

This study introduces a novel approach to provide an illusion of virtual memory space for GPUs, called hierarchical unified virtual memory (HUVM), by incorporating the temporarily idle memory of neighbor GPUs. (using NVLink)

MICRO'21 | NVBit: A Dynamic Binary Instrumentation Framework for NVIDIA GPUs

Weile Luo published on 2023-04-14 included in Paper Notes

This work presents NVBit, a fast, dynamic, and portable, binary instrumentation framework.

PPoPP'21 | A Fast Work-Efficient SSSP Algorithm for GPUs

Weile Luo published on 2022-11-14 included in Paper Notes

In this paper, authors introduced a new work scheduler to improve both work efficiency and parallelism for the Single Source Shortest Path search.

TACO'22 | Performance and Power Prediction for Concurrent Execution on GPUs

Weile Luo published on 2022-06-17 included in Paper Notes

This paper shows that by using the execution statistics of standalone workloads and the fairness of execution when these workloads are executed with three representative microbenchmarks, we can get a reasonably accurate prediction.

OSDI'20 | AntMan: Dynamic Scaling on GPU Clusters for Deep Learning

Weile Luo published on 2022-01-12 included in Paper Notes

In this paper, authors introduced AntMan, a system to accommodate the fluctuating resource demands of deep learning training jobs.

OSDI'18 | Gandiva: Introspective Cluster Scheduling for Deep Learning

Weile Luo published on 2022-01-12 included in Paper Notes

This paper introduce Gandiva, a new cluster scheduling framework that utilizes domain-specific knowledge to improve latency and efficiency of training deep learning models in a GPU cluster.