# GPU Sharing

{% hint style="info" %} I am actively maintaining this list. {% endhint %}

## GPU Temporal Sharing

- Orion: Interference-aware, Fine-grained GPU Sharing for ML Applications (EuroSys 2024) [Personal Notes] [Paper]
  - ETH
  - Intercept GPU kernel launches and schedule individual GPU operators
  - Utilize CUDA stream priorities; consider the PCIe bandwidth
  - Use NVIDIA Nsight Compute and NVIDIA Nsight Systems to collect the compute throughput, memory throughput, and execution time of each kernel
- Interference-aware Multiplexing for Deep Learning in GPU Clusters: A Middleware Approach (SC 2023) [Personal Notes] [Paper] [Code]
  - UMacau & SIAT, CAS
  - IADeep: a cluster scheduler on top of Kubernetes
  - Tune training configurations (e.g., batch size) across all co-located tasks; choose appropriate tasks to multiplex on a GPU device; consider the PCIe bandwidth
- Transparent GPU Sharing in Container Clouds for Deep Learning Workloads (NSDI 2023) [Paper] [Code]
  - PKU
  - TGS: transparent GPU sharing; adaptive rate control; unified memory
- Microsecond-scale Preemption for Concurrent GPU-accelerated DNN Inferences (OSDI 2022) [Personal Notes] [Paper] [Code] [Benchmark] [Artifact]
  - SJTU
  - REEF: GPU kernel preemption; dynamic kernel padding
- Gemini: Enabling Multi-Tenant GPU Sharing Based on Kernel Burst Estimation (TCC 2021) [Paper] [Code]
  - National Tsing Hua University
  - Enable fine-grained GPU allocation; launch kernels together
- AntMan: Dynamic Scaling on GPU Clusters for Deep Learning (OSDI 2020) [Paper] [Code]
  - Alibaba
  - Enable GPU sharing in DL frameworks (TensorFlow/PyTorch); schedule operators
- KubeShare: A Framework to Manage GPUs as First-Class and Shared Resources in Container Cloud (HPDC 2020) [Personal Notes] [Paper] [Code]
  - National Tsing Hua University
  - Kubernetes; CUDA API remoting
- GaiaGPU: Sharing GPUs in Container Clouds (ISPA/IUCC/BDCloud/SocialCom/SustainCom 2018) [Personal Notes] [Paper] [Code]
  - PKU & Tencent
  - Kubernetes; CUDA API remoting
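
Several of the temporal-sharing systems above (e.g., Orion) build on CUDA stream priorities to favor latency-critical kernels over best-effort ones. A minimal sketch of the stock CUDA runtime API involved (the interception and scheduling policy around it is each paper's contribution, not shown here):

```cuda
#include <cuda_runtime.h>

__global__ void dummyKernel() {}

int main() {
    // Query the priority range supported by the device.
    // Lower numeric values mean higher priority.
    int leastPriority, greatestPriority;
    cudaDeviceGetStreamPriorityRange(&leastPriority, &greatestPriority);

    // A high-priority stream for latency-critical (e.g., inference) kernels,
    // a low-priority stream for best-effort (e.g., training) kernels.
    cudaStream_t hiStream, loStream;
    cudaStreamCreateWithPriority(&hiStream, cudaStreamNonBlocking, greatestPriority);
    cudaStreamCreateWithPriority(&loStream, cudaStreamNonBlocking, leastPriority);

    // Priorities hint the hardware scheduler to prefer pending work from
    // higher-priority streams; they do not preempt already-running kernels.
    dummyKernel<<<1, 1, 0, hiStream>>>();
    dummyKernel<<<1, 1, 0, loStream>>>();

    cudaDeviceSynchronize();
    cudaStreamDestroy(hiStream);
    cudaStreamDestroy(loStream);
    return 0;
}
```

Because priorities alone cannot preempt a running kernel, systems like Orion and REEF add their own mechanisms (operator-level scheduling, kernel preemption) on top of this API.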

## GPU Spatial Sharing

- Paella: Low-latency Model Serving with Software-defined GPU Scheduling (SOSP 2023) [Paper] [Code]
  - UPenn & DBOS, Inc.
  - Instrument kernels to expose, at runtime, detailed information about the occupancy and utilization of the GPU's SMs
  - Compiler-library-scheduler co-design
- MuxFlow: Efficient and Safe GPU Sharing in Large-Scale Production Deep Learning Clusters (arXiv 2303.13803) [Paper]
  - PKU & ByteDance
  - Utilize NVIDIA MPS
- NVIDIA Multi-Instance GPU (MIG) [Homepage]
  - Partition a GPU into as many as seven instances, each fully isolated with its own high-bandwidth memory, cache, and compute cores
  - Available on NVIDIA H100, A100, and A30 GPUs
- NVIDIA Multi-Process Service (MPS) [Docs]
  - Transparently enable co-operative multi-process CUDA applications
  - Terminating an MPS client without synchronizing all outstanding GPU work (via Ctrl-C, a program exception such as a segfault, signals, etc.) can leave the MPS server and other MPS clients in an undefined state, which may result in hangs, unexpected failures, or corruption
- NVIDIA CUDA Multi-Stream [Docs]
  - Stream: a sequence of operations that execute on the GPU in issue order
  - Operations issued to different streams may run concurrently, overlapping kernels with data transfers
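
The multi-stream overlap described in the last bullet can be sketched as a simple pipeline (assumes a CUDA-capable GPU; the kernel and sizes are illustrative):

```cuda
#include <cuda_runtime.h>

__global__ void scale(float *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= 2.0f;
}

int main() {
    const int N = 1 << 20, CHUNK = N / 4;
    float *h, *d;
    // Pinned host memory is required for async copies to actually overlap.
    cudaMallocHost(&h, N * sizeof(float));
    cudaMalloc(&d, N * sizeof(float));

    cudaStream_t streams[4];
    for (int s = 0; s < 4; ++s) cudaStreamCreate(&streams[s]);

    // Pipeline: each chunk's H2D copy, kernel, and D2H copy are issued to
    // its own stream, so chunk i's kernel can overlap chunk i+1's copy.
    for (int s = 0; s < 4; ++s) {
        size_t off = (size_t)s * CHUNK;
        cudaMemcpyAsync(d + off, h + off, CHUNK * sizeof(float),
                        cudaMemcpyHostToDevice, streams[s]);
        scale<<<(CHUNK + 255) / 256, 256, 0, streams[s]>>>(d + off, CHUNK);
        cudaMemcpyAsync(h + off, d + off, CHUNK * sizeof(float),
                        cudaMemcpyDeviceToHost, streams[s]);
    }
    cudaDeviceSynchronize();

    for (int s = 0; s < 4; ++s) cudaStreamDestroy(streams[s]);
    cudaFreeHost(h);
    cudaFree(d);
    return 0;
}
```

Within one stream the three operations still run in issue order; concurrency comes only from interleaving across streams, which is the mechanism the spatial-sharing systems above multiplex.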

## Survey