Homepage: https://sigops.org/s/conferences/sosp/2024/
- LLM Training
- Enabling Parallelism Hot Switching for Efficient Training of Large Language Models
- PKU
- Perseus: Removing Energy Bloat from Large Model Training [arXiv]
- UMich
    - Use a graph cut-based algorithm to obtain the iteration-time-energy Pareto frontier; then schedule energy consumption across time.
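      The Pareto-frontier idea can be illustrated with a toy sketch (data points are made up; Perseus derives candidate configurations with a graph cut-based algorithm over the training computation graph, not by enumeration as here):

      ```python
      # Toy sketch: keep only Pareto-optimal (iteration_time, energy) configs.
      # Hypothetical data; illustrative of the frontier concept only.
      def pareto_frontier(points):
          """Keep points not dominated in both iteration time and energy."""
          frontier = []
          for t, e in sorted(points):  # sort by time, then energy
              # a point survives only if it beats every kept point on energy
              if all(e < fe for _, fe in frontier):
                  frontier.append((t, e))
          return frontier

      configs = [(1.0, 900), (1.1, 700), (1.2, 750), (1.3, 600), (1.4, 650)]
      print(pareto_frontier(configs))  # [(1.0, 900), (1.1, 700), (1.3, 600)]
      ```

      Given such a frontier, a scheduler can trade a small increase in iteration time for a large energy saving.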
- LLM Inference
- LoongServe: Efficiently Serving Long-context Large Language Models with Elastic Sequence Parallelism [arXiv]
- PKU
- ESP: Elastic Sequence Parallelism
    - Elastically adjust the degree of parallelism in real time; reduce key-value cache migration overhead and overlap partial decoding communication with computation; reduce key-value cache fragmentation across instances.
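      A minimal sketch of the elastic idea: pick the sequence-parallelism degree per request from its context length, so short requests leave GPUs free for others. Thresholds and names here are hypothetical; LoongServe additionally handles KV cache migration and communication overlap, which this ignores:

      ```python
      # Illustrative only: choose the smallest power-of-two sequence-parallelism
      # degree whose per-GPU KV cache shard fits a (made-up) token budget.
      def sp_degree(context_len, max_degree=8, tokens_per_gpu=32_000):
          degree = 1
          while degree < max_degree and context_len / degree > tokens_per_gpu:
              degree *= 2
          return degree

      print(sp_degree(8_000))    # 1  (short request: one GPU suffices)
      print(sp_degree(100_000))  # 4  (long context spread over 4 GPUs)
      print(sp_degree(500_000))  # 8  (capped at max_degree)
      ```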
- PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU [arXiv]
- SJTU IPADS
- Improving DNN Inference Throughput using Practical, Per-Input Compute Adaptation
- GaTech & Princeton
- Apparate: Rethinking Early Exits to Tame Latency-Throughput Tensions in ML Serving [arXiv]
- Princeton & GaTech
    - Automatically apply and manage early exits (letting certain inputs return results at intermediate layers) in ML models.
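      The early-exit mechanism can be sketched as a confidence check after an intermediate head; names and the threshold are illustrative, and Apparate's contribution is adapting exit placement and thresholds at runtime, which this omits:

      ```python
      import math

      def softmax(logits):
          exps = [math.exp(x) for x in logits]
          s = sum(exps)
          return [x / s for x in exps]

      # Sketch: return early when the intermediate head is confident enough,
      # otherwise fall through to the full model. Heads are toy stand-ins.
      def infer(x, early_head, final_head, threshold=0.9):
          probs = softmax(early_head(x))
          if max(probs) >= threshold:       # "easy" input: exit early
              return probs.index(max(probs)), "early"
          probs = softmax(final_head(x))    # "hard" input: run full model
          return probs.index(max(probs)), "full"

      easy = lambda x: [5.0, 0.0]   # high-margin logits -> confident
      hard = lambda x: [0.1, 0.0]   # low-margin logits -> not confident
      print(infer(None, easy, easy))  # (0, 'early')
      print(infer(None, hard, easy))  # (0, 'full')
      ```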
- SlipStream: Adapting Pipelines for Distributed Training of Large DNNs Amid Failures [arXiv]
- Stanford
    - Dynamically re-route the work of a failed server to data-parallel peers; execute it within bubbles of the original pipeline schedule.
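      A toy sketch of the re-routing step: greedily assign the failed replica's microbatches to free bubble slots on data-parallel peers. This is only the assignment idea; SlipStream plans the placement against the actual pipeline schedule:

      ```python
      # Illustrative greedy fill of bubble capacity; names are hypothetical.
      def reroute(failed_microbatches, bubbles):
          """bubbles: {peer: free_slots}; returns {peer: [microbatch ids]}."""
          assignment = {peer: [] for peer in bubbles}
          queue = list(failed_microbatches)
          for peer, slots in bubbles.items():
              for _ in range(slots):
                  if not queue:
                      break
                  assignment[peer].append(queue.pop(0))
          if queue:
              raise RuntimeError("not enough bubble capacity for rerouted work")
          return assignment

      print(reroute([0, 1, 2, 3], {"peer0": 2, "peer1": 3}))
      # {'peer0': [0, 1], 'peer1': [2, 3]}
      ```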
- Tenplex: Dynamic Parallelism for Deep Learning using Parallelizable Tensor Collections [arXiv]
- ICL
- Tenplex — a state management library.
- Enable jobs to change the parallelism dynamically.
- PTC: Parallelizable Tensor Collection
- Dataset state
      - Model state
    - Execute PTC transformations in parallel with minimal data movement between workers.
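      The minimal-data-movement goal can be sketched for a 1-D sharded tensor: when the worker count changes, each new worker fetches only the slice ranges it is missing from the old layout. This is a simplification; Tenplex's PTC abstraction also covers dataset state and multi-dimensional layouts:

      ```python
      # Sketch of re-sharding a 1-D tensor when parallelism changes.
      def shard_ranges(length, workers):
          """Even 1-D partition: list of (start, end) per worker."""
          base, rem = divmod(length, workers)
          ranges, start = [], 0
          for w in range(workers):
              end = start + base + (1 if w < rem else 0)
              ranges.append((start, end))
              start = end
          return ranges

      def migration_plan(length, old_workers, new_workers):
          """For each new worker: (old_worker, start, end) slices to fetch."""
          old = shard_ranges(length, old_workers)
          plan = {}
          for nw, (ns, ne) in enumerate(shard_ranges(length, new_workers)):
              for ow, (os_, oe) in enumerate(old):
                  lo, hi = max(ns, os_), min(ne, oe)
                  if lo < hi:  # overlap between new shard and old shard
                      plan.setdefault(nw, []).append((ow, lo, hi))
          return plan

      # Scale a 12-element tensor from 3 workers to 4:
      print(migration_plan(12, 3, 4))
      # {0: [(0, 0, 3)], 1: [(0, 3, 4), (1, 4, 6)],
      #  2: [(1, 6, 8), (2, 8, 9)], 3: [(2, 9, 12)]}
      ```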
- Scaling Deep Learning Computation over the Inter-Core Connected Intelligence Processor [arXiv]
- UIUC & MSRA
    - T10, the first DL compiler to exploit the inter-core communication bandwidth and distributed on-chip memory of inter-core connected AI chips (e.g., the Graphcore IPU).
- SilvanForge: A Schedule-Guided Retargetable Compiler for Decision Tree Inference
- IISc
- Dirigent: Lightweight Serverless Orchestration [arXiv]
- ETH
    - Simplify the state management of existing orchestration systems (e.g., Kubernetes); eliminate persistent state updates; run monolithic control and data planes to minimize internal communication overheads.