SOSP 2024

Meta Info

Homepage: https://sigops.org/s/conferences/sosp/2024/

Papers

Large Language Models (LLMs)

  • LLM Training
    • Enabling Parallelism Hot Switching for Efficient Training of Large Language Models
      • PKU
    • Perseus: Removing Energy Bloat from Large Model Training [arXiv]
      • UMich
      • Use a graph cut-based algorithm to obtain the "iteration time-energy" Pareto frontier; schedule energy consumption across time by slowing computations off the critical path (toy sketch after this list).
  • LLM Inference
    • LoongServe: Efficiently Serving Long-context Large Language Models with Elastic Sequence Parallelism [arXiv]
      • PKU
      • ESP: Elastic Sequence Parallelism
      • Elastically adjust the degree of parallelism in real time; reduce key-value cache migration overhead and overlap partial decoding communication with computation; reduce key-value cache fragmentation across instances (toy sketch after this list).
    • PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU [arXiv]
      • SJTU IPADS
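
A minimal sketch of the intuition behind Perseus's energy scheduling: stages off the critical path have slack, so they can run at a lower-energy setting without lengthening the iteration. The stage names and the (time, energy) frequency table below are invented for illustration; the paper's actual planner uses a graph cut-based algorithm over the full computation DAG.

```python
# Toy Perseus-style planner (illustrative numbers, not from the paper).
# Each stage has (time_s, energy_J) options, fastest setting first.
FREQ_PROFILE = {
    "stage0": [(1.00, 300.0), (1.15, 240.0), (1.40, 210.0)],
    "stage1": [(1.30, 360.0), (1.50, 300.0), (1.80, 270.0)],  # critical stage
    "stage2": [(0.90, 280.0), (1.05, 225.0), (1.25, 200.0)],
}

def plan_frequencies(profile):
    """For each stage, pick the lowest-energy setting whose runtime still
    fits within the iteration time dictated by the critical stage."""
    deadline = max(options[0][0] for options in profile.values())
    plan = {}
    for stage, options in profile.items():
        feasible = [(t, e) for t, e in options if t <= deadline]
        plan[stage] = min(feasible, key=lambda te: te[1])  # cheapest energy
    return deadline, plan

deadline, plan = plan_frequencies(FREQ_PROFILE)
print(f"iteration time: {deadline:.2f}s")    # unchanged: 1.30s
for stage, (t, e) in sorted(plan.items()):
    print(f"{stage}: {t:.2f}s at {e:.0f}J")  # stage0/stage2 slowed to save energy
```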
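
A toy sketch of the elastic sequence parallelism idea in LoongServe: the set of instances serving a request grows or shrinks with its context length, drawing from and returning to a shared pool. The pool size, per-instance token capacity, and ceiling-based policy below are assumptions for illustration, not the paper's scheduling algorithm.

```python
# Toy elastic sequence parallelism (invented pool size, capacity, policy).
POOL = list(range(8))          # idle serving instances
TOKENS_PER_INSTANCE = 4096     # assumed KV-cache capacity per instance

def required_degree(context_len):
    """Instances needed to hold this context (ceiling division)."""
    return max(1, -(-context_len // TOKENS_PER_INSTANCE))

def rescale(assigned, context_len):
    """Grow or shrink a request's instance set as its context changes."""
    need = required_degree(context_len)
    while len(assigned) < need and POOL:
        assigned.append(POOL.pop())    # scale up from the shared pool
    while len(assigned) > need:
        POOL.append(assigned.pop())    # scale down, return to the pool
    return assigned

instances = rescale([], context_len=10_000)          # long-prompt prefill
print(len(instances))                                # -> 3
instances = rescale(instances, context_len=12_500)   # context grows in decode
print(len(instances))                                # -> 4
```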

ML Serving

  • Improving DNN Inference Throughput using Practical, Per-Input Compute Adaptation
    • GaTech & Princeton
  • Apparate: Rethinking Early Exits to Tame Latency-Throughput Tensions in ML Serving [arXiv]
    • Princeton & GaTech
    • Automatically apply and manage early exits in ML models, letting certain inputs return results from intermediate layers (toy sketch below).
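
A minimal sketch of the early-exit mechanism, assuming a simple PyTorch model: a small classifier ("ramp") is attached to an intermediate layer, and inputs whose early prediction is confident enough skip the remaining layers. The layer sizes, threshold, and batch-level exit policy are illustrative; Apparate's contribution is managing such ramps (placement and thresholds) automatically at serving time.

```python
# Toy early-exit model (illustrative sizes/threshold; requires PyTorch).
import torch
import torch.nn as nn

class EarlyExitNet(nn.Module):
    def __init__(self, dim=128, num_classes=10, threshold=0.9):
        super().__init__()
        self.block1 = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())
        self.block2 = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())
        self.ramp = nn.Linear(dim, num_classes)   # early-exit classifier
        self.head = nn.Linear(dim, num_classes)   # original final head
        self.threshold = threshold

    def forward(self, x):
        h = self.block1(x)
        early = self.ramp(h)
        confidence = early.softmax(dim=-1).max(dim=-1).values
        # Exit early only if every input in the batch is confident enough.
        if bool((confidence >= self.threshold).all()):
            return early, "early"
        return self.head(self.block2(h)), "full"

model = EarlyExitNet()
logits, path = model(torch.randn(4, 128))
print(path, logits.shape)
```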

Distributed Training

  • SlipStream: Adapting Pipelines for Distributed Training of Large DNNs Amid Failures [arXiv]
    • Stanford
    • Dynamically re-route the work of a failed server to data-parallel peers; execute within bubbles of the original pipeline schedule (toy sketch after this list).
  • Tenplex: Dynamic Parallelism for Deep Learning using Parallelizable Tensor Collections [arXiv]
    • ICL
    • Tenplex — a state management library.
      • Enable jobs to change the parallelism dynamically.
      • PTC: Parallelizable Tensor Collection
        • Dataset state
        • Model state
      • Execute PTC transformations in parallel with minimal data movement between workers (toy sketch after this list).
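
A toy sketch of SlipStream's failure-handling idea: when one replica of a pipeline stage fails, its microbatches are spread over the surviving data-parallel replicas of that stage, where they can be absorbed by the bubbles of the original schedule. The data structures and round-robin policy are invented for illustration, not the paper's actual scheduler.

```python
# Toy microbatch rerouting in the spirit of SlipStream (illustrative only).

def reroute(assignments, failed):
    """Spread a failed replica's microbatches over its surviving
    data-parallel peers, round-robin."""
    orphaned = assignments.pop(failed)
    survivors = sorted(assignments)
    for i, microbatch in enumerate(orphaned):
        assignments[survivors[i % len(survivors)]].append(microbatch)
    return assignments

# Three data-parallel replicas of one pipeline stage, two microbatches each.
plan = {0: [0, 1], 1: [2, 3], 2: [4, 5]}
print(reroute(plan, failed=1))  # {0: [0, 1, 2], 2: [4, 5, 3]}
```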
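
A toy sketch of the PTC idea, assuming 1-D state kept as plain Python lists: state is addressed as a logically whole tensor that can be re-sliced when the job's parallelism changes. Tenplex's real abstraction covers multi-dimensional parallelism and plans transformations to minimize data movement between workers.

```python
# Toy 1-D "parallelizable tensor collection": lists stand in for tensors;
# sizes are assumed divisible by the parallelism degree.

def shard(tensor, degree):
    """Split a logical tensor evenly across `degree` workers."""
    step = len(tensor) // degree
    return [tensor[i * step:(i + 1) * step] for i in range(degree)]

def repartition(shards, new_degree):
    """Rebuild the logical tensor, then re-slice it for the new degree."""
    logical = [x for s in shards for x in s]
    return shard(logical, new_degree)

state = shard(list(range(16)), degree=4)   # job starts on 4 workers
state = repartition(state, new_degree=2)   # scale in to 2 workers
print(state)                               # two shards of 8 elements each
```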

ML Compilation

  • Scaling Deep Learning Computation over the Inter-Core Connected Intelligence Processor [arXiv]
    • UIUC & MSRA
    • T10, the first DL compiler to exploit the inter-core communication bandwidth and distributed on-chip memory on AI chips (e.g., the Graphcore IPU).
  • SilvanForge: A Schedule-Guided Retargetable Compiler for Decision Tree Inference
    • IISc

Serverless Computing

  • Dirigent: Lightweight Serverless Orchestration [arXiv]
    • ETH
    • Simplify the state management of existing orchestration systems (e.g., Kubernetes); eliminate persistent state updates; run monolithic control and data planes to minimize internal communication overheads.