# ISCA 2024

## Meta Info

Homepage: https://iscaconf.org/isca2024/

Paper list: https://www.iscaconf.org/isca2024/program/

## Papers

### Large Language Models (LLMs)

- Splitwise: Efficient Generative LLM Inference Using Phase Splitting
  - Microsoft
  - Best Paper Award
- MECLA: Memory-Compute-Efficient LLM Accelerator with Scaling Sub-matrix Partition
- Tender: Accelerating Large Language Models via Tensor Decomposition and Runtime Requantization
- LLMCompass: Enabling Efficient Hardware Design for Large Language Model Inference
- ALISA: Accelerating Large Language Model Inference via Sparsity-Aware KV Caching

### Mixture-of-Experts (MoEs)

- Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference
  - MSRA
  - Introduces a pre-gating function that alleviates the dynamic nature of sparse expert activation, thereby addressing the large memory footprint of MoE inference (see the sketch after this list).
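
The sketch below is a minimal toy illustration of the pre-gating idea, not the paper's implementation; the dimensions, the `pre_gates` matrices, and the dictionary-based "prefetch" are all assumptions made for clarity. The point it shows: block *i*'s input selects block *i+1*'s experts, so only those few experts need to be fetched into GPU memory ahead of time rather than keeping every expert resident.

```python
# Minimal pre-gating sketch (toy example with assumed shapes, not Pre-gated MoE's code).
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k, n_blocks = 64, 8, 2, 4

# Hypothetical parameters: per-block expert FFNs (normally kept in host memory)
# and per-block pre-gate matrices that score the *next* block's experts.
experts = [rng.standard_normal((n_experts, d_model, d_model)) for _ in range(n_blocks)]
pre_gates = [rng.standard_normal((d_model, n_experts)) for _ in range(n_blocks)]
router0 = rng.standard_normal((d_model, n_experts))  # conventional router for block 0

def topk_experts(scores, k):
    """Indices of the k highest-scoring experts."""
    return np.argsort(scores)[-k:]

x = rng.standard_normal(d_model)
active = topk_experts(x @ router0, top_k)  # block 0 still uses a normal router
for i in range(n_blocks):
    if i + 1 < n_blocks:
        # Pre-gate: pick block i+1's experts from block i's *input*, so their
        # weights can be prefetched while block i's experts are still running.
        next_active = topk_experts(x @ pre_gates[i], top_k)
        prefetched = {e: experts[i + 1][e] for e in next_active}  # stands in for an H2D copy
    # Run only the currently active experts of block i.
    x = sum(np.tanh(experts[i][e] @ x) for e in active) / top_k
    if i + 1 < n_blocks:
        active = next_active

print("output norm:", np.linalg.norm(x))
```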

### Recommendation Models

- Heterogeneous Acceleration Pipeline for Recommendation System Training [arXiv]
  - UBC & GaTech
  - Proposes Hotline, a runtime framework.
  - Keeps non-popular embeddings in CPU main memory and popular embeddings in GPU HBM.
  - Fragments each mini-batch into popular and non-popular micro-batches (μ-batches); see the sketch after this list.
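
As a rough illustration only (NumPy arrays stand in for CPU memory and GPU HBM, and the hot-row set is a made-up popularity statistic), the sketch below fragments one mini-batch of embedding lookups into popular and non-popular micro-batches and serves each from the corresponding table; it is not Hotline's actual pipeline.

```python
# Toy popular / non-popular micro-batch split (illustrative sketch, not Hotline's code).
import numpy as np

rng = np.random.default_rng(0)
n_rows, dim, n_hot = 100_000, 16, 1_000

# The full embedding table lives in CPU main memory; a small "hot" slice is
# assumed to be replicated in GPU HBM.
full_table_cpu = rng.standard_normal((n_rows, dim)).astype(np.float32)
hot_ids = set(range(n_hot))                    # stand-in for popularity statistics
hot_table_gpu = full_table_cpu[:n_hot].copy()  # stand-in for the HBM-resident copy

def split_minibatch(ids):
    """Fragment a mini-batch of lookup ids into popular / non-popular micro-batches."""
    popular = [i for i in ids if i in hot_ids]
    cold = [i for i in ids if i not in hot_ids]
    return popular, cold

ids = rng.integers(0, n_rows, size=256).tolist()
popular, cold = split_minibatch(ids)

# Popular micro-batch: fast gathers from the GPU-resident table.
emb_popular = hot_table_gpu[np.asarray(popular, dtype=np.int64)]
# Non-popular micro-batch: gathered from CPU memory (in Hotline this transfer
# would overlap with the popular micro-batch's GPU compute).
emb_cold = full_table_cpu[np.asarray(cold, dtype=np.int64)]

print(f"{len(popular)} popular / {len(cold)} non-popular lookups")
```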

### Diffusion Models

- Cambricon-D: Full-Network Differential Acceleration for Diffusion Models
  - ICT, CAS
  - The first processor design to target diffusion-model acceleration.
  - Mitigates the additional memory accesses introduced by differential computing while retaining its reduced computation (see the sketch after this list).
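
The snippet below is only a toy illustration of the differential-computing idea the paper builds on, not Cambricon-D's dataflow: for a linear layer, the output at denoising step t can be updated with W @ (x_t − x_{t−1}) instead of being recomputed, and because consecutive diffusion steps are similar the delta has small magnitude, which is what the hardware exploits. The memory-access problem the paper addresses arises at nonlinear layers, which still need the full-precision activations; that part is not shown here.

```python
# Toy differential-computation sketch (illustration only, not Cambricon-D's design).
import numpy as np

rng = np.random.default_rng(0)
d, n_steps = 128, 5

W = rng.standard_normal((d, d))
x_prev = rng.standard_normal(d)
y_prev = W @ x_prev                # full computation only at the first denoising step

for t in range(1, n_steps):
    # Consecutive denoising steps see very similar activations
    # (simulated here with a small perturbation).
    x_t = x_prev + 0.01 * rng.standard_normal(d)
    delta = x_t - x_prev           # small-magnitude difference
    y_t = y_prev + W @ delta       # differential update instead of recomputing W @ x_t
    assert np.allclose(y_t, W @ x_t)
    x_prev, y_prev = x_t, y_t

print("differential updates match direct recomputation")
```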

### Video Analytics

- DaCapo: Accelerating Continuous Learning in Autonomous Systems for Video Analytics

### Accelerators

- Intel Accelerator Ecosystem: An SoC-Oriented Perspective
  - Intel
  - Industry Session