    Repositories list

    • A Declarative System for Optimizing AI Workloads
      Python
      MIT License
      14000 Updated Dec 29, 2024
    • TeaCache

      Public
      Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model
      Python
      Apache License 2.0
      9000 Updated Dec 27, 2024
    • Tile primitives for speedy kernels
      Cuda
      MIT License
      93000 Updated Dec 23, 2024
    • Python
      MIT License
      4000 Updated Dec 21, 2024
    • LOTUS: A semantic query engine - process data with LLMs as easily as writing pandas code
      Python
      MIT License
      73000 Updated Dec 18, 2024
    • Modyn is a research-platform for training ML models on growing datasets.
      Python
      MIT License
      6000 Updated Dec 9, 2024
    • depyf

      Public
      depyf is a tool to help you understand and adapt to PyTorch compiler torch.compile.
      Python
      MIT License
      15000 Updated Dec 7, 2024
    • Mooncake

      Public
      Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
      C++
      Apache License 2.0
      135000 Updated Nov 27, 2024
    • GLake: optimizing GPU memory management and IO transmission.
      Python
      Apache License 2.0
      35000 Updated Nov 27, 2024
    • Open-source implementation for "Helix: Serving Large Language Models over Heterogeneous GPUs and Network via Max-Flow"
      Python
      Apache License 2.0
      3100 Updated Nov 9, 2024
    • Medusa: Accelerating Serverless LLM Inference with Materialization [ASPLOS'25]
      HTML
      Apache License 2.0
      1400 Updated Nov 8, 2024
    • PhoenixOS

      Public
      Fast OS-level support for GPU checkpoint and restore
      C
      Apache License 2.0
      11000 Updated Nov 6, 2024
    • Dynamic resources changes for multi-dimensional parallelism training
      Go
      1000 Updated Oct 1, 2024
    • Python
      MIT License
      13000 Updated Sep 28, 2024
    • Block Transformer: Global-to-Local Language Modeling for Fast Inference (Official Code)
      Python
      MIT License
      7000 Updated Sep 28, 2024
    • ChatDev

      Public
      Create Customized Software using Natural Language Idea (through LLM-powered Multi-Agent Collaboration)
      Shell
      Apache License 2.0
      3.3k000 Updated Sep 6, 2024
    • Nanoflow

      Public
      A throughput-oriented high-performance serving framework for LLMs
      Cuda
      29000 Updated Aug 26, 2024
    • Jupyter Notebook
      Apache License 2.0
      6000 Updated Aug 21, 2024
    • quokka

      Public
      Making data lake work for time series
      Python
      Apache License 2.0
      59000 Updated Aug 21, 2024
    • 16-fold memory access reduction with nearly no loss
      Python
      MIT License
      3000 Updated Aug 18, 2024
    • ParvaGPU represents a proficient GPU space-sharing technology that optimizes GPU usage while facilitating large-scale DNN inference in cloud environments, improving cost-effectiveness.
      Python
      MIT License
      1000 Updated Aug 16, 2024
    • Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios.
      C++
      MIT License
      3000 Updated Aug 14, 2024
    • A low-latency & high-throughput serving engine for LLMs
      Python
      Apache License 2.0
      35000 Updated Aug 5, 2024
    • glake

      Public
      GLake: optimizing GPU memory management and IO transmission.
      Python
      Apache License 2.0
      35000 Updated Aug 3, 2024
    • alfworld

      Public
      ALFWorld: Aligning Text and Embodied Environments for Interactive Learning
      Python
      MIT License
      56000 Updated Jul 31, 2024
    • [ICML 2024] Official repository for "Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models"
      Python
      MIT License
      73100 Updated Jul 30, 2024
    • GPTSwarm

      Public
      🐝 GPTSwarm: LLM agents as (Optimizable) Graphs
      Python
      MIT License
      48000 Updated Jul 26, 2024
    • [ECCV 2024] Efficient Inference of Vision Instruction-Following Models with Elastic Cache
      Python
      MIT License
      2000 Updated Jul 26, 2024
    • [OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable
      Python
      MIT License
      9100 Updated Jul 25, 2024
    • [ECCV 2024] Code for VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models
      Python
      Other
      34000 Updated Jul 25, 2024