Starred repositories
Efficient implementations of state-of-the-art linear attention models in PyTorch and Triton
Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.
A bibliography and survey of the papers surrounding o1
HART: Efficient Visual Generation with Hybrid Autoregressive Transformer
DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware.
Homepage for ProLong (Princeton long-context language models) and paper "How to Train Long-Context Language Models (Effectively)"
A sparse attention kernel supporting mixed sparse patterns
FlashInfer: Kernel Library for LLM Serving
[ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference
Run PyTorch LLMs locally on servers, desktop and mobile
Helpful tools and examples for working with flex-attention
This repo contains the source code for: Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
This repo contains the source code for RULER: What’s the Real Context Size of Your Long-Context Language Models?
(NeurIPS 2024 Oral 🔥) Improved Distribution Matching Distillation for Fast Image Synthesis
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
Open-source code for the paper: Retrieval Head Mechanistically Explains Long-Context Factuality
[NeurIPS 2024 Oral][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-sim…
[NeurIPS 2024] SWE-agent takes a GitHub issue and tries to automatically fix it, using GPT-4, or your LM of choice. It can also be employed for offensive cybersecurity or competitive coding challen…
Generate high-definition short videos with one click using large AI models.
Code examples and resources for DBRX, a large language model developed by Databricks