Stars
Supporting PyTorch FSDP for optimizers
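A minimal sketch of what sharded optimizer state under PyTorch FSDP looks like (the toy wrapper, learning rate, and function name are placeholders, and a process group is assumed to be initialized already):

    # Sketch: sharding optimizer state with PyTorch FSDP. Assumes
    # torch.distributed is already initialized; names/lr are placeholders.
    import torch
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def wrap_with_sharded_optimizer(model: nn.Module):
        model = FSDP(model)  # parameters are sharded across ranks
        # Building the optimizer *after* wrapping means its state (e.g. Adam
        # moments) is allocated per shard rather than per full parameter.
        optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
        return model, optimizer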
EleutherAI / nanoGPT-mup
Forked from karpathy/nanoGPT. The simplest, fastest repository for training/finetuning medium-sized GPTs.
Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities.
Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.
[ICML 2025] Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale
A PyTorch native platform for training generative AI models
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.
A collection of tricks and tools to speed up transformer models
Official implementation of the ACL 2025 Findings paper "Autonomous Data Selection with Zero-shot Generative Classifiers for Mathematical Texts" (featured in Hugging Face Daily Papers: https://huggingface.co/pape…
Minimalistic large language model 3D-parallelism training
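3D parallelism factors the world size into data-, tensor-, and pipeline-parallel axes; a sketch of that factorization (the function and axis ordering are illustrative, not this repo's API):

    # Illustrative only: how a 3D-parallel layout maps a flat rank to
    # data / tensor / pipeline coordinates. Not the repo's API.
    def rank_to_coords(rank: int, dp: int, tp: int, pp: int):
        assert 0 <= rank < dp * tp * pp
        tp_rank = rank % tp                  # fastest-varying axis
        pp_rank = (rank // tp) % pp
        dp_rank = rank // (tp * pp)          # slowest-varying axis
        return dp_rank, tp_rank, pp_rank

    # e.g. 8 GPUs split as dp=2, tp=2, pp=2: rank 5 -> (dp=1, tp=1, pp=0)
    print(rank_to_coords(5, dp=2, tp=2, pp=2))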
Data cleaning and curation for unstructured text
Checkpointable dataset utilities for foundation model training
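A hedged sketch of what "checkpointable" means here: an iterable dataset that can save and restore its read position so training resumes mid-epoch (the class and method names are illustrative, not this repo's):

    # Illustrative sketch of a checkpointable dataset; names are placeholders.
    from torch.utils.data import IterableDataset

    class ResumableDataset(IterableDataset):
        def __init__(self, samples):
            self.samples = samples
            self.position = 0  # how many samples have been consumed

        def __iter__(self):
            while self.position < len(self.samples):
                yield self.samples[self.position]
                self.position += 1

        def state_dict(self):
            return {"position": self.position}

        def load_state_dict(self, state):
            self.position = state["position"]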
Welcome to the Llama Cookbook! This is your go-to guide for Building with Llama: Getting started with Inference, Fine-Tuning, and RAG. We also show you how to solve end-to-end problems using Llama mode…
Official code for ReLoRA from the paper Stack More Layers Differently: High-Rank Training Through Low-Rank Updates
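Per the paper's title, the core idea is that repeatedly training, merging, and resetting a low-rank adapter accumulates a high-rank total update; a simplified sketch of that loop (not the official implementation):

    # Simplified sketch of the ReLoRA idea: a low-rank delta (B @ A) is
    # trained, periodically merged into the frozen base weight, then
    # re-initialized, so the accumulated update can exceed the rank of any
    # single adapter. Not the repo's code.
    import torch
    import torch.nn as nn

    class ReLoRALinear(nn.Module):
        def __init__(self, d_in, d_out, rank=8):
            super().__init__()
            self.weight = nn.Parameter(torch.randn(d_out, d_in) * 0.02,
                                       requires_grad=False)  # frozen base
            self.A = nn.Parameter(torch.randn(rank, d_in) * 0.02)
            self.B = nn.Parameter(torch.zeros(d_out, rank))

        def forward(self, x):
            return x @ (self.weight + self.B @ self.A).t()

        @torch.no_grad()
        def merge_and_reset(self):
            self.weight += self.B @ self.A     # fold the low-rank update in
            nn.init.normal_(self.A, std=0.02)  # restart the adapter
            nn.init.zeros_(self.B)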
YaRN: Efficient Context Window Extension of Large Language Models
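Methods in this family extend the context window by rescaling rotary (RoPE) position frequencies; a simplified sketch of plain linear position interpolation, which illustrates the mechanism but not YaRN's more selective per-band scheme:

    # Simplified illustration of RoPE-based context extension: stretch
    # positions by `scale` before computing rotary angles. This is plain
    # position interpolation, not YaRN's exact formula.
    import torch

    def rope_angles(positions: torch.Tensor, dim: int, scale: float = 1.0,
                    base: float = 10000.0) -> torch.Tensor:
        # inv_freq[i] = base^(-2i/dim), the standard RoPE frequencies
        inv_freq = base ** (-torch.arange(0, dim, 2).float() / dim)
        # dividing positions by `scale` squeezes a longer context into the
        # position range the model was pretrained on
        return torch.outer(positions.float() / scale, inv_freq)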
Python 3.8+ toolbox for submitting jobs to Slurm
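A minimal usage sketch of this toolbox (submitit); the log folder, Slurm parameters, and toy function are placeholders:

    # Minimal submitit usage; folder and Slurm parameters are placeholders.
    import submitit

    def train(lr: float) -> float:
        return lr * 2  # stand-in for real work

    executor = submitit.AutoExecutor(folder="submitit_logs")
    executor.update_parameters(timeout_min=60, slurm_partition="dev")
    job = executor.submit(train, 3e-4)
    print(job.result())  # blocks until the Slurm job finishes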
Ungreedy subword tokenizer and vocabulary trainer for Python, Go & JavaScript
Erasing concepts from neural representations with provable guarantees
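The simplest member of this family is linear erasure: projecting activations onto the orthogonal complement of a concept direction. A toy sketch of that projection, not the repo's provably optimal method:

    # Toy linear concept erasure: remove the component of each activation
    # along a single concept direction. The repo's method carries formal
    # guarantees; this shows only the basic projection idea.
    import torch

    def erase_direction(acts: torch.Tensor, direction: torch.Tensor):
        u = direction / direction.norm()        # unit concept direction, shape (d,)
        return acts - torch.outer(acts @ u, u)  # project onto u's orthogonal complement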
Landmark Attention: Random-Access Infinite Context Length for Transformers