London
- Sokoban Speedrun - Teach Qwen3 Sokoban. The fastest recipe wins.
- Target Policy Optimization - Turn GRPO into supervised learning
- RamenGPT - Training GPT with a single GPU
- Agentic Uncertainty - Measuring SWE agent uncertainty
- ReasoningGym - 100+ RL environments for LLM RLVR
- Sagaland - AI Interactive Fiction
- PySpur - A visual playground for agentic workflows
- No Train No Gain - Training BERT and T5 models
- SIN - Causal inference with embedded treatments
- LAWA - LAtest Weight Averaging
- WASAM - Weight-Averaged Sharpness-Aware Minimization



