Large Model
CivRealm is an interactive environment for the open-source strategy game Freeciv-web, which is based on Freeciv, a Civilization-inspired game.
Implementation of "Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents"
Code for the paper Bootstrap Your Own Skills: Learning to Solve New Tasks with Large Language Model Guidance, accepted to CoRL 2023 as an Oral Presentation.
[ICLR'24 Spotlight] A language model (LM)-based emulation framework for identifying the risks of LM agents with tool use
Official code for VisProg (CVPR 2023 Best Paper!)
[NeurIPS 2024 D&B Track] GTA: A Benchmark for General Tool Agents
[ICML'24 Spotlight] "TravelPlanner: A Benchmark for Real-World Planning with Language Agents"
Code and example data for the paper: Rule Based Rewards for Language Model Safety
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery 🧑‍🔬
Code release for paper "Autonomous Improvement of Instruction Following Skills via Foundation Models" | CoRL 2024
Code/data for MARG (multi-agent review generation)
[NeurIPS 2023] Tree of Thoughts: Deliberate Problem Solving with Large Language Models
Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"
Fast & Simple repository for pre-training and fine-tuning T5-style models
Fine-tune a T5 transformer model using PyTorch & 🤗 Transformers
Beyond the Imitation Game collaborative benchmark for measuring and extrapolating the capabilities of language models
A repo for distributed training of language models with Reinforcement Learning from Human Feedback (RLHF)
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
A curated list of reinforcement learning with human feedback resources (continually updated)
Train transformer language models with reinforcement learning.
[NeurIPS'24] Grammar-Aligned Decoding: An algorithm to constrain LLMs' outputs without distorting their original distribution
Secrets of RLHF in Large Language Models Part I: PPO
An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention & RFT)
ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search (NeurIPS 2024)