Stars
Search-R1: An efficient, scalable RL training framework, based on veRL, for LLMs that interleave reasoning and search-engine calls
Deliberate Reasoning in Language Models as Structure-Aware Planning with an Accurate World Model
MR.Q is a general-purpose model-free reinforcement learning algorithm.
Enhancing the reasoning capabilities of LLMs with a fine-tuning-based algorithm that uses a symbolic AI feedback component
A visualization tool for deeper understanding and easier debugging of RLHF training.
Recipes to scale inference-time compute of open models
ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search (NeurIPS 2024)
Implementation for the Neural Logic Machines (NLM).
An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention & RFT)
Secrets of RLHF in Large Language Models Part I: PPO
[NeurIPS'24] Grammar-Aligned Decoding: An algorithm to constrain an LLM's outputs without distorting its original distribution
Train transformer language models with reinforcement learning.
A curated list of reinforcement learning with human feedback resources (continually updated)
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)
Beyond the Imitation Game collaborative benchmark for measuring and extrapolating the capabilities of language models
Fast & Simple repository for pre-training and fine-tuning T5-style models
Fine-tune a T5 transformer model using PyTorch & Transformers🤗
Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"
[NeurIPS 2023] Tree of Thoughts: Deliberate Problem Solving with Large Language Models
A Concept-Centric Framework for Intelligent Agents
Code for the paper "Imitation Learning from Observation with Automatic Discount Scheduling"
A benchmark for offline goal-conditioned RL and offline RL
Code/data for MARG (multi-agent review generation)