Starred repositories: sh-jj
441 results for source starred repositories
Python 3 Updated Mar 4, 2025

Search-R1: An Efficient, Scalable RL Training Framework for LLMs That Interleave Reasoning and Search Engine Calls, Based on veRL

Python 890 53 Updated Mar 4, 2025

Deliberate Reasoning in Language Models as Structure-Aware Planning with an Accurate World Model

Python 34 1 Updated Feb 20, 2025

Code and dataset for CodeSteer

Python 54 6 Updated Feb 16, 2025

MR.Q is a general-purpose model-free reinforcement learning algorithm.

Python 71 1 Updated Feb 5, 2025

Enhancing the reasoning capabilities of LLMs with a fine-tuning-based algorithm that uses a symbolic AI feedback component

Jupyter Notebook 1 Updated Oct 8, 2024

AWM: Agent Workflow Memory

Python 252 24 Updated Jan 31, 2025

A visualization tool for deeper understanding and easier debugging of RLHF training.

Python 162 6 Updated Feb 20, 2025

Recipes to scale inference-time compute of open models

Python 1,035 104 Updated Feb 25, 2025
Python 905 105 Updated Jan 23, 2025

ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search (NeurIPS 2024)

Python 587 44 Updated Jan 20, 2025

An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention & RFT)

Python 5,469 537 Updated Mar 10, 2025

Secrets of RLHF in Large Language Models Part I: PPO

Python 1,327 97 Updated Mar 3, 2024

[NeurIPS'24] Grammar-Aligned Decoding: An algorithm to constrain LLMs' outputs without distorting their original distribution

Python 14 4 Updated Feb 10, 2025

Train transformer language models with reinforcement learning.

Python 12,378 1,669 Updated Mar 7, 2025

A curated list of reinforcement learning with human feedback resources (continually updated)

3,788 231 Updated Feb 19, 2025

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

Python 43,629 5,337 Updated Mar 10, 2025

A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)

Python 4,595 480 Updated Jan 8, 2024

Beyond the Imitation Game collaborative benchmark for measuring and extrapolating the capabilities of language models

Python 2,986 599 Updated Jul 19, 2024

Fast & Simple repository for pre-training and fine-tuning T5-style models

Python 997 76 Updated Aug 21, 2024

Fine-tune a T5 transformer model using PyTorch & Transformers🤗

Jupyter Notebook 209 34 Updated Feb 10, 2021
Python 2,763 310 Updated Mar 6, 2025

Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"

Python 6,289 763 Updated Feb 27, 2025

[NeurIPS 2023] Tree of Thoughts: Deliberate Problem Solving with Large Language Models

Python 5,136 492 Updated Jan 16, 2025

A Concept-Centric Framework for Intelligent Agents

C++ 12 Updated Mar 7, 2025

Code for the paper "Imitation Learning from Observation with Automatic Discount Scheduling"

Python 13 1 Updated Mar 27, 2024
Python 194 10 Updated Nov 22, 2024

A benchmark for offline goal-conditioned RL and offline RL

Python 132 27 Updated Mar 2, 2025

Code/data for MARG (multi-agent review generation)

Python 40 4 Updated Nov 14, 2024

Convert a PDDL domain into an OpenAI Gym environment.

PDDL 227 62 Updated Oct 23, 2024