Stars
Search-R1: An efficient, scalable RL training framework, based on veRL, for LLMs that interleave reasoning and search-engine calls
Deliberate Reasoning in Language Models as Structure-Aware Planning with an Accurate World Model
MR.Q is a general-purpose model-free reinforcement learning algorithm.
Enhancing the reasoning capabilities of LLMs with a fine-tuning-based algorithm that uses a symbolic AI feedback component
A visualization tool for deeper understanding and easier debugging of RLHF training.
Recipes to scale inference-time compute of open models
ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search (NeurIPS 2024)
Implementation for the Neural Logic Machines (NLM).
An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention & RFT)
Secrets of RLHF in Large Language Models Part I: PPO
[NeurIPS'24] Grammar-Aligned Decoding: An algorithm to constrain an LLM's outputs without distorting its original distribution
Train transformer language models with reinforcement learning.
A curated list of reinforcement learning with human feedback resources (continually updated)
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)
Beyond the Imitation Game collaborative benchmark for measuring and extrapolating the capabilities of language models
Fast & Simple repository for pre-training and fine-tuning T5-style models
Fine-tune a T5 transformer model using PyTorch & Transformers🤗
Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"
[NeurIPS 2023] Tree of Thoughts: Deliberate Problem Solving with Large Language Models
A Concept-Centric Framework for Intelligent Agents
Code for the paper "Imitation Learning from Observation with Automatic Discount Scheduling"
A benchmark for offline goal-conditioned RL and offline RL
Code/data for MARG (multi-agent review generation)