rlvr
Here are 21 public repositories matching this topic...
[EMNLP'25] s3 - ⚡ Efficient & Effective Search Agent Training via RL for RAG (Verifier-Powered RLVR for Search)
Updated Aug 3, 2025 - Python
An awesome list of agentic models trained with reinforcement learning
Updated Sep 19, 2025 - HTML
Official repository for "RLVR-World: Training World Models with Reinforcement Learning" (NeurIPS 2025), https://arxiv.org/abs/2505.13934
Updated Sep 19, 2025 - Python
A curated list of awesome resources about reward construction for AI agents. This repository covers cutting-edge research and practical guides on defining and collecting rewards to build more intelligent and aligned AI agents.
Updated Sep 1, 2025
🐝 SwarmBench: Benchmarking LLMs' Swarm Intelligence
Updated May 21, 2025 - Python
Uses GRPO to train long-form QA and instruction following with a long-form reward model
Updated Jul 17, 2025 - Python
MOTIF: Modular Thinking via Reinforcement Fine-tuning in LLMs
Updated Jul 6, 2025 - Python
A curated collection of research papers on LLM Tool-Integrated Reasoning (TIR), where LLMs enhance reasoning by interacting with external tools such as calculators, search engines, and code interpreters.
Updated Aug 20, 2025
CAP RLVR: Reinforcement Learning from Human Feedback for Legal Reasoning using Caselaw Access Project data. Complete GRPO training pipeline with OpenAI Gym environments, deterministic reward functions, and multi-stage curriculum learning for legal LLM development.
Updated Jul 27, 2025 - Python
Finetuning of Qwen3 0.6B for MCQA tasks
Updated Sep 4, 2025 - Python
RLHF and Verifiable Reward Models - Post training Research
Updated Apr 28, 2025 - Python
A curated list of papers on implicit-reward reinforcement learning for LLMs — no human feedback, no gold answers, no verifiable rewards.
Updated May 30, 2025
Profile for Jonathan Rahn — AI Lab Lead at Drees & Sommer. Chess-reasoning LMs (policy + world model), RL with verifiable rewards.
Updated Sep 19, 2025
Updated May 16, 2025
A curated collection of papers combining Self-Supervised Learning (SSL) with Reinforcement Learning (RL) in the context of Large Language Models (LLMs), toward autonomous agents in the Era of Experience. Inspired by “Welcome to the Era of Experience” (Silver & Sutton, 2025).
Updated Jun 23, 2025