grpo-training

Here are 10 public repositories matching this topic...

vivoCameraResearch / SmartPhotoCrafter

Official GitHub code for "SmartPhotoCrafter: Unified Reasoning, Generation and Optimization for Automatic Photographic Image Editing"

Updated May 26, 2026
Python

sespoir / ReGround

[ACM MM 2026] ReGround: Restoring Visual Grounding in Multi-Step Reasoning through Self-Diagnosis and Visual Re-Examination

reinforcement-learning visual-grounding qwen2-5-vl multimodal-reasoning-visual-reasoning grpo-training acm-mm-2026

Updated Jul 19, 2026
Python

winstonsmith1897 / GTPO

Group-relative Trajectory-based Policy Optimization: Increasing Quality and Training Stability

reinforcement-learning reinforcement-learning-algorithms train fine post-training llm rlhf grpo-training

Updated Feb 23, 2026
Jupyter Notebook

Surya-Hariharan / OpenMedRL-openenv

OpenMedRL is an open-source reinforcement learning environment for benchmarking LLM-powered medical agents in emergency care. It simulates triage, dynamic patient progression, resource constraints, and uncertainty-aware clinical decision-making.

medical-ai medical-triage huggingface-transformers huggingface-spaces unsloth openenv grpo-training

Updated Jun 21, 2026
Python

DeepGym / deepgym

RL training environments with verifiable rewards for coding agents. Works with TRL, Unsloth, verl, OpenRLHF.

python machine-learning reinforcement-learning deep-learning sandbox evaluation rl code-execution ai-agents daytona llm unsloth coding-agents grpo verifiable-rewards openrlhf reward-function grpo-training

Updated Apr 24, 2026
Python

Vidit-Ostwal / price-negotiation-rl-OpenEnv

An OpenEnv RL environment where an LLM agent plays the buyer and negotiates against an LLM-powered seller over real marketplace listings.

python machine-learning reinforcement-learning rl rl-environment openenv grpo-training price-negotiator openenv-environment

Updated May 9, 2026
Python

injamul3798 / LLM-Fine-tuning-RL-Hands-on-Lab-code-Intro-to-Post-training

This repository contains my personal notes and hands-on implementations for fine-tuning and post-training Large Language Models (LLMs).

reinforcement-learning post-training ppo finetuning-llms grpo-training

Updated May 1, 2026
Jupyter Notebook

lakshyabuilds / Affect-GRPO

Low-cost GRPO fine-tuning pipeline for valence-arousal state estimation and tone-aligned responses in Llama 3.2 1B.

emotional-intelligence supervised-machine-learning finetuning self-awareness emotion-classifier large-language-models llm supervised-finetuning finetuning-llms finetuning-large-language-models llama3-2 grpo grpotrainer self-aware-ai grpo-training

Updated Jul 14, 2026
Jupyter Notebook

safoura-banihashemi / qwen3-terminal-grpo

A reinforcement learning fine-tuned model that generates Linux terminal commands from natural language descriptions. Trained using GRPO (Group Relative Policy Optimization) on a custom terminal task environment inspired by CAMEL-AI's SETA framework.

lora fine-tuning huggingface grpo-training

Updated May 17, 2026
Jupyter Notebook

sauradip / ABACUS

[SIGGRAPH Asia (TOG) 2026] Official Implementation of "ABACUS"

reinforcement-learning object-counting diffusion-models agentic-ai grpo-training unified-foundation-model

Updated Jul 9, 2026
Python
