[ICLR 2024] SemiReward: A General Reward Model for Semi-supervised Learning
A curated list of papers on reinforcement learning for video generation
A comprehensive collection of work on learning from rewards in the post-training and test-time scaling of LLMs, covering both reward models and learning strategies across the training, inference, and post-inference stages.
A fuzzy reward model combined with GRPO to improve a VLM's abilities on the crowd counting task.
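This is not the repository's code, but a minimal sketch of the two ideas its description names: a fuzzy (partial-credit) reward for counting and GRPO-style group-normalized advantages. The function names, the tolerance parameter `tol`, and the decay shape are all illustrative assumptions.

```python
import torch

def fuzzy_count_reward(pred_count: float, true_count: float, tol: float = 0.2) -> float:
    """Hypothetical fuzzy reward: instead of a hard 0/1 exact-match signal,
    give credit that decays linearly with the relative counting error."""
    rel_err = abs(pred_count - true_count) / max(true_count, 1.0)
    return max(0.0, 1.0 - rel_err / tol)

def grpo_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """GRPO-style advantages: normalize each sampled response's reward by the
    mean and std of its own group, with no learned value critic."""
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

# One prompt, a group of 4 sampled VLM answers, ground-truth count 50.
preds = [48.0, 50.0, 61.0, 42.0]
rewards = torch.tensor([fuzzy_count_reward(p, 50.0) for p in preds])
advantages = grpo_advantages(rewards)  # would weight the policy-gradient loss
```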
This repository contains the lab work for the Coursera course "Generative AI with Large Language Models".
Official PyTorch Implementation for the "RewardSDS: Aligning Score Distillation via Reward-Weighted Sampling" paper!
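A minimal sketch of the reward-weighted sampling idea in the paper's title, not the paper's actual algorithm: draw several candidates, score each with a reward model, and aggregate with a softmax over rewards so higher-reward candidates dominate. All names and the temperature parameter are assumptions.

```python
import torch

def reward_weighted_aggregate(candidates: torch.Tensor,
                              rewards: torch.Tensor,
                              temperature: float = 1.0) -> torch.Tensor:
    """Combine N candidate updates (N, D) into one (D,) update, weighted by
    a softmax over their reward scores."""
    weights = torch.softmax(rewards / temperature, dim=0)  # (N,)
    return (weights.view(-1, 1) * candidates).sum(dim=0)

# 4 candidate directions of dimension 8, scored by an assumed reward model.
cands = torch.randn(4, 8)
scores = torch.tensor([0.1, 0.7, 0.3, 0.9])
update = reward_weighted_aggregate(cands, scores)
```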
Code for ICML 2025 paper "GRAM: A Generative Foundation Reward Model for Reward Generalization"
ACL'25: Cheems: A Practical Guidance for Building and Evaluating Chinese Reward Models from Scratch
Developing an LLM response-ranking reward model using RLHF, except the preference feedback comes from GPT-3.5 instead of humans.
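A minimal sketch of the standard pairwise (Bradley-Terry) ranking loss typically used to train such a reward model; whether the "chosen" label comes from a human or from GPT-3.5, the objective is the same. The toy encoder and all names here are illustrative, not the repository's code.

```python
import torch
import torch.nn as nn

class ScalarRewardHead(nn.Module):
    """Toy reward model mapping a (pre-embedded) response to one scalar score.
    In practice the encoder is a pretrained LM; a linear layer stands in here."""
    def __init__(self, hidden: int = 768):
        super().__init__()
        self.head = nn.Linear(hidden, 1)

    def forward(self, emb: torch.Tensor) -> torch.Tensor:
        return self.head(emb).squeeze(-1)

def pairwise_ranking_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry objective: maximize the log-probability that the
    ranker-preferred response scores higher than the rejected one."""
    return -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()

model = ScalarRewardHead()
chosen, rejected = torch.randn(16, 768), torch.randn(16, 768)  # fake embeddings
loss = pairwise_ranking_loss(model(chosen), model(rejected))
loss.backward()
```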
A curated list of research papers, models, and resources related to R1-style reasoning models following DeepSeek-R1's breakthrough in January 2025.
A reward model for evaluating machine translations, focused on English-to-Spanish sentence pairs, with applications in natural language processing (NLP), translation quality assessment, and multilingual content adaptation.
Fine-tuning FLAN-T5 with PPO and PEFT to generate less toxic text summaries. This notebook uses Meta AI's hate speech reward model and RLHF techniques for improved safety.
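A minimal sketch, not the notebook itself, of how such a detoxification reward is typically computed: score each generated summary with a hate-speech classifier and treat the "not hate" logit as the reward that PPO maximizes (alongside a KL penalty to the frozen reference model). The checkpoint name is an assumption based on Meta AI's publicly released hate-speech classifier; verify the class index against `clf.config.id2label`.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumed reward model: Meta AI's RoBERTa hate-speech classifier.
name = "facebook/roberta-hate-speech-dynabench-r4-target"
tok = AutoTokenizer.from_pretrained(name)
clf = AutoModelForSequenceClassification.from_pretrained(name)

NOT_HATE_IDX = 0  # assumed index of the "nothate" class; check clf.config.id2label

def detox_reward(texts: list[str]) -> torch.Tensor:
    """Reward for each generated summary: the 'not hate' logit,
    so less toxic text receives a higher reward."""
    batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = clf(**batch).logits  # (B, num_classes)
    return logits[:, NOT_HATE_IDX]

rewards = detox_reward(["A neutral, polite summary of the article."])
```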
Proof-of-concept library built on TextRL for easily training and using fine-tuned models with RLHF, a reward model, and PPO.