[ICLR 2024] SemiReward: A General Reward Model for Semi-supervised Learning
This repository collects research papers on learning from rewards in the context of post-training and test-time scaling of large language models (LLMs).
This repository contains the lab work for the Coursera course "Generative AI with Large Language Models".
Official PyTorch implementation of the paper "RewardSDS: Aligning Score Distillation via Reward-Weighted Sampling".
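As a rough illustration of the idea named in the title, the sketch below weights several candidate noise samples by the reward of their denoised outputs before combining the distillation losses. Every callable here (`denoise`, `reward_fn`, `sds_loss`) is a placeholder assumption, not the paper's actual API.

```python
import torch

def reward_weighted_sds_step(latents, denoise, reward_fn, sds_loss,
                             num_candidates=4, temperature=1.0):
    """Sample several noise candidates, score their denoised outputs with a
    reward model, and combine the per-candidate losses with softmax weights."""
    losses, rewards = [], []
    for _ in range(num_candidates):
        noise = torch.randn_like(latents)
        denoised = denoise(latents, noise)                     # placeholder one-step denoiser
        rewards.append(torch.as_tensor(reward_fn(denoised)))   # scalar reward per candidate
        losses.append(sds_loss(latents, noise))                # per-candidate SDS-style loss
    weights = torch.softmax(torch.stack(rewards) / temperature, dim=0)
    return (weights * torch.stack(losses)).sum()
```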
A fuzzy reward model used with GRPO to improve a VLM's abilities on the crowd counting task.
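The description does not spell out the reward, but one hedged reading of a "fuzzy" counting reward is partial credit that decays with the relative counting error, with rewards then normalized per group as GRPO does. The function names and the tolerance value below are illustrative assumptions.

```python
def fuzzy_count_reward(predicted: int, ground_truth: int, tolerance: float = 0.25) -> float:
    """Return a reward in [0, 1]: 1 for an exact count, decaying linearly to 0
    once the relative error exceeds `tolerance`."""
    if ground_truth == 0:
        return 1.0 if predicted == 0 else 0.0
    rel_error = abs(predicted - ground_truth) / ground_truth
    return max(0.0, 1.0 - rel_error / tolerance)

def group_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style advantages: normalize rewards within one group of samples."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5 or 1.0  # avoid division by zero when all rewards are equal
    return [(r - mean) / std for r in rewards]

print(fuzzy_count_reward(predicted=47, ground_truth=50))  # ~0.76
print(group_advantages([1.0, 0.76, 0.0, 0.4]))
```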
Developing an LLM response-ranking reward model via RLHF-style training, except the preference feedback comes from GPT-3.5 instead of human annotators.
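Whatever this repository's exact setup, the standard objective for a response-ranking reward model is the pairwise Bradley-Terry loss sketched below; here the chosen/rejected labels would come from GPT-3.5 rather than humans. The tensor values are made up for illustration.

```python
import torch
import torch.nn.functional as F

def pairwise_ranking_loss(chosen_scores: torch.Tensor,
                          rejected_scores: torch.Tensor) -> torch.Tensor:
    """-log sigmoid(r_chosen - r_rejected), averaged over the batch."""
    return -F.logsigmoid(chosen_scores - rejected_scores).mean()

# Example: scalar scores produced by a reward head for a batch of preference pairs.
chosen = torch.tensor([1.3, 0.2, 2.1])
rejected = torch.tensor([0.4, 0.5, 1.0])
print(pairwise_ranking_loss(chosen, rejected).item())
```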
A reward model for evaluating machine translations, focusing on English-to-Spanish sentence pairs, with applications in natural language processing (NLP), translation quality assessment, and multilingual content adaptation.
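A minimal sketch of how such a translation reward model might be queried: a multilingual encoder with a single-logit regression head scores an (English source, Spanish translation) pair. The `xlm-roberta-base` checkpoint and the regression setup are assumptions rather than this repository's exact configuration, and the untrained head below returns an arbitrary score.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-base", num_labels=1  # one logit = scalar quality score
)

source = "The cat sat on the mat."
translation = "El gato se sentó en la alfombra."

inputs = tokenizer(source, translation, return_tensors="pt", truncation=True)
with torch.no_grad():
    score = model(**inputs).logits.squeeze().item()
print(f"translation quality score: {score:.3f}")  # untrained head: arbitrary value
```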
A curated list of research papers, models, and resources related to R1-style reasoning models following DeepSeek-R1's breakthrough in January 2025.
Fine-tuning FLAN-T5 with PPO and PEFT to generate less toxic text summaries. This notebook leverages Meta AI's hate speech reward model and utilizes RLHF techniques for improved safety.
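In this kind of detoxification setup, the reward PPO maximizes is typically the hate-speech classifier's "not hate" logit. The checkpoint name and label key below follow the Coursera lab as commonly described; treat them as assumptions.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

toxicity_model_name = "facebook/roberta-hate-speech-dynabench-r4-target"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(toxicity_model_name)
model = AutoModelForSequenceClassification.from_pretrained(toxicity_model_name)

def not_hate_reward(text: str) -> float:
    """Higher reward for summaries the classifier considers non-hateful."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits.squeeze()
    not_hate_index = model.config.label2id.get("nothate", 0)  # assumed label name
    return logits[not_hate_index].item()

print(not_hate_reward("This summary is polite and factual."))
```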
Proof-of-concept (POC) library built on TextRL for easily training and using fine-tuned models with RLHF, a reward model, and PPO.
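A hedged sketch of how a reward model could be plugged into TextRL: the project's documented pattern is to subclass its environment and override its reward hook. The exact signature and the structure of `predicted_list` vary by version and are assumptions here, as is the `score_with_reward_model` helper.

```python
from textrl import TextRLEnv  # assumed import path from the TextRL project

def score_with_reward_model(text: str) -> float:
    """Hypothetical stand-in for a trained reward model's scalar score."""
    return float(len(text.split()))  # placeholder scoring rule

class RewardModelEnv(TextRLEnv):
    def get_reward(self, input_item, predicted_list, finish):
        # Only score once generation has finished; signature and list-valued
        # return assumed from the TextRL README.
        if finish:
            text = " ".join(map(str, predicted_list))  # assumed token list
            return [score_with_reward_model(text)]
        return [0]
```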