Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.
Updated Nov 24, 2025 - Python
Official Implementation of VideoDPO
Code for the paper "Query-Dependent Prompt Evaluation and Optimization with Offline Inverse Reinforcement Learning"
ZYN: Zero-Shot Reward Models with Yes-No Questions
Building synthetic data for preference tuning
Code and data for "Timo: Towards Better Temporal Reasoning for Language Models" (COLM 2024)
distilled Self-Critique refines the outputs of an LLM using only synthetic data
RewardAnything: Generalizable Principle-Following Reward Models
Code for the paper "Improving Socratic Question Generation using Data Augmentation and Preference Optimization"
RankPO: Rank Preference Optimization
Production-ready RLAIF trading system with multi-agent Claude AI that learns from market outcomes. Features 60+ indicators, foundation models, and serverless deployment.
🧠 Enhance AI conversations with Cognio, a persistent memory server that retains context and enables meaningful semantic search across sessions.
(Stepwise controlled Understanding for Trajectories): an "agent that learns to hunt"
RLAF: Reinforcement Learning from Agentic Feedback - A unified framework for training AI agents with multi-perspective critic ensembles
🤖 Train AI agents effectively with RLAF, utilizing multi-perspective critic ensembles for richer feedback and improved performance in reinforcement learning.
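The common RLAIF recipe underlying these projects is to replace human preference labels with judgments from an AI model: a judge scores candidate responses to a prompt, and the best and worst become a (chosen, rejected) preference pair for tuning. A minimal sketch, using a stand-in heuristic judge (a real pipeline would call an LLM or reward model here; all names below are illustrative, not from any listed repo):

```python
# Sketch of RLAIF-style preference-pair construction: an AI "judge" scores
# candidate responses, and the top/bottom candidates form a training pair.
from dataclasses import dataclass


@dataclass
class PreferencePair:
    prompt: str
    chosen: str
    rejected: str


def judge_score(prompt: str, response: str) -> float:
    # Stand-in judge: rewards prompt-word overlap and modest length.
    # In a real RLAIF pipeline this would be an LLM judgment or a reward model.
    overlap = len(set(prompt.lower().split()) & set(response.lower().split()))
    return overlap + 0.1 * len(response.split())


def build_pair(prompt: str, candidates: list[str]) -> PreferencePair:
    # Rank candidates by judge score; highest becomes "chosen", lowest "rejected".
    ranked = sorted(candidates, key=lambda r: judge_score(prompt, r), reverse=True)
    return PreferencePair(prompt=prompt, chosen=ranked[0], rejected=ranked[-1])


if __name__ == "__main__":
    pair = build_pair(
        "Explain reinforcement learning from AI feedback.",
        [
            "RLAIF replaces human preference labels with judgments from an AI "
            "model, which scores candidate responses to build training data.",
            "I don't know.",
        ],
    )
    print(pair.rejected)
```

The resulting pairs feed directly into preference-optimization objectives such as DPO, which several repos above build on.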