🚀 An open-source, hands-on curriculum bridging the gap from basic RL concepts to LLM alignment, RLVR, and advanced Agentic systems.
Updated May 11, 2026 · Python
Principled Data Selection for Alignment: The Hidden Risks of Difficult Examples
Learning When to Answer: Behavior-Oriented Reinforcement Learning for Hallucination Mitigation
CS336 Assignment 5: LLM alignment and reasoning RL with the Qwen2.5 model. Full implementations of supervised fine-tuning (SFT) and Group Relative Policy Optimization (GRPO), with zero-shot, on-policy, and off-policy training and evaluation comparisons on the GSM8K dataset.
Official implementation of "DZ-TiDPO: Non-Destructive Temporal Alignment for Mutable State Tracking". SOTA on Multi-Session Chat with negligible alignment tax.
C3AI: Crafting and Evaluating Constitutions for CAI
A Kullback–Leibler divergence optimizer based on the NeurIPS 2025 paper "LLM Safety Alignment is Divergence Estimation in Disguise".
Teacher-guided prompt-shape discovery for auditable moral attention in frozen weak classifiers.
A training-time alignment framework that integrates safety constraints directly into the RLHF loop, achieving full safety convergence in 7 epochs.
Pipeline to investigate structured reasoning and instruction adherence in Vision-Language Models
This project implements a minimal Reinforcement Learning from Human Feedback (RLHF) pipeline using PyTorch.
🧠 Minimal, hackable Group Relative Policy Optimization (GRPO) for LLM alignment — the algorithm behind DeepSeek-R1. Train reasoning models on a single GPU.
Archive of hands-on exercises for building and evaluating LLM post-training (SFT, RLVR, RLHF) pipelines.
Training dialectical reasoning in complex socio-political contexts.
SIGIR 2025 "Mitigating Source Bias with LLM Alignment"
Research on pragmatic alignment of LLMs and the LANKAMAR agent framework. DOI: 10.5281/zenodo.18904437
🏟️ Modern RL algorithms from scratch — from Q-Learning to GRPO — with clean PyTorch code and interactive notebooks. Compare PPO vs DPO vs GRPO for LLM alignment.
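Several entries above center on GRPO. As a rough orientation for readers browsing this topic, here is a minimal sketch of the group-relative advantage idea at GRPO's core (an illustrative assumption about the common formulation, not any listed repo's exact code): sample several completions per prompt, then normalize each completion's reward against its group's mean and standard deviation.

```python
# Sketch of GRPO-style group-relative advantages (illustrative only).
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize one prompt's per-completion rewards to zero mean, unit std.

    Each completion's advantage is how far its reward sits above or
    below the group average, in units of the group's spread.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Four completions for one prompt, scored by a verifier (1 = correct):
advs = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
# Correct completions get positive advantages, incorrect ones negative.
```

Because the baseline is the group mean rather than a learned value function, no critic network is needed, which is why single-GPU GRPO trainers like those above are feasible.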
A lightweight framework for training and benchmarking custom optimizers in the RLHF pipeline (RM/PPO) using GPT-2 Small and LoRA on consumer GPUs.
Fall 2025 LINGUIS R1B research essay and NLP Python scripts by Shiyi (Yvette) Chen, UC Berkeley.