Popular repositories

- UltraFeedback (Python, forked from OpenBMB/UltraFeedback): A large-scale, fine-grained, diverse preference dataset (and models).
- dpo_toxic (Jupyter Notebook, forked from ajyl/dpo_toxic): A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity.
- trl (Python, forked from huggingface/trl): Train transformer language models with reinforcement learning.
- direct-preference-optimization (Python, forked from eric-mitchell/direct-preference-optimization): Reference implementation for DPO (Direct Preference Optimization).
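
Several of the repositories above center on DPO. As a minimal sketch of the objective they implement (the function name and scalar inputs here are illustrative, not taken from any of the listed codebases): DPO trains a policy directly on preference pairs by penalizing the policy when its log-probability margin between the chosen and rejected responses, measured relative to a frozen reference model, is small.

```python
import math

def dpo_loss(policy_logp_chosen: float, policy_logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair: -log sigmoid(beta * margin).

    The margin is the difference of policy-vs-reference log-prob ratios
    for the chosen and rejected responses. Larger margins mean the policy
    prefers the chosen response more strongly than the reference does.
    """
    margin = ((policy_logp_chosen - ref_logp_chosen)
              - (policy_logp_rejected - ref_logp_rejected))
    # -log(sigmoid(x)) written out explicitly for clarity
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))
```

When the policy matches the reference exactly, the margin is zero and the loss is log 2; the loss falls toward zero as the policy widens the margin in favor of the chosen response.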