Will this book talk about RLHF? #226

Answered by rasbt
jingedawang asked this question in Q&A

Thanks! Regarding DPO, I've actually implemented it for Chapter 7, but then removed it for two reasons:

  1. The chapter got way, way too long and exceeded the page limits
  2. I am not very happy with the DPO results

DPO is a nice and relatively simple technique for preference finetuning, but it didn't quite meet the bar I set for fundamental, established techniques that work well. I'll be busy finishing up the book itself over the next few weeks, but after that I plan to polish the DPO part and share it either here or on my blog. Then I plan to do the same for RLHF with a dedicated reward model.
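
For context, the core of DPO is a single loss over paired preference data. Below is a minimal sketch of the standard DPO loss in PyTorch; the tensor names (`policy_chosen_logprobs`, etc.) and the `beta` default are illustrative assumptions, not the code that was removed from the chapter:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logprobs, policy_rejected_logprobs,
             ref_chosen_logprobs, ref_rejected_logprobs, beta=0.1):
    # Per-sequence log-ratios of the trained policy vs. the frozen
    # reference model, for the preferred and dispreferred responses.
    chosen_logratios = policy_chosen_logprobs - ref_chosen_logprobs
    rejected_logratios = policy_rejected_logprobs - ref_rejected_logprobs

    # DPO maximizes the margin between the chosen and rejected log-ratios,
    # scaled by beta, which controls how far the policy may drift from
    # the reference model.
    logits = beta * (chosen_logratios - rejected_logratios)
    return -F.logsigmoid(logits).mean()
```

Each input is a 1-D tensor of summed log-probabilities per sequence (one entry per example in the batch), so the whole objective reduces to a binary logistic loss on the preference pairs.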

In the meantime, you might like my two articles here:

Category: Q&A
Labels: question (further information is requested)
3 participants

This discussion was converted from issue #225 on June 19, 2024 11:08.