-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[examples
] Big refactor of examples and documentation
#509
Conversation
The documentation is not available anymore as the PR was closed or merged. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, looks really great! Left a few small comments and two main points:
Doc structure
What do you think about simplifying/renaming the sections a bit in the docs:
-
API
- Model Classes
- Trainer Classes
- Reward Model Training
- Supervised Fine-Tuning
- Best-of-N Samppling
-
Examples
- Sentiment Tuning
- Training with PEFT
- Detoxifying LLMs
- StackLlama
- Multi-Adapter Training
I think we could also do a better job at giving an overview of the docs in the landing page on the docs.
Examples structure
For consistency I would call the reward examples folder reward_trainer
. Having said that I wonder if we shouldn't further simplify the examples only to have these folders:
- research_projects
- notebooks
- scripts
Most of the folders only have 1-2 scripts anyway.
@@ -28,7 +28,7 @@ | |||
|
|||
|
|||
if is_peft_available(): | |||
from peft import PeftModel, get_peft_model | |||
from peft import PeftModel, get_peft_model, prepare_model_for_int8_training |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about int4?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It works for both, currently in PEFT there is a method called prepare_model_for_kbit_training
, but we did not made a release yet (they're the same method, just changed the name).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank @lvwerra for the extensive review, should be all good now!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, I think we can merge soon! A few small nits:
Short Sections Names
It would be really nice if we could make all the section heads single line in the sidebar. Suggestions:
- Training your own reward model -> Reward Model Training
- Best of N sampling - Better model output without reinforcement learning -> Best of N Sampling
- Multi Adapter RL (MARL) - a single base model for everything -> Multi Adapter RLHF or Multi Adapter RL?
- Using Llama with TRL -> Training StackLlama
Section Capitalisation
Also we should be consistent in capitalisation. I am in favour of capitalising the sections heads. E.g. Reward Model Training rather than Reward model training. What do you think?
SFTTrainer
It seems to be missing from the Trainer Classes
docs page.
API documentation: | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about adding 1 sentence here per bullet point for people who don't know e.g. what an SFTTrainer
could do or what a StackLlama is :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense!
Looks much cleaner now ! Addressed your comment @lvwerra ! |
This PR is an attempt to refactor most of the examples and adapt the documentation accordingly
Added also a small enhancement for the reward trainer to directly call
prepare_model_for_int8_training
method under the hood.TODOs:
Examples:
examples/summarization
folderRemoving all scripts related to merging adapter weights since in PEFT you can simply do:
Documentation
To discuss
cc @lvwerra @vwxyzjn