Option to disable unwrapping model for generation in PPO/RLOO/OnlineDPO

### Feature request

Add trainer option to disable unwrapping model (`unwrap_model_for_generation()`) in online methods.

### Motivation

As discussed in #2250, when deepspeed stage 3 is used and policy model is bigger than single GPU VRAM, `unwrap_model_for_generation()` will cause OOM. The option to disable unwrapping will make such training scenarios possible (but slow).

### Your contribution

I will likely create PR after it's discussed and approved.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Option to disable unwrapping model for generation in PPO/RLOO/OnlineDPO #2529

Feature request

Motivation

Your contribution

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development