param_offload and optimizer_offload have no effect in fsdp2 #2822

This issue comes from the fsdp2 implementation in [PR 1026](https://github.com/volcengine/verl/pull/1026).

According to the [log](https://github.com/eric-haibin-lin/verl-data/blob/experiments/gsm8k/qwen2-7b-fsdp2.log) in the PR, `param_offload` and `optimizer_offload` have no effect:

Configuration:

    actor_rollout_ref.actor.fsdp_config.param_offload=True \
    actor_rollout_ref.actor.fsdp_config.optimizer_offload=True \

`actor_rollout_ref.actor.fsdp_config.offload_policy` was not set.

Result:

(WorkerDict pid=2673336) Before building vllm rollout, **memory allocated (GB): 32.006070613861084**, memory reserved (GB): 38.259765625

This is because the `_offload_params` method was removed from fsdp2, and `param.data.to(torch.device("cpu"), non_blocking=True)` has no effect either.
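One way to check whether offload actually moved anything is to tally parameter storage per device right after the offload step. The following is a minimal sketch, not code from verl: the helper name `bytes_by_device` and the `FakeParam` stand-in are hypothetical, and the helper relies only on the `.device`, `.numel()`, and `.element_size()` attributes that `torch.Tensor` provides.

```python
from collections import Counter
from typing import Iterable


def bytes_by_device(params: Iterable) -> dict:
    """Sum parameter storage in bytes, grouped by str(param.device)."""
    totals = Counter()
    for p in params:
        totals[str(p.device)] += p.numel() * p.element_size()
    return dict(totals)


class FakeParam:
    """Stand-in exposing the three tensor attributes used above (illustration only)."""

    def __init__(self, device, numel, element_size):
        self.device = device
        self._numel = numel
        self._element_size = element_size

    def numel(self):
        return self._numel

    def element_size(self):
        return self._element_size


# Two parameters still on the GPU and one on the CPU.
params = [
    FakeParam("cuda:0", 1024, 2),
    FakeParam("cuda:0", 512, 2),
    FakeParam("cpu", 256, 4),
]
report = bytes_by_device(params)
print(report)
```

With a real model, the call would be `bytes_by_device(model.parameters())`; if offload worked, no `cuda` key should remain in the result. Under fsdp2 the parameters are `DTensor`s, so `numel()` may report the global rather than the local shard size, and the byte totals should be read as approximate.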

cc @lxg2015 
