This issue comes from the FSDP2 implementation in PR 1026.
According to the log in that PR, param_offload and optimizer_offload have no effect:
configuration:
actor_rollout_ref.actor.fsdp_config.param_offload=True
actor_rollout_ref.actor.fsdp_config.optimizer_offload=True
actor_rollout_ref.actor.fsdp_config.offload_policy was not set
result:
(WorkerDict pid=2673336) Before building vllm rollout, memory allocated (GB): 32.006070613861084, memory reserved (GB): 38.259765625
This is because the _offload_params method was removed in the FSDP2 code path, and calling param.data.to(torch.device("cpu"), non_blocking=True) has no effect either.
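
One possible reason the bare param.data.to(...) call does nothing: Tensor.to() is out-of-place, so if its return value is not assigned back, the parameter is left where it was. I have not confirmed that this is exactly what happens in the PR, so treat the following as a minimal sketch of the general PyTorch behavior only. It uses a dtype change on CPU as a stand-in for the GPU-to-CPU move so it runs without a GPU; the `param` here is a hypothetical toy parameter, not an FSDP2 one.

```python
import torch

# Hypothetical stand-in for a model parameter (plain tensor, not an FSDP2 DTensor).
param = torch.nn.Parameter(torch.ones(4, dtype=torch.float32))

# Tensor.to() returns a new tensor; the result is discarded here,
# so param.data itself is unchanged.
param.data.to(torch.float64)
dtype_after_discarded_call = param.data.dtype

# Assigning the result back is what actually replaces the parameter's data.
param.data = param.data.to(torch.float64)
dtype_after_assignment = param.data.dtype

print(dtype_after_discarded_call, dtype_after_assignment)
```

If the same pattern applies to the device move, the offload would need to assign the result back (or go through whatever offload mechanism FSDP2 itself provides) before the allocated memory can drop.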
cc @lxg2015