
PPOv2Trainer throws AttributeError: 'NoneType' object has no attribute 'modules' because value_model's default is None #1976

Open
RylanSchaeffer opened this issue Aug 26, 2024 · 5 comments
Labels
✨ enhancement New feature or request 🧒 good second issue Good for contributors with basic project familiarity 🙋 help from community wanted Open invitation for community members to contribute 🏋 PPO Related to PPO

Comments

@RylanSchaeffer
Contributor

RylanSchaeffer commented Aug 26, 2024

System Info

  • transformers version: 4.44.0
  • Platform: Linux-5.4.0-162-generic-x86_64-with-glibc2.31
  • Python version: 3.11.9
  • Huggingface_hub version: 0.23.4
  • Safetensors version: 0.4.3
  • Accelerate version: 0.32.1
  • Accelerate config: - compute_environment: LOCAL_MACHINE
    - distributed_type: FSDP
    - mixed_precision: bf16
    - use_cpu: False
    - debug: True
    - num_processes: 2
    - machine_rank: 0
    - num_machines: 1
    - rdzv_backend: static
    - same_network: True
    - main_training_function: main
    - enable_cpu_affinity: False
    - fsdp_config: {'fsdp_activation_checkpointing': True, 'fsdp_auto_wrap_policy': 'TRANSFORMER_BASED_WRAP', 'fsdp_backward_prefetch': 'BACKWARD_PRE', 'fsdp_cpu_ram_efficient_loading': True, 'fsdp_forward_prefetch': True, 'fsdp_offload_params': True, 'fsdp_sharding_strategy': 'FULL_SHARD', 'fsdp_state_dict_type': 'SHARDED_STATE_DICT', 'fsdp_sync_module_states': True, 'fsdp_use_orig_params': True}
    - downcast_bf16: no
    - tpu_use_cluster: False
    - tpu_use_sudo: False
    - tpu_env: []
    - dynamo_config: {'dynamo_backend': 'EAGER'}
  • PyTorch version (GPU?): 2.4.0+cu121 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?: No
  • Using GPU in script?: Yes
  • GPU type: NVIDIA A100-SXM4-80GB

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder
  • My own task or dataset (give details below)

Reproduction

First, note that in PPOv2Trainer, value_model is Optional with a default value of None: https://github.com/huggingface/trl/blob/main/trl/trainer/ppov2_trainer.py#L79

Second, note that PPOv2Trainer attempts to disable dropout in all four models: https://github.com/huggingface/trl/blob/main/trl/trainer/ppov2_trainer.py#L144-L145

If no value_model is provided (the default), the dropout-disabling helper is called on None and throws AttributeError: 'NoneType' object has no attribute 'modules'.
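
A minimal sketch of the failing path (assuming disable_dropout_in_model from trl.trainer.utils is the helper invoked at the linked lines; no full trainer setup is needed to trigger it):

# Hypothetical minimal repro: PPOv2Trainer.__init__ loops over
# [policy, ref_policy, value_model, reward_model] and calls this helper
# on each entry without checking for None.
from trl.trainer.utils import disable_dropout_in_model

value_model = None  # the documented default
disable_dropout_in_model(value_model)
# AttributeError: 'NoneType' object has no attribute 'modules'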

Expected behavior

I'm not sure what solutions the designers have in mind. Some possibilities:

  1. Remove None as the default for value_model and make it non-optional.
  2. Change the default for value_model.
  3. If value_model is None, then clone the reward model.

I'm open to other solutions.
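
For illustration, a guard at the crash site might look like the sketch below (the attribute names are assumed, not taken from the source; note this would only defer the failure, since PPO still needs a value model once training starts):

# Hypothetical guard (sketch only): skip None entries instead of assuming
# all four models are present. Avoids the immediate AttributeError, but
# PPO still requires a value model later in training.
for model in [self.policy, self.ref_policy, self.value_model, self.reward_model]:
    if model is not None:
        disable_dropout_in_model(model)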

@RylanSchaeffer RylanSchaeffer added the 🐛 bug Something isn't working label Aug 26, 2024
@RylanSchaeffer RylanSchaeffer changed the title PPOv2Trainer throws error with value_model: Optional[nn.Module] = None PPOv2Trainer throws AttributeError: 'NoneType' object has no attribute 'modules' because value_model's default is None Aug 26, 2024
@qgallouedec
Member

qgallouedec commented Aug 26, 2024

We avoid using downloadable files for security reasons. Please share your solution directly on GitHub as a code snippet or a pull request, so the community can review it.

@RylanSchaeffer
Contributor Author

@qgallouedec can I please bump this?

@qgallouedec
Member

For the record, there's a common misunderstanding about the use of Optional[Type] in type hints. "Optional" doesn't mean that the argument is truly optional (i.e., that it can be omitted); it means the argument can be either None or of the specified type (Type).

Examples:

def example(x: int = 0):        # x is optional, and can't be None
def example(x: Optional[int]):  # x is not optional, and can be None

On the other hand, even when a type hint allows a value, that doesn't guarantee the value is valid at runtime. For example:

def sqrt(x: float):  # the type hint allows any float, but the function is only meaningful for non-negative floats
    return x ** 0.5

In this specific case with PPOv2Trainer, value_model is technically allowed to be None based on its type hint (Optional), but the code doesn't handle None. So while the type hint suggests that None is a valid value, it isn't a supported case in practice.
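
One way to make that contract explicit (a hypothetical sketch, not the project's actual fix) is to validate eagerly in __init__ and fail with a clear message:

# Hypothetical fail-fast check (sketch only): raise a readable error instead
# of letting None propagate to a call on .modules().
if value_model is None:
    raise ValueError("PPOv2Trainer requires a `value_model`; got None.")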

> Remove None as the default for value_model and make it non-optional

This is a possible solution, but there's a practical reason for keeping None as the default: parameters without defaults must come before parameters with defaults, so a None default lets us maintain a clean and logical grouping of related arguments. For example, keeping train_dataset (which is required) next to eval_dataset (which is optional) makes the API easier to read and understand.

@qgallouedec
Member

> If value_model is None, then clone the reward model.

This is probably an interesting improvement.
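
A minimal sketch of that fallback (hypothetical; in the PPOv2 example scripts both reward_model and value_model are sequence-classification models with a scalar head, so a deep copy yields a compatible starting point):

# Hypothetical default (sketch only): fall back to a copy of the reward model
# when no value model is passed. deepcopy keeps the two models independent,
# so PPO updates to the value head don't alter the reward model.
import copy

if value_model is None:
    value_model = copy.deepcopy(reward_model)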

@qgallouedec qgallouedec reopened this Oct 7, 2024
@qgallouedec qgallouedec added ✨ enhancement New feature or request 🧒 good second issue Good for contributors with basic project familiarity 🏋 PPO Related to PPO and removed 🐛 bug Something isn't working labels Oct 7, 2024
@qgallouedec qgallouedec added the 🙋 help from community wanted Open invitation for community members to contribute label Oct 20, 2024
