
FSDP changed in 2.1 #1818

@WuNein

Description

System Info

torch 2.1
NVIDIA PyTorch Docker image 23.07


- `Accelerate` version: 0.21.0
- Platform: Linux-5.19.0-32-generic-x86_64-with-glibc2.35
- Python version: 3.10.6
- Numpy version: 1.22.2
- PyTorch version (GPU?): 2.1.0a0+b5021ba (True)
- PyTorch XPU available: False
- PyTorch NPU available: False
- System RAM: 125.51 GB
- GPU type: NVIDIA RTX A6000
- `Accelerate` default config:
        - compute_environment: LOCAL_MACHINE
        - distributed_type: FSDP
        - mixed_precision: bf16
        - use_cpu: False
        - num_processes: 2
        - machine_rank: 0
        - num_machines: 1
        - rdzv_backend: static
        - same_network: True
        - main_training_function: main
        - fsdp_config: {'fsdp_auto_wrap_policy': 'TRANSFORMER_BASED_WRAP', 'fsdp_backward_prefetch_policy': 'BACKWARD_PRE', 'fsdp_forward_prefetch': False, 'fsdp_offload_params': False, 'fsdp_sharding_strategy': 1, 'fsdp_state_dict_type': 'FULL_STATE_DICT', 'fsdp_sync_module_states': False, 'fsdp_transformer_layer_cls_to_wrap': 'BaichuanLayer', 'fsdp_use_orig_params': False}
        - downcast_bf16: no
        - tpu_use_cluster: False
        - tpu_use_sudo: False
        - tpu_env: []


### Information

- [ ] The official example scripts
- [ ] My own modified scripts

### Tasks

- [ ] One of the scripts in the examples/ folder of Accelerate or an officially supported `no_trainer` script in the `examples` folder of the `transformers` repo (such as `run_no_trainer_glue.py`)
- [ ] My own task or dataset (give details below)

### Reproduction

The FSDP constructor parameters changed in PyTorch 2.1:

https://pytorch.org/docs/main/fsdp.html#torch.distributed.fsdp.FullyShardedDataParallel

`ignored_parameters` has been replaced by `ignored_states`, so `accelerator.prepare` fails when it passes the old keyword to FSDP:

File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1656, in _inner_training_loop
model, self.optimizer = self.accelerator.prepare(self.model, self.optimizer)
File "/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py", line 1202, in prepare
result = tuple(
File "/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py", line 1203, in
self._prepare_one(obj, first_pass=True, device_placement=d) for obj, d in zip(args, device_placement)
File "/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py", line 1030, in _prepare_one
return self.prepare_model(obj, device_placement=device_placement)
File "/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py", line 1366, in prepare_model
model = FSDP(model, **kwargs)
TypeError: FullyShardedDataParallel.init() got an unexpected keyword argument 'ignored_parameters'
[2023-08-07 09:58:56,487] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 12764) of binary: /usr/bin/python
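For anyone hitting this before a fixed release, a minimal workaround sketch (assuming the only offending kwarg is `ignored_parameters`; `adapt_ignored_kwarg` is a hypothetical helper, not Accelerate's actual patch) is to rename the kwarg according to the FSDP signature of the installed torch before calling `FSDP(...)`:

```python
import inspect

from torch.distributed.fsdp import FullyShardedDataParallel as FSDP


def adapt_ignored_kwarg(fsdp_kwargs: dict) -> dict:
    """Rename the ignored-parameters kwarg to whatever this torch build expects.

    Illustrative helper only; not the fix that ships in Accelerate.
    """
    kwargs = dict(fsdp_kwargs)
    accepted = inspect.signature(FSDP.__init__).parameters
    if "ignored_parameters" in kwargs and "ignored_parameters" not in accepted:
        # torch >= 2.1: the argument is now called `ignored_states`
        kwargs["ignored_states"] = kwargs.pop("ignored_parameters")
    return kwargs


# usage inside a patched prepare_model-style call:
# model = FSDP(model, **adapt_ignored_kwarg(kwargs))
```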


### Expected behavior

A smooth run: `accelerator.prepare` should wrap the model in FSDP on PyTorch 2.1 without raising this `TypeError`.
