### System Info
- torch 2.1
- NVIDIA PyTorch Docker container 23.07
- `Accelerate` version: 0.21.0
- Platform: Linux-5.19.0-32-generic-x86_64-with-glibc2.35
- Python version: 3.10.6
- Numpy version: 1.22.2
- PyTorch version (GPU?): 2.1.0a0+b5021ba (True)
- PyTorch XPU available: False
- PyTorch NPU available: False
- System RAM: 125.51 GB
- GPU type: NVIDIA RTX A6000
- `Accelerate` default config:
- compute_environment: LOCAL_MACHINE
- distributed_type: FSDP
- mixed_precision: bf16
- use_cpu: False
- num_processes: 2
- machine_rank: 0
- num_machines: 1
- rdzv_backend: static
- same_network: True
- main_training_function: main
- fsdp_config: {'fsdp_auto_wrap_policy': 'TRANSFORMER_BASED_WRAP', 'fsdp_backward_prefetch_policy': 'BACKWARD_PRE', 'fsdp_forward_prefetch': False, 'fsdp_offload_params': False, 'fsdp_sharding_strategy': 1, 'fsdp_state_dict_type': 'FULL_STATE_DICT', 'fsdp_sync_module_states': False, 'fsdp_transformer_layer_cls_to_wrap': 'BaichuanLayer', 'fsdp_use_orig_params': False}
- downcast_bf16: no
- tpu_use_cluster: False
- tpu_use_sudo: False
- tpu_env: []
### Information
- [ ] The official example scripts
- [ ] My own modified scripts
### Tasks
- [ ] One of the scripts in the examples/ folder of Accelerate or an officially supported `no_trainer` script in the `examples` folder of the `transformers` repo (such as `run_no_trainer_glue.py`)
- [ ] My own task or dataset (give details below)
### Reproduction
The FSDP constructor parameters changed in PyTorch 2.1: `ignored_parameters` was replaced by `ignored_states` (see https://pytorch.org/docs/main/fsdp.html#torch.distributed.fsdp.FullyShardedDataParallel). Accelerate 0.21.0 still passes `ignored_parameters` when wrapping the model in `prepare_model`, so training fails with the traceback below. A version-aware workaround sketch follows the traceback.
File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1656, in _inner_training_loop
model, self.optimizer = self.accelerator.prepare(self.model, self.optimizer)
File "/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py", line 1202, in prepare
result = tuple(
File "/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py", line 1203, in
self._prepare_one(obj, first_pass=True, device_placement=d) for obj, d in zip(args, device_placement)
File "/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py", line 1030, in _prepare_one
return self.prepare_model(obj, device_placement=device_placement)
File "/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py", line 1366, in prepare_model
model = FSDP(model, **kwargs)
TypeError: FullyShardedDataParallel.\_\_init\_\_() got an unexpected keyword argument 'ignored_parameters'
[2023-08-07 09:58:56,487] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 12764) of binary: /usr/bin/python
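
As a user-side stopgap, the rename can be smoothed over before the model is wrapped. The sketch below is an illustration under assumptions, not the Accelerate fix: `wrap_with_fsdp` is a hypothetical helper that inspects the installed FSDP signature and passes the ignored parameters under whichever keyword the current torch version accepts.

```python
# Minimal sketch, assuming the only incompatibility is the kwarg rename
# `ignored_parameters` -> `ignored_states` introduced in torch 2.1.
import inspect

import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP


def wrap_with_fsdp(model: nn.Module, ignored_params, **fsdp_kwargs):
    """Hypothetical helper: wrap `model` in FSDP, passing the ignored
    parameters under whichever keyword the installed torch exposes."""
    sig = inspect.signature(FSDP.__init__)
    if "ignored_states" in sig.parameters:
        # torch >= 2.1: the keyword is `ignored_states`.
        fsdp_kwargs["ignored_states"] = ignored_params
    else:
        # older torch: the keyword was still called `ignored_parameters`.
        fsdp_kwargs["ignored_parameters"] = ignored_params
    return FSDP(model, **fsdp_kwargs)
```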
### Expected behavior
A smooth run: `accelerator.prepare` should wrap the model with FSDP on torch 2.1 without raising the `ignored_parameters` TypeError.