System Info
accelerate is on the latest version; tested via both `pip install -U accelerate` and `pip install git+https://github.com/huggingface/accelerate`.
Information
My own modified scripts
The official example scripts
Tasks
One of the scripts in the examples/ folder of Accelerate or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
My own task or dataset (give details below)
Reproduction
While trying to fine-tune LLMs via torch/transformers on a Kaggle TPU v3-8, I get an error saying that accelerate does not count TPUs as a device:
Error: `RuntimeError: There are currently no available devices found, must be one of 'XPU', 'CUDA', or 'NPU'.`
To make sure, I also tested the GoogleCloudPlatform example (a torch TPU fine-tune) and got the same exact error. The error is thrown on `trainer = SFTTrainer(...)`. You can see the full error traceback below:
Full Error Traceback:
```
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[48], line 4
      1 from trl import SFTTrainer
      2 from transformers import AutoTokenizer, AutoModelForCausalLM, TrainingArguments
----> 4 trainer = SFTTrainer(
      5     model=base_model,
      6     train_dataset=data,
      7     args=TrainingArguments(
      8         per_device_train_batch_size=BATCH_SIZE,  # This is actually the global batch size for SPMD.
      9         num_train_epochs=1,
     10         max_steps=-1,
     11         output_dir="/output_dir",
     12         optim="adafactor",
     13         logging_steps=1,
     14         dataloader_drop_last=True,  # Required for SPMD.
     15         fsdp="full_shard",
     16         fsdp_config=fsdp_config,
     17     ),
     18     peft_config=lora_config,
     19     dataset_text_field="quote",
     20     max_seq_length=max_seq_length,
     21     packing=True,
     22 )

File /usr/local/lib/python3.10/site-packages/huggingface_hub/utils/_deprecation.py:101, in _deprecate_arguments.<locals>._inner_deprecate_positional_args.<locals>.inner_f(*args, **kwargs)
     99     message += "\n\n" + custom_message
    100 warnings.warn(message, FutureWarning)
--> 101 return f(*args, **kwargs)

File /usr/local/lib/python3.10/site-packages/trl/trainer/sft_trainer.py:401, in SFTTrainer.__init__(self, model, args, data_collator, train_dataset, eval_dataset, tokenizer, model_init, compute_metrics, callbacks, optimizers, preprocess_logits_for_metrics, peft_config, dataset_text_field, packing, formatting_func, max_seq_length, infinite, num_of_sequences, chars_per_token, dataset_num_proc, dataset_batch_size, neftune_noise_alpha, model_init_kwargs, dataset_kwargs, eval_packing)
    395 if tokenizer.padding_side is not None and tokenizer.padding_side != "right":
    396     warnings.warn(
    397         "You passed a tokenizer with `padding_side` not equal to `right` to the SFTTrainer. This might lead to some unexpected behaviour due to "
    398         "overflow issues when training a model in half-precision. You might consider adding `tokenizer.padding_side = 'right'` to your code."
    399     )
--> 401 super().__init__(
    402     model=model,
    403     args=args,
    404     data_collator=data_collator,
    405     train_dataset=train_dataset,
    406     eval_dataset=eval_dataset,
    407     tokenizer=tokenizer,
    408     model_init=model_init,
    409     compute_metrics=compute_metrics,
    410     callbacks=callbacks,
    411     optimizers=optimizers,
    412     preprocess_logits_for_metrics=preprocess_logits_for_metrics,
    413 )
    415 # Add tags for models that have been loaded with the correct transformers version
    416 if hasattr(self.model, "add_model_tags"):

File /usr/local/lib/python3.10/site-packages/transformers/trainer.py:411, in Trainer.__init__(self, model, args, data_collator, train_dataset, eval_dataset, tokenizer, model_init, compute_metrics, callbacks, optimizers, preprocess_logits_for_metrics)
    408 self.deepspeed = None
    409 self.is_in_train = False
--> 411 self.create_accelerator_and_postprocess()
    413 # memory metrics - must set up as early as possible
    414 self._memory_tracker = TrainerMemoryTracker(self.args.skip_memory_metrics)

File /usr/local/lib/python3.10/site-packages/transformers/trainer.py:4858, in Trainer.create_accelerator_and_postprocess(self)
   4855 args.update(accelerator_config)
   4857 # create accelerator object
-> 4858 self.accelerator = Accelerator(**args)
   4859 # some Trainer classes need to use `gather` instead of `gather_for_metrics`, thus we store a flag
   4860 self.gather_function = self.accelerator.gather_for_metrics

File /usr/local/lib/python3.10/site-packages/accelerate/accelerator.py:349, in Accelerator.__init__(self, device_placement, split_batches, mixed_precision, gradient_accumulation_steps, cpu, dataloader_config, deepspeed_plugin, fsdp_plugin, megatron_lm_plugin, rng_types, log_with, project_dir, project_config, gradient_accumulation_plugin, step_scheduler_with_optimizer, kwargs_handlers, dynamo_backend, deepspeed_plugins)
    345     raise ValueError(f"FSDP requires PyTorch >= {FSDP_PYTORCH_VERSION}")
    347 if fsdp_plugin is None:  # init from env variables
    348     fsdp_plugin = (
--> 349         FullyShardedDataParallelPlugin() if os.environ.get("ACCELERATE_USE_FSDP", "false") == "true" else None
    350     )
    351 else:
    352     if not isinstance(fsdp_plugin, FullyShardedDataParallelPlugin):

File <string>:21, in __init__(self, sharding_strategy, backward_prefetch, mixed_precision_policy, auto_wrap_policy, cpu_offload, ignored_modules, state_dict_type, state_dict_config, optim_state_dict_config, limit_all_gathers, use_orig_params, param_init_fn, sync_module_states, forward_prefetch, activation_checkpointing, cpu_ram_efficient_loading, transformer_cls_names_to_wrap, min_num_params)

File /usr/local/lib/python3.10/site-packages/accelerate/utils/dataclasses.py:1684, in FullyShardedDataParallelPlugin.__post_init__(self)
   1682     device = torch.xpu.current_device()
   1683 else:
-> 1684     raise RuntimeError(
   1685         "There are currently no available devices found, must be one of 'XPU', 'CUDA', or 'NPU'."
   1686     )
   1687 # Create a function that will be used to initialize the parameters of the model
   1688 # when using `sync_module_states`
   1689 self.param_init_fn = lambda x: x.to_empty(device=device, recurse=False)

RuntimeError: There are currently no available devices found, must be one of 'XPU', 'CUDA', or 'NPU'.
```
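For context, the `fsdp_config` and `lora_config` referenced in the call above are not shown in the traceback. A minimal sketch of what they look like in this kind of setup; the exact values below are illustrative assumptions modeled on the PyTorch/XLA SPMD FSDP examples, not copied from my notebook:

```python
# Sketch only -- illustrative values, assuming a PyTorch/XLA FSDPv2-style TPU setup.
from peft import LoraConfig

# LoRA adapter config passed to SFTTrainer as peft_config (values assumed)
lora_config = LoraConfig(
    r=8,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)

# FSDP settings forwarded via TrainingArguments(fsdp_config=...);
# the xla / xla_fsdp_v2 flags are what the TPU SPMD examples use (assumed here).
fsdp_config = {
    "fsdp_transformer_layer_cls_to_wrap": ["LlamaDecoderLayer"],
    "xla": True,
    "xla_fsdp_v2": True,
    "xla_fsdp_grad_ckpt": True,
}
```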
Upgraded transformers, peft and trl to the latest versions, but got the same error.
Expected behavior
accelerate should detect the TPU and not throw an error from `accelerate/utils/dataclasses.py`.
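To illustrate what I mean: the check in `FullyShardedDataParallelPlugin.__post_init__` (bottom of the traceback) only probes CUDA/NPU/XPU before raising. A rough sketch of the kind of XLA-aware fallback I would expect; this is not Accelerate's actual code, and the `torch_xla` branch is my assumption:

```python
import torch

def pick_fsdp_init_device():
    # Same spirit as the existing check: prefer an available accelerator device.
    if torch.cuda.is_available():
        return torch.device("cuda", torch.cuda.current_device())
    # Assumption: fall back to a TPU/XLA device when torch_xla is installed,
    # instead of raising immediately.
    try:
        import torch_xla.core.xla_model as xm
        return xm.xla_device()  # e.g. an 'xla:0' device on a Kaggle TPU v3-8
    except ImportError:
        raise RuntimeError(
            "There are currently no available devices found, must be one of 'XPU', 'CUDA', or 'NPU'."
        )
```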
The error `RuntimeError: There are currently no available devices found, must be one of 'XPU', 'CUDA', or 'NPU'` is not thrown on `transformers==4.38.2`, and a llama-3 fine-tune completes successfully on the TPU VM.
However, llama-3.1 requires an upgraded version of transformers, so it's a dead end.