
VisionTextDualEncoder: Distributed training is always enabled #24924

@phiyodr

Description

System Info

  • transformers version: 4.32.0.dev0
  • Platform: Linux-5.15.0-76-generic-x86_64-with-glibc2.31
  • Python version: 3.10.10
  • Huggingface_hub version: 0.14.1
  • Safetensors version: 0.3.1
  • Accelerate version: 0.21.0
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.0.0+cu117 (True)
  • Tensorflow version (GPU?): 2.13.0 (False)
  • Flax version (CPU?/GPU?/TPU?): 0.7.0 (cpu)
  • Jax version: 0.4.13
  • JaxLib version: 0.4.13
  • Using GPU in script?: yes
  • Using distributed or parallel set-up in script?: It seems yes, but I don't want to ;)

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Hi,

I'm running the unchanged "VisionTextDualEncoder and CLIP model training example" on my local laptop (which has a single GPU) and wonder why it reports distributed training: True rather than False. From the output:

07/19/2023 15:21:22 - WARNING - __main__ - Process rank: 0, device: cuda:0, n_gpu: 1distributed training: True, 16-bits training: False

The above output originates from this logging call in run_clip.py:

    logger.warning(
        f"Process rank: {training_args.local_rank}, device: {training_args.device}, n_gpu: {training_args.n_gpu}"
        + f"distributed training: {bool(training_args.local_rank != -1)}, 16-bits training: {training_args.fp16}"
    )
  • According to the TrainingArguments documentation the default should be training_args.local_rank = -1, but in this example it is somehow set to 0 and I don't know why (see the diagnostic sketch below).
  • Explicitly adding local_rank=-1 to the run_clip.py example script has no effect.
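
To narrow this down, here is a minimal diagnostic sketch of what I am checking, under the assumption that launcher environment variables are what flip local_rank to 0. The variable names are the standard torch.distributed ones, nothing specific to run_clip.py:

    import os

    import torch.distributed as dist

    # Print the launcher environment variables that typically make a
    # distributed setup get detected, even on a single machine.
    for var in ("RANK", "LOCAL_RANK", "WORLD_SIZE", "MASTER_ADDR", "MASTER_PORT"):
        print(f"{var}={os.environ.get(var, '<unset>')}")

    # Check whether a process group has actually been initialized.
    print("torch.distributed available:  ", dist.is_available())
    print("torch.distributed initialized:", dist.is_available() and dist.is_initialized())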

My questions:

  • Is it intended that local_rank is set to 0?
  • Does local_rank=0 really mean that distributed training in Trainer is enabled? (I'm new to Trainer and usually work with DistributedDataParallel)
  • How do I switch off distributed training?

Bigger picture: sometimes my training (on a cluster) hangs at iteration n-1 and never finishes. I wonder whether this is related to distributed training, but I don't know how to debug it (one idea is sketched below the progress bar output).

100%|█████████▉| 2875/2876 [11:34<00:00,  4.10it/s]
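
The only debugging idea I have so far is a sketch using just the standard library: periodically dump every thread's stack so I can see where each process is stuck when it hangs (the 300-second interval is an arbitrary choice):

    import faulthandler
    import sys

    # Dump a traceback of every thread to stderr every 300 seconds until the
    # process exits, so a hang near the end of an epoch shows where it is waiting.
    faulthandler.dump_traceback_later(300, repeat=True, file=sys.stderr)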

Thanks in advance!

Expected behavior

I don't want to use distributed training, i.e. I expect training_args.local_rank to be -1.
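
Concretely, on a plain single-GPU run (launched with python, not torchrun or accelerate launch) I would expect a check like the following sketch to pass. It only uses TrainingArguments attributes from the documentation (local_rank, world_size, parallel_mode), and output_dir="tmp_debug" is just a placeholder:

    from transformers import TrainingArguments
    from transformers.training_args import ParallelMode

    # Placeholder arguments, constructed the same way run_clip.py would on a
    # single-GPU machine without any distributed launcher.
    training_args = TrainingArguments(output_dir="tmp_debug")

    assert training_args.local_rank == -1, "distributed training is unexpectedly enabled"
    assert training_args.world_size == 1
    assert training_args.parallel_mode == ParallelMode.NOT_DISTRIBUTED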
