Trainer is always using IPEX, even when use_ipex=False #24871

@dmsuehir

Description

System Info

  • transformers version: 4.32.0.dev0
  • Platform: Linux-5.15.0-75-generic-x86_64-with-glibc2.35
  • Python version: 3.10.6
  • Huggingface_hub version: 0.16.4
  • Safetensors version: 0.3.1
  • Accelerate version: 0.21.0
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.0.1+cu117 (False)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: No
  • Using distributed or parallel set-up in script?: No

Who can help?

@sgugger

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Steps to reproduce the behavior:

  1. The issue can be reproduced with the text-classification example script (other scripts would have the same issue). I have intel-extension-for-pytorch==2.0.100 installed in my environment and am running run_glue.py with the following command, without --use_ipex (so it should default to False):
    export MODEL_NAME=distilbert-base-uncased
    export OUTPUT_DIR=/home/dmsuehir/glue_output
    export TASK_NAME=mrpc
    
    python run_glue.py \
     --model_name_or_path $MODEL_NAME \
     --task_name $TASK_NAME \
     --do_train \
     --max_seq_length 128 \
     --per_device_train_batch_size 64 \
     --learning_rate 2e-5 \
     --num_train_epochs 1 \
     --no_cuda \
     --output_dir $OUTPUT_DIR \
     --bf16
    
    The train metrics I see with this run are:
    ***** train metrics *****
      epoch                    =        1.0
      train_loss               =     0.6083
      train_runtime            = 0:00:37.35
      train_samples            =       3668
      train_samples_per_second =     98.191
      train_steps_per_second   =      1.553
    
    Note that we are seeing 98.191 samples/second.
  2. Next, try running the same command, this time adding --use_ipex. Note that I am also deleting my output directory between runs.
    python run_glue.py \
      --model_name_or_path $MODEL_NAME \
      --task_name $TASK_NAME \
      --do_train \
      --max_seq_length 128 \
      --per_device_train_batch_size 64 \
      --learning_rate 2e-5 \
      --num_train_epochs 1 \
      --no_cuda \
      --output_dir $OUTPUT_DIR \
      --bf16 \
      --use_ipex
    
    The train metrics are similar to step 1, with nearly the same train_samples_per_second:
    ***** train metrics *****
      epoch                    =        1.0
      train_loss               =     0.6083
      train_runtime            = 0:00:37.94
      train_samples            =       3668
      train_samples_per_second =     96.654
      train_steps_per_second   =      1.528
    
  3. Finally, I debugged how IPEX is being used in the Trainer and found that it can be called in two places: (1) from the Trainer itself, or (2) by accelerate. The Trainer properly respects the use_ipex arg; however, accelerate always uses IPEX when it is installed. Digging deeper, I found that accelerate will only skip IPEX if the ACCELERATE_USE_IPEX environment variable is set to False/0 (see the sketch after this list). To confirm this, I manually set ACCELERATE_USE_IPEX=0 and then ran the same script/args from step 1:
    export ACCELERATE_USE_IPEX=0
    
    python run_glue.py \
     --model_name_or_path $MODEL_NAME \
     --task_name $TASK_NAME \
     --do_train \
     --max_seq_length 128 \
     --per_device_train_batch_size 64 \
     --learning_rate 2e-5 \
     --num_train_epochs 1 \
     --no_cuda \
     --output_dir $OUTPUT_DIR \
     --bf16
    
    Now I see these train metrics, where the drop in train_samples_per_second indicates that IPEX was actually turned off once the env var was set:
    ***** train metrics *****
      epoch                    =        1.0
      train_loss               =      0.697
      train_runtime            = 0:01:07.74
      train_samples            =       3668
      train_samples_per_second =     54.143
      train_steps_per_second   =      0.856
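
For reference, here is a minimal sketch (plain Python, not accelerate's actual source) of the gating behavior observed above, assuming a flag that defaults to True so IPEX becomes opt-out rather than opt-in whenever intel_extension_for_pytorch is importable:

    import importlib.util
    import os

    def _env_flag(name: str, default: bool) -> bool:
        # Treat "0"/"false"/"no"/"off" as False and anything else as True,
        # similar in spirit to accelerate's parse_flag_from_env utility.
        value = os.environ.get(name)
        if value is None:
            return default
        return value.strip().lower() not in ("0", "false", "no", "off")

    def should_use_ipex() -> bool:
        # Hypothetical reconstruction of the observed behavior: IPEX is
        # used whenever it is installed, unless ACCELERATE_USE_IPEX is
        # explicitly set to 0/False -- i.e. the flag defaults to True.
        ipex_installed = importlib.util.find_spec("intel_extension_for_pytorch") is not None
        return ipex_installed and _env_flag("ACCELERATE_USE_IPEX", default=True)

This matches the measurements above: with nothing set, or with --use_ipex, the fast (IPEX) path is taken; only ACCELERATE_USE_IPEX=0 produces the slower non-IPEX run.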
    

Expected behavior

When use_ipex is not given or is set to False, ipex.optimize should not get called.

If it's agreed that this is in fact a bug, I would be happy to work on a PR to fix it. I saw that other accelerate env vars are already being set from training_args.py, so the fix could follow the same pattern (a rough sketch is below).
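
For illustration only, a minimal sketch of that approach, assuming the env var is propagated from TrainingArguments in training_args.py; the helper name below is hypothetical and the real integration point may differ:

    import os

    def _propagate_use_ipex(use_ipex: bool) -> None:
        # Hypothetical helper, called wherever training_args.py sets the
        # other ACCELERATE_* env vars: make the env var reflect the user's
        # use_ipex choice so accelerate no longer enables IPEX by default.
        os.environ["ACCELERATE_USE_IPEX"] = "true" if use_ipex else "false"

With this in place, running without --use_ipex should behave like the ACCELERATE_USE_IPEX=0 run in step 3.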
