Please use the new API settings to control TF32 behavior, ...

### System Info

> UserWarning: Please use the new API settings to control TF32 behavior, such as torch.backends.cudnn.conv.fp32_precision = 'tf32' or torch.backends.cuda.matmul.fp32_precision = 'ieee'. Old settings, e.g, torch.backends.cuda.matmul.allow_tf32 = True, torch.backends.cudnn.allow_tf32 = True, allowTF32CuDNN() and allowTF32CuBLAS() will be deprecated after Pytorch 2.9. Please see https://pytorch.org/docs/main/notes/cuda.html#tensorfloat-32-tf32-on-ampere-and-later-devices (Triggered internally at /pytorch/aten/src/ATen/Context.cpp:80.)

I'm having an issue with vLLM, and this warning seems to be related to transformers. You can find the corresponding error, along with the relevant details, at [vllm#29349](https://github.com/vllm-project/vllm/issues/29349).
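For reference, here is a minimal sketch of the migration the warning asks for. It maps a legacy `allow_tf32` boolean to the new `fp32_precision` strings quoted in the warning (`'tf32'` or `'ieee'`). It applies them through `torch.backends.cuda.matmul.fp32_precision` and `torch.backends.cudnn.conv.fp32_precision` when those attributes exist, and otherwise falls back to the old `allow_tf32` flags on older PyTorch. The helper names `fp32_precision_from_allow_tf32` and `set_tf32` are hypothetical. This is not the actual code in transformers or vLLM.

```python
import importlib.util


def fp32_precision_from_allow_tf32(allow_tf32: bool) -> str:
    """Hypothetical helper: translate a legacy allow_tf32 flag to a new-API value.

    'tf32' permits TF32 math; 'ieee' keeps full FP32 (values taken from the warning).
    """
    return "tf32" if allow_tf32 else "ieee"


def set_tf32(allow_tf32: bool) -> str:
    """Hypothetical helper: configure TF32, preferring the new fp32_precision API.

    Returns "new", "legacy", or "no-torch" to report which path was taken.
    """
    if importlib.util.find_spec("torch") is None:
        return "no-torch"
    import torch

    precision = fp32_precision_from_allow_tf32(allow_tf32)
    matmul = torch.backends.cuda.matmul
    conv = getattr(torch.backends.cudnn, "conv", None)
    if hasattr(matmul, "fp32_precision") and conv is not None and hasattr(conv, "fp32_precision"):
        # New API named in the warning (introduced around PyTorch 2.9).
        matmul.fp32_precision = precision
        conv.fp32_precision = precision
        return "new"
    # Older PyTorch: only the legacy flags exist (slated for deprecation after 2.9).
    matmul.allow_tf32 = allow_tf32
    torch.backends.cudnn.allow_tf32 = allow_tf32
    return "legacy"
```

Checking for the attributes with `hasattr` instead of comparing version strings keeps the helper working on both sides of the API change without emitting the deprecation warning on new releases.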

### Who can help?

_No response_

### Information

- [x] The official example scripts
- [ ] My own modified scripts

### Tasks

- [x] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)

### Reproduction

```
docker run --runtime nvidia --gpus=all -v ~/.cache/huggingface:/root/.cache/huggingface -p 8000:8000 vllm/vllm-openai:v0.11.2 --model swiss-ai/Apertus-8B-Instruct-2509
```

### Expected behavior

The new PyTorch API should be used so that MoE models can be loaded and run for inference on the Turing architecture.
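For context, the documentation page linked in the warning covers TF32 on Ampere and later devices. Ampere corresponds to CUDA compute capability 8.0, while Turing GPUs report 7.5, so Turing has no TF32 hardware path. Below is a hedged sketch for checking this on the local GPU with `torch.cuda.get_device_capability()`. The helper names `supports_tf32` and `current_gpu_supports_tf32` are hypothetical.

```python
import importlib.util
from typing import Optional, Tuple


def supports_tf32(capability: Tuple[int, int]) -> bool:
    # TF32 tensor cores start with Ampere (compute capability 8.0);
    # Turing reports (7, 5), so this returns False there.
    return capability >= (8, 0)


def current_gpu_supports_tf32() -> Optional[bool]:
    """Return None when torch or a CUDA device is unavailable."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    if not torch.cuda.is_available():
        return None
    return supports_tf32(torch.cuda.get_device_capability())
```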
