Server gets stuck at model warming phase for codestral-22b on 4xH100 (#2835)

### System Info

TGI version 3.0.1, official Docker image. Thanks for the amazing recent releases 🤗

Running in a Kubernetes deployment with a 256Gi memory request and a shm volume.

Prefix caching and chunking enabled.

Works fine on 2xH100 but not on 4xH100, i.e. with `CUDA_VISIBLE_DEVICES=0,1,2,3`.

Loading llama3.1-70b works fine on the same config with 4xH100.



### Information

- [X] Docker
- [ ] The CLI directly

### Tasks

- [X] An officially supported command
- [ ] My own modifications

### Reproduction

Start TGI with codestral-22b on 4 H100s; it gets stuck at the model warming phase.
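
For anyone trying to reproduce this, a minimal sketch of a check that separates "still warming up" from "stuck" is to poll the server's health route with a deadline. This is not part of the original report: the URL `http://localhost:8080/health`, the 30-minute timeout, and the helper names `http_probe` and `wait_until_healthy` are assumptions chosen for illustration, so adjust them to the actual service address and to how long warmup normally takes on the 2xH100 setup.

```python
import time
import urllib.error
import urllib.request


def http_probe(url: str) -> bool:
    """Return True if a GET on `url` answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status: not ready yet.
        return False


def wait_until_healthy(url, timeout_s=1800.0, interval_s=10.0,
                       probe=http_probe, sleep=time.sleep,
                       clock=time.monotonic) -> bool:
    """Poll `url` until the probe succeeds or `timeout_s` elapses.

    Returns True if the server became healthy, False if the deadline
    passed first (which is what a hang during warmup would look like).
    """
    deadline = clock() + timeout_s
    while True:
        if probe(url):
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


# Example usage (hypothetical address; assumes the port is forwarded locally):
#   ok = wait_until_healthy("http://localhost:8080/health")
#   print("server ready" if ok else "server did not become healthy in time")
```

Running the same check against the 2xH100 and 4xH100 deployments gives a concrete timing comparison to attach to the report.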

### Expected behavior

Autoconfiguration completes and the model warms up for codestral-22b on 4 H100s, as it does on 2.
