Description
System Info
text-generation-inference - V2.4.1
ROCm 6.3/MI300
Python 3.11.10
Information
- Docker
- The CLI directly
Tasks
- An officially supported command
- My own modifications
Reproduction
- Clone repo
```shell
git clone https://github.com/huggingface/text-generation-inference.git
cd text-generation-inference
git checkout tags/v2.4.1
```
- Build Docker
```shell
docker build -t tgi:v2.4.1 -f Dockerfile_amd .
export model=HuggingFaceH4/zephyr-7b-beta
export volume=$PWD/data
docker run --device /dev/kfd --device /dev/dri --shm-size 1g -p 8080:80 \
  -v $volume:/data -v $PWD:/ws --name tgi_test_container \
  tgi:v2.4.1 --model-id $model
```
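Once the container is up, a quick way to confirm the server is actually serving before moving on (a hedged check; assumes the 8080:80 port mapping above and TGI's `/health` route):

```shell
# Probe the server through the 8080->80 mapping from the docker run above;
# a 200 status code means the model finished loading and the server is ready
curl -s -o /dev/null -w "%{http_code}\n" localhost:8080/health
```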
- Install Dependencies
```shell
docker exec -it tgi_test_container /bin/bash

# Install the Rust toolchain, then reload the shell environment
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
. "$HOME/.cargo/env"

# Install protoc and other build packages
apt update
apt-get install protobuf-compiler -y
apt install unzip -y
apt install pkg-config -y

# Install Python modules
pip install --no-input pytest
pip install --no-input text_generation
```
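A quick sanity check (assuming the steps above completed without errors) that the build prerequisites are all on the PATH before kicking off the test suite:

```shell
# Each command prints its version if the corresponding install step succeeded
rustc --version
protoc --version
pkg-config --version
pytest --version
```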
- Run tests
```shell
# The repo root was mounted at /ws by the docker run step above
cd /ws
make integration-tests
```
Expected behavior
Expected the following test cases to pass
- integration-tests/models/test_compressed_tensors_w8an_fp.py::test_compressed_tensors_w8an_all_params
- integration-tests/models/test_flash_mixtral_gptq.py::test_flash_mixtral_gptq_all_params
I believe there is a numerical accuracy problem. Please find the error log below:
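To iterate on just the two failures without rerunning the whole suite, the individual test cases can be invoked directly with pytest (same node IDs as listed above; assumes the dependencies installed in the container earlier):

```shell
# Rerun only the two failing tests, with verbose output and stdout unbuffered
cd /ws
pytest -sv \
  integration-tests/models/test_compressed_tensors_w8an_fp.py::test_compressed_tensors_w8an_all_params \
  integration-tests/models/test_flash_mixtral_gptq.py::test_flash_mixtral_gptq_all_params
```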
IntegrationTests.log
AMD-melliott