
integration-test failures on MI300 #2804

@itej89

Description

System Info

text-generation-inference - V2.4.1
ROCm 6.3/MI300
Python 3.11.10

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

  1. Clone the repo and check out the v2.4.1 tag (matching the version in System Info)
git clone https://github.com/huggingface/text-generation-inference.git
git checkout tags/v2.4.1
  2. Build and run the Docker image
docker build -t tgi:v2.4.1 -f Dockerfile_amd .

export model=HuggingFaceH4/zephyr-7b-beta
export volume=$PWD/data
docker run --device /dev/kfd --device /dev/dri --shm-size 1g -p 8080:80 -v $volume:/data -v $PWD:/ws --name tgi_test_container tgi:v2.4.1 --model-id $model
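
For completeness: the AMD installation guide for TGI usually starts the container with a few extra ROCm-related flags (ptrace capability, unconfined seccomp, video group, host IPC). In case the failures turn out to be environment-sensitive, a variant of the same run command with those flags would look roughly like this (untested here, otherwise identical to the command above):

docker run --device /dev/kfd --device /dev/dri --group-add video \
  --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --ipc=host \
  --shm-size 1g -p 8080:80 -v $volume:/data -v $PWD:/ws \
  --name tgi_test_container tgi:v2.4.1 --model-id $model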
  3. Install dependencies inside the container
docker exec -it tgi_test_container /bin/bash

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

Reload the shell

. "$HOME/.cargo/env"

Install protoc

apt update
apt-get install protobuf-compiler -y

Install packages

apt install unzip -y
apt install pkg-config -y

Install modules

pip install --no-input pytest
pip install --no-input text_generation
  4. Run the tests
cd /ws/text-generation-inference

make integration-tests
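
To narrow things down, the two failing cases can also be re-run on their own using standard pytest node-id selection, assuming the same environment that `make integration-tests` relies on (the test paths are the ones listed under Expected behavior below):

cd /ws/text-generation-inference
pytest -sv integration-tests/models/test_compressed_tensors_w8an_fp.py::test_compressed_tensors_w8an_all_params
pytest -sv integration-tests/models/test_flash_mixtral_gptq.py::test_flash_mixtral_gptq_all_params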

Expected behavior

Expected the following test cases to pass

  • integration-tests/models/test_compressed_tensors_w8an_fp.py::test_compressed_tensors_w8an_all_params
  • integration-tests/models/test_flash_mixtral_gptq.py::test_flash_mixtral_gptq_all_params

I believe there is a numerical accuracy problem.

Please find the error log below:
IntegrationTests.log
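
For reference, the `_all_params` tests presumably send a request with many generation parameters set; a request of the same shape can be sent by hand to the server started in step 2 (port 8080 from the docker run above) to eyeball the generated text. The parameter values here are placeholders to illustrate the request shape, not the exact values the failing tests use:

curl localhost:8080/generate -X POST \
  -H 'Content-Type: application/json' \
  -d '{"inputs": "Test request", "parameters": {"max_new_tokens": 10, "do_sample": true, "seed": 0, "temperature": 0.5, "top_k": 10, "top_p": 0.9, "repetition_penalty": 1.1}}'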
