System Info
- CPU EPYC 7H12 (32 core)
- GPU NVIDIA A100-SXM4-80GB
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
- Pull the official Triton image
- Clone tensorrtllm_backend
- Move your model into the model repository and create the config files
- Start the container with a docker-compose.yaml file like the one below
```yaml
services:
  tritonserver:
    image: triton_trt_llm
    network_mode: "host"
    container_name: triton
    shm_size: '1gb'
    volumes:
      - /data:/workspace
    working_dir: /workspace
    restart: always
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]
    command: bash -c "python3 ./tensorrtllm_backend/scripts/launch_triton_server.py --world_size=1 --model_repo=tensorrtllm_backend/all_models/inflight_batcher_llm/"
```
Expected behavior
After running the `docker compose up` command, I expect the container to start the Triton server, wait for it, and remain running unless an error occurs.
Actual behavior
The container starts, runs the Python script, and exits immediately without waiting for the Triton server and TensorRT-LLM backend.
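
The root cause appears to be that the launch script ends with a bare `subprocess.Popen(cmd, env=env)` call, which returns as soon as the child is spawned; nothing waits on the server process, so the entrypoint script exits and Docker stops the container. A minimal standalone sketch of this behavior, using `sleep` as a stand-in for the Triton server process:

```python
import subprocess

# Popen forks the child and returns immediately; nothing waits on it.
child = subprocess.Popen(["sleep", "60"])
print("child still running?", child.poll() is None)  # prints True

# The script ends here. When this script is the container's main
# process, its exit stops the container while the child is still alive.
```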
Additional notes
This bug can be fixed by adding a single call after the last line of scripts/launch_triton_server.py, like this:
```python
child = subprocess.Popen(cmd, env=env)
# Block until the Triton server process exits so the container's
# main process stays alive.
child.communicate()
```
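
For context, a minimal sketch of how the end of the script would look with this change; `cmd` and `env` here are hypothetical stand-ins for the values the real script builds from its arguments:

```python
import subprocess

# Hypothetical stand-ins; launch_triton_server.py assembles these from
# --world_size, --model_repo, and related options.
cmd = ["tritonserver", "--model-repository=/models"]
env = None

child = subprocess.Popen(cmd, env=env)
# communicate() blocks until the child exits; since no stdout/stderr
# pipes were requested, it behaves the same as child.wait() here.
child.communicate()
```

Since no pipes are attached, `child.wait()` would work equally well; `communicate()` is simply the more general choice.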