System Info
OS version: Linux
Model being used (curl 127.0.0.1:8080/info | jq): TheBloke/Nous-Hermes-2-Mixtral-8x7B-DPO-AWQ
Hardware used (GPUs, how many, on which cloud) (nvidia-smi): 1xL40S
The current version being used: 2.0.4
Information
- Docker
- The CLI directly
Tasks
- An officially supported command
- My own modifications
Reproduction
Launch TGI with --max-total-tokens 16384, --max-batch-prefill-tokens 16384, --max-input-length 16383, and --quantize awq.
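For reference, a launch command along these lines reproduces the configuration (a sketch: the volume path and port mapping are assumptions, not from the original report):

    docker run --gpus all -p 8080:80 -v $PWD/data:/data \
      ghcr.io/huggingface/text-generation-inference:2.0.4 \
      --model-id TheBloke/Nous-Hermes-2-Mixtral-8x7B-DPO-AWQ \
      --quantize awq \
      --max-input-length 16383 \
      --max-total-tokens 16384 \
      --max-batch-prefill-tokens 16384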
After a few hundred requests, the pod returns empty packets, and only does so a few seconds after a request has been made.
Monitoring reveals that tgi_queue_size increases steadily but never goes down.
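For anyone reproducing this: tgi_queue_size is exposed on TGI's Prometheus metrics endpoint, so assuming the port mapping above it can be watched with:

    watch -n 1 'curl -s 127.0.0.1:8080/metrics | grep tgi_queue_size'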
Expected behavior
No stutters.
I had the same problem and was able to work around it by launching with --cuda-graphs 0. This obviously costs significant performance, but it was at least a better option than a broken server.
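For anyone trying the same workaround, it is just an extra flag on the launcher invocation, e.g. (same assumed arguments as the sketch above):

    text-generation-launcher --model-id TheBloke/Nous-Hermes-2-Mixtral-8x7B-DPO-AWQ \
      --quantize awq --cuda-graphs 0 \
      --max-input-length 16383 --max-total-tokens 16384 --max-batch-prefill-tokens 16384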