fix: Avoid thread starvation on many concurrent requests by making use of asyncio to lock llama_proxy context #1798

gjpower · 2024-10-15T15:04:09Z

Supersedes previous MR #1795

The previous implementation occupies a thread, blocked on a lock, every time llama_proxy is acquired; with many parallel requests this can cause thread starvation.
It also prevents the call to `await run_in_threadpool(llama.create_chat_completion, **kwargs)` from proceeding: all worker threads are stuck waiting on the lock, so no progress can be made.
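The starvation scenario can be reproduced with a standard-library toy. This is not the server code: `POOL_SIZE`, `model_lock`, `fake_inference` and `handle_request` are hypothetical stand-ins, and a small `ThreadPoolExecutor` plays the role of the threadpool behind `run_in_threadpool`. With two workers and two requests, one handler holds the lock while the other occupies the only remaining worker waiting for it, so the lock holder's inference job never gets a thread.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

# Hypothetical stand-ins: a tiny worker pool and a thread-blocking lock
# guarding the model, mimicking the old acquisition pattern.
POOL_SIZE = 2
pool = ThreadPoolExecutor(max_workers=POOL_SIZE)
model_lock = threading.Lock()


def fake_inference() -> str:
    # Stand-in for llama.create_chat_completion(**kwargs).
    return "done"


def handle_request(request_id: int) -> str:
    # Old pattern: the handler runs on a pool thread and blocks that
    # thread while it waits for the lock.
    if not model_lock.acquire(timeout=2.0):
        return "lock timeout"
    try:
        # The lock holder needs *another* pool thread for the actual work
        # (the equivalent of run_in_threadpool). Every other worker is parked
        # on model_lock, so this job just sits in the queue.
        inner = pool.submit(fake_inference)
        return inner.result(timeout=0.5)
    except FutureTimeout:
        return "starved"
    finally:
        model_lock.release()


outer = [pool.submit(handle_request, i) for i in range(POOL_SIZE)]
results = sorted(f.result() for f in outer)
pool.shutdown(wait=True)
print(results)
```

The timeouts exist only so the demo terminates: without them the lock holder would wait on its queued job forever while the other worker waits on the lock, which is a deadlock.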

This MR moves acquisition of llama_proxy to an async pattern that relies on asyncio mechanisms. ExitStack is replaced with AsyncExitStack, and the improper closing of the exit stack is fixed.
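A minimal sketch of the async pattern, assuming a stand-in model object: waiting for the lock suspends a coroutine on the event loop instead of parking a worker thread, and only the request holding the lock asks the pool for a thread. It uses `asyncio.Lock` (a later commit in this PR switches to `anyio.Lock`, which is used with the same `async with` syntax) and `loop.run_in_executor` in place of Starlette's `run_in_threadpool`. `FakeLlama`, `get_llama_proxy` and the `request_id` keyword are hypothetical.

```python
import asyncio
import contextlib
import functools
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class FakeLlama:
    """Hypothetical stand-in for the model behind llama_proxy."""

    def __init__(self) -> None:
        self._guard = threading.Lock()
        self.active = 0
        self.max_active = 0

    def create_chat_completion(self, **kwargs) -> dict:
        # Track how many calls overlap, to check that the lock serializes them.
        with self._guard:
            self.active += 1
            self.max_active = max(self.max_active, self.active)
        time.sleep(0.001)  # pretend to do blocking inference
        with self._guard:
            self.active -= 1
        return {"id": kwargs["request_id"]}


@contextlib.asynccontextmanager
async def get_llama_proxy(lock: asyncio.Lock, llama: FakeLlama):
    # Waiting here suspends the coroutine; no worker thread is held.
    async with lock:
        yield llama


async def handle_request(i, lock, llama, pool):
    async with contextlib.AsyncExitStack() as exit_stack:
        llama_proxy = await exit_stack.enter_async_context(get_llama_proxy(lock, llama))
        loop = asyncio.get_running_loop()
        # Only the lock holder requests a thread from the pool.
        call = functools.partial(llama_proxy.create_chat_completion, request_id=i)
        return await loop.run_in_executor(pool, call)


async def main(n_requests: int = 50):
    lock = asyncio.Lock()  # created inside the running loop
    llama = FakeLlama()
    with ThreadPoolExecutor(max_workers=2) as pool:
        results = await asyncio.gather(
            *(handle_request(i, lock, llama, pool) for i in range(n_requests))
        )
    return results, llama.max_active


results, max_active = asyncio.run(main())
print(len(results), max_active)  # 50 1
```

Even with only two worker threads, all 50 requests complete, because the 49 waiting requests are suspended coroutines rather than blocked threads.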


gjpower mentioned this pull request Oct 15, 2024

Fix: add missing exit_stack.close() to end of /v1/completions endpoint #1795

Closed

gjpower mentioned this pull request Oct 23, 2024

Change server approach to handle parallel requests #1550

Open

gjpower force-pushed the fix/server_llama_call_thread_starvation branch from 8745712 to de01a63 October 31, 2024 09:34

gjpower added 4 commits November 5, 2024 10:57

fix: make use of asyncio to lock llama_proxy context

222ed7c

fix: use aclose instead of close for AsyncExitStack

ab0b783

fix: don't call exit stack close in stream iterator as it will be called by finally from on_complete anyway

4da21ce

fix: use anyio.Lock instead of asyncio.Lock

9ec5460
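The exit-stack commits above can be sketched with a simplified streaming handler. `AsyncExitStack` exposes `aclose()` rather than `close()`, and it must be awaited; the chunk iterator leaves cleanup to a single `finally` that calls an `on_complete` hook. `hold_llama_proxy`, `chunk_iterator` and `on_complete` are hypothetical and only loosely modelled on the commit messages; the real server wires cleanup through its own response callback.

```python
import asyncio
import contextlib


async def serve_stream():
    lock = asyncio.Lock()  # stand-in for the lock guarding llama_proxy
    events = []

    @contextlib.asynccontextmanager
    async def hold_llama_proxy():
        # Hypothetical helper: keeps the lock for as long as the context is open.
        async with lock:
            events.append("acquired")
            yield
        events.append("released")

    exit_stack = contextlib.AsyncExitStack()
    await exit_stack.enter_async_context(hold_llama_proxy())

    async def chunk_iterator():
        # Hypothetical stand-in for the streaming generator. It does NOT
        # close exit_stack itself; cleanup is left to on_complete.
        for chunk in ("Hel", "lo"):
            yield chunk

    async def on_complete():
        # AsyncExitStack has aclose(), not close(), and it must be awaited.
        await exit_stack.aclose()

    received = []
    try:
        async for chunk in chunk_iterator():
            received.append(chunk)
    finally:
        # Runs even if iteration is interrupted by an exception.
        await on_complete()
    return received, events, lock.locked()


received, events, still_locked = asyncio.run(serve_stream())
print(received, events, still_locked)  # ['Hel', 'lo'] ['acquired', 'released'] False
```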

gjpower force-pushed the fix/server_llama_call_thread_starvation branch from de01a63 to 9ec5460 November 5, 2024 10:57

gjpower and others added 3 commits November 22, 2024 10:59

Merge branch 'main' into fix/server_llama_call_thread_starvation

7bf48e3

Merge branch 'main' into fix/server_llama_call_thread_starvation

ad35fc1

Merge branch 'main' into fix/server_llama_call_thread_starvation

9e0728b

abetlen merged commit 9bd0c95 into abetlen:main Dec 6, 2024

gjpower mentioned this pull request Dec 9, 2024

fix: add missing await statements for async exit_stack handling #1858

Merged
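The follow-up PR's title points at a subtle failure mode: `aclose()` is a coroutine, so calling it without `await` only creates a coroutine object and never unwinds the stack. In this sketch (a hypothetical `release_demo`, with `asyncio.Lock` as the resource held by the stack), the lock stays held in that case; in a server, later requests would then wait on it indefinitely.

```python
import asyncio
import contextlib


async def release_demo(await_aclose: bool) -> bool:
    """Return True if the lock is still held after 'closing' the stack."""
    lock = asyncio.Lock()
    exit_stack = contextlib.AsyncExitStack()
    # asyncio.Lock is an async context manager, so the stack can own it.
    await exit_stack.enter_async_context(lock)
    if await_aclose:
        await exit_stack.aclose()  # unwinds the stack and releases the lock
    else:
        pending = exit_stack.aclose()  # bug: coroutine object created, never run
        pending.close()  # discard it quietly so the demo emits no RuntimeWarning
    return lock.locked()


print(asyncio.run(release_demo(True)))   # False: lock released
print(asyncio.run(release_demo(False)))  # True: lock leaked
```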
