@gjpower (Contributor) commented Oct 15, 2024

Supersedes previous MR #1795

The previous implementation creates threads that block on a lock when acquiring llama_proxy, which can cause thread starvation under many parallel requests.
It also prevents the call to await run_in_threadpool(llama.create_chat_completion, **kwargs) from proceeding: all worker threads are stuck waiting for the lock, so no progress can be made.
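
To illustrate the failure mode described above, here is a minimal standalone sketch. It is not the server code: the names `simulate_starvation`, `wait_for_proxy`, and `completion` are hypothetical, a plain `ThreadPoolExecutor` stands in for the server's worker threads, and the lock timeout exists only so the demo terminates instead of hanging.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def simulate_starvation(workers: int = 2, timeout: float = 0.2) -> list[str]:
    """Saturate a small pool with lock waiters, then queue the real work."""
    events: list[str] = []
    proxy_lock = threading.Lock()
    proxy_lock.acquire()  # an in-flight request already holds the proxy

    def wait_for_proxy(i: int) -> None:
        # Each waiter occupies a worker thread while it blocks on the lock.
        # Assumption: the timeout is only here so the demo can finish.
        if proxy_lock.acquire(timeout=timeout):
            proxy_lock.release()
            events.append(f"waiter {i} acquired")
        else:
            events.append(f"waiter {i} timed out")

    def completion() -> None:
        # Stand-in for the completion call that the lock holder needs to run.
        events.append("completion ran")

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for i in range(workers):
            pool.submit(wait_for_proxy, i)
        # Every worker is blocked on the lock, so this job sits in the queue
        # until at least one waiter gives up.
        pool.submit(completion).result()

    proxy_lock.release()
    return events


if __name__ == "__main__":
    print(simulate_starvation())
```

Without the timeout, the waiters would never return and the queued completion would never start, which is the stall this MR addresses.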

This MR adapts the acquisition of llama_proxy to an async pattern that takes advantage of asyncio mechanisms. ExitStack is replaced with AsyncExitStack, and the improper closing of the ExitStack is fixed.
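
The async pattern can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this MR: `FakeLlamaProxy`, `acquire_proxy`, and `handle_request` are hypothetical names, and the standard library's `asyncio.to_thread` stands in for `run_in_threadpool`.

```python
import asyncio
import contextlib
from typing import AsyncIterator


class FakeLlamaProxy:
    """Hypothetical stand-in for the real llama proxy object."""

    def create_chat_completion(self, prompt: str) -> str:
        return f"echo: {prompt}"


@contextlib.asynccontextmanager
async def acquire_proxy(
    lock: asyncio.Lock, proxy: FakeLlamaProxy
) -> AsyncIterator[FakeLlamaProxy]:
    # Waiting on an asyncio.Lock suspends the coroutine on the event loop;
    # no worker thread is held while the request waits its turn.
    async with lock:
        yield proxy


async def handle_request(
    lock: asyncio.Lock, proxy: FakeLlamaProxy, prompt: str
) -> str:
    # AsyncExitStack releases the lock on exit, including when the
    # completion call raises.
    async with contextlib.AsyncExitStack() as stack:
        llama = await stack.enter_async_context(acquire_proxy(lock, proxy))
        # Only the lock holder uses a worker thread for the blocking call.
        return await asyncio.to_thread(llama.create_chat_completion, prompt)


async def main(n: int = 20) -> list[str]:
    lock = asyncio.Lock()  # created inside the running event loop
    proxy = FakeLlamaProxy()
    return await asyncio.gather(
        *(handle_request(lock, proxy, f"req {i}") for i in range(n))
    )


if __name__ == "__main__":
    print(len(asyncio.run(main())))
```

Because pending requests wait as suspended coroutines rather than blocked threads, the number of parallel requests can exceed the thread pool size without stalling the request that currently holds the proxy.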

@gjpower force-pushed the fix/server_llama_call_thread_starvation branch from de01a63 to 9ec5460 on November 5, 2024 10:57