Description
I am working with the OpenAI serving engines from the current main branch (Python 3.10).
When I try to instantiate an OpenAIServingChat from a coroutine, I get the error AttributeError: 'NoneType' object has no attribute 'chat_template'.
Code Example
Here is some sample code to replicate the problem:
from vllm import AsyncEngineArgs
from vllm.engine.async_llm_engine import AsyncLLMEngine
from vllm.entrypoints.openai.serving_chat import OpenAIServingChat
import asyncio

async def main():
    model = "microsoft/phi-2"
    engine_args = AsyncEngineArgs(model=model)
    engine = AsyncLLMEngine.from_engine_args(engine_args)
    serving_chat = OpenAIServingChat(
        engine,
        served_model=model,
        response_role="assistant",
        chat_template=None,
    )

if __name__ == "__main__":
    asyncio.run(main())

If I turn the main coroutine into a plain function (just removing the async) and run it directly (without asyncio), everything works as expected.
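For comparison, a sketch of that working synchronous variant (the same code with the async parts removed):

from vllm import AsyncEngineArgs
from vllm.engine.async_llm_engine import AsyncLLMEngine
from vllm.entrypoints.openai.serving_chat import OpenAIServingChat

def main():
    model = "microsoft/phi-2"
    engine_args = AsyncEngineArgs(model=model)
    engine = AsyncLLMEngine.from_engine_args(engine_args)
    # No event loop is running here, so OpenAIServing's __init__ takes
    # the asyncio.run() path and the tokenizer is loaded eagerly.
    serving_chat = OpenAIServingChat(
        engine,
        served_model=model,
        response_role="assistant",
        chat_template=None,
    )

if __name__ == "__main__":
    main()  # plain call, no asyncio.run()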
Problem Investigation
From what I can tell, the problem is as follows:
In OpenAIServing's __init__, lines 25ff read:
try:
    event_loop = asyncio.get_running_loop()
except RuntimeError:
    event_loop = None

if event_loop is not None and event_loop.is_running(
):  # If the current is instanced by Ray Serve, there is already a running event loop
    event_loop.create_task(self._post_init())
else:  # When using single vLLM without engine_use_ray
    asyncio.run(self._post_init())

Synchronous Case
With the synchronous main function above, we enter the else branch at the bottom: asyncio.run() starts a new event loop, runs self._post_init() in it (which loads the tokenizer), and only returns once that has completed. That means the tokenizer is available by the time OpenAIServingChat calls self._load_chat_template() in its __init__.
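To illustrate the ordering, here is a minimal standalone sketch (the Demo class and stand-in tokenizer are invented for illustration, not vLLM code):

import asyncio

class Demo:
    def __init__(self):
        self.tokenizer = None
        # No loop is running, so asyncio.run() creates one, runs
        # _post_init() to completion, and only then returns.
        asyncio.run(self._post_init())
        assert self.tokenizer is not None  # always holds here

    async def _post_init(self):
        self.tokenizer = object()  # stand-in for loading the tokenizer

Demo()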
Asynchronous Case
With the asynchronous main coroutine above, there is already a running event loop, so event_loop.create_task(self._post_init()) is called, which merely schedules the tokenizer loading for some point in the future. However, we never hit an await before OpenAIServingChat calls self._load_chat_template(), so the loop never gets a chance to actually run _post_init(), and the tokenizer is therefore still missing when self._load_chat_template() tries to access it.
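Adapting the same toy class makes the race visible (again hypothetical, not vLLM code):

import asyncio

class Demo:
    def __init__(self):
        self.tokenizer = None
        # A loop is already running: create_task() only schedules
        # _post_init(); it cannot start until the caller awaits.
        asyncio.get_running_loop().create_task(self._post_init())

    async def _post_init(self):
        self.tokenizer = object()

async def main():
    demo = Demo()
    print(demo.tokenizer)   # None -- the scheduled task has not run yet
    await asyncio.sleep(0)  # yield once; now the task gets to run
    print(demo.tokenizer)   # the stand-in tokenizer object

asyncio.run(main())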
Possible Solutions
I am not an expert in asyncio programming, so the only solution I have found so far is to make _load_chat_template in OpenAIServingChat async as well and to replicate the whole event-loop/create_task logic from OpenAIServing's __init__ for the chat-template loading in OpenAIServingChat's __init__. Experimentally that seems to work; however, this does not seem like a good solution, since I don't think there is any guarantee on the order in which the event loop runs tasks, so there could still be scenarios in which the error is triggered.
Edit: This does seem to be the only workable solution. To ensure things run in the correct order, _load_chat_template will have to wait until the tokenizer is available, e.g.

async def _load_chat_template(self, chat_template):
    while self.tokenizer is None:
        await asyncio.sleep(.01)
    ...
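Concretely, here is a runnable toy sketch of this workaround (ServingBase and ServingChat are stand-ins mirroring OpenAIServing and OpenAIServingChat, not the real classes):

import asyncio

class ServingBase:
    """Stand-in for OpenAIServing: schedules tokenizer loading as vLLM does."""
    def __init__(self):
        self.tokenizer = None
        try:
            event_loop = asyncio.get_running_loop()
        except RuntimeError:
            event_loop = None
        if event_loop is not None and event_loop.is_running():
            event_loop.create_task(self._post_init())
        else:
            asyncio.run(self._post_init())

    async def _post_init(self):
        self.tokenizer = object()  # stand-in for the real tokenizer

class ServingChat(ServingBase):
    """Stand-in for OpenAIServingChat with the workaround applied."""
    def __init__(self, chat_template=None):
        super().__init__()
        # Replicate the same loop detection for the template loading.
        try:
            event_loop = asyncio.get_running_loop()
        except RuntimeError:
            event_loop = None
        if event_loop is not None and event_loop.is_running():
            event_loop.create_task(self._load_chat_template(chat_template))
        else:
            asyncio.run(self._load_chat_template(chat_template))

    async def _load_chat_template(self, chat_template):
        while self.tokenizer is None:   # wait for _post_init() to finish
            await asyncio.sleep(.01)
        print("tokenizer ready, loading template:", chat_template)

async def main():
    ServingChat()            # no AttributeError now
    await asyncio.sleep(.1)  # give both scheduled tasks time to run

asyncio.run(main())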
Additional Observation
Interestingly, the error is not triggered when using engine_use_ray=True or workers_use_ray=True with a synchronous main function. It appears that at the time __init__ is called there is not yet a running event loop, so we again hit the working else branch.
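For what it's worth, which branch is taken can be checked with the same probe OpenAIServing uses:

import asyncio

try:
    asyncio.get_running_loop()
    print("running loop -> create_task branch")
except RuntimeError:
    # What OpenAIServing sees at __init__ time in the Ray case above.
    print("no running loop -> asyncio.run branch")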