
OpenAIServingChat cannot be instantiated within a running event loop #2683

Closed
@schoennenbeck

Description

I am working with the OpenAI serving engines from the current main branch (Python 3.10).

When I try to instantiate an OpenAIServingChat from a coroutine, I get the error message AttributeError: 'NoneType' object has no attribute 'chat_template'.

Code Example

Here is some sample code to replicate the problem:

from vllm import AsyncEngineArgs
from vllm.engine.async_llm_engine import AsyncLLMEngine
from vllm.entrypoints.openai.serving_chat import OpenAIServingChat

import asyncio

async def main():
    model = "microsoft/phi-2"
    engine_args = AsyncEngineArgs(model=model)
    engine = AsyncLLMEngine.from_engine_args(engine_args)
    serving_chat = OpenAIServingChat(
        engine,
        served_model=model,
        response_role="assistant",
        chat_template=None,
    )
 

if __name__ == "__main__":
    asyncio.run(main())

If I turn the main coroutine into a plain function (just removing the async) and run it directly (without asyncio), everything works as expected; see the variant below.
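For reference, the working synchronous variant looks like this (same setup as above, just without async/await):

from vllm import AsyncEngineArgs
from vllm.engine.async_llm_engine import AsyncLLMEngine
from vllm.entrypoints.openai.serving_chat import OpenAIServingChat

def main():
    model = "microsoft/phi-2"
    engine_args = AsyncEngineArgs(model=model)
    engine = AsyncLLMEngine.from_engine_args(engine_args)
    # No event loop is running here, so the tokenizer gets loaded to
    # completion before the chat template is read (see the investigation below).
    serving_chat = OpenAIServingChat(
        engine,
        served_model=model,
        response_role="assistant",
        chat_template=None,
    )

if __name__ == "__main__":
    main()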

Problem Investigation

From what I can tell the problem is as follows:

In the __init__ of OpenAIServing, lines 25ff read:

try:
    event_loop = asyncio.get_running_loop()
except RuntimeError:
    event_loop = None

if event_loop is not None and event_loop.is_running():
    # If the current is instanced by Ray Serve, there is already a running event loop
    event_loop.create_task(self._post_init())
else:  # When using single vLLM without engine_use_ray
    asyncio.run(self._post_init())

Synchronous Case

In the synchronous case above we hit the else branch at the bottom: asyncio.run starts a new event loop, runs self._post_init() in it (which loads the tokenizer) and only returns once that has finished. This means the tokenizer is available by the time OpenAIServingChat calls self._load_chat_template() in its own __init__.
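The relevant behaviour of asyncio.run can be illustrated with a small standalone sketch (the class and method names below only mimic vLLM's, this is not the actual implementation):

import asyncio

class Serving:
    def __init__(self):
        self.tokenizer = None
        # No loop is running yet, so asyncio.run creates one, runs _post_init
        # to completion and only then returns.
        asyncio.run(self._post_init())
        # The tokenizer is therefore guaranteed to be available here.
        assert self.tokenizer is not None

    async def _post_init(self):
        self.tokenizer = object()  # stands in for the actual tokenizer loading

Serving()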

Asynchronous Case

In the asynchronous case above there already is a running event loop. Consequently event_loop.create_task(self._post_init()) is called, which only schedules the tokenizer loading for some point in the future. However, nothing is awaited before OpenAIServingChat calls self._load_chat_template(), so the loop never gets a chance to actually run _post_init, and the tokenizer is still missing when self._load_chat_template() tries to access it.
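This scheduling behaviour can be reproduced without vLLM at all; in the following standalone sketch (again with names that only mimic vLLM's) the scheduled task does not run until the caller yields control back to the loop:

import asyncio

class Serving:
    def __init__(self):
        self.tokenizer = None
        # A loop is already running, so _post_init is only scheduled as a task;
        # __init__ never awaits, so the task has not run when we continue.
        asyncio.get_running_loop().create_task(self._post_init())
        print("in __init__:", self.tokenizer)  # None -> the AttributeError in vLLM's case

    async def _post_init(self):
        self.tokenizer = object()

async def main():
    serving = Serving()
    await asyncio.sleep(0)  # yield to the loop; only now does the task run
    print("after first await:", serving.tokenizer)

asyncio.run(main())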

Possible solutions

I am not an expert in asyncio programming, so the only solution I have found so far is to make _load_chat_template in OpenAIServingChat async as well and to replicate the whole event-loop/create_task logic from OpenAIServing's __init__ for the chat-template loading in OpenAIServingChat's __init__. Experimentally that seems to work; however, it does not feel like a good solution, since as far as I know there is no guarantee on the order in which the event loop runs tasks, so there could still be scenarios in which the error is triggered.

Edit: This does seem to be the only workable solution. To ensure things run in the correct order, _load_chat_template has to wait until the tokenizer is available, e.g.:

async def _load_chat_template(self, chat_template):
    # Wait until OpenAIServing._post_init has set self.tokenizer.
    while self.tokenizer is None:
        await asyncio.sleep(0.01)
    ...
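Put together, the workaround I am experimenting with looks roughly like this (only a sketch; the import path, signatures and the omitted template-loading code are taken from memory and may not match main exactly):

import asyncio

from vllm.entrypoints.openai.serving_engine import OpenAIServing

class OpenAIServingChat(OpenAIServing):

    def __init__(self, engine, served_model, response_role, chat_template=None):
        super().__init__(engine=engine, served_model=served_model)
        self.response_role = response_role
        # Replicate the event-loop handling from OpenAIServing.__init__ for
        # the chat-template loading.
        try:
            event_loop = asyncio.get_running_loop()
        except RuntimeError:
            event_loop = None
        if event_loop is not None and event_loop.is_running():
            event_loop.create_task(self._load_chat_template(chat_template))
        else:
            asyncio.run(self._load_chat_template(chat_template))

    async def _load_chat_template(self, chat_template):
        # Wait until OpenAIServing._post_init has attached the tokenizer.
        while self.tokenizer is None:
            await asyncio.sleep(0.01)
        # ... original template-loading logic using self.tokenizer ...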

Additional Observation

Interestingly, the error is not triggered when using engine_use_ray=True or workers_use_ray=True in a synchronous main function. It appears that at the time __init__ is called there is not yet a running event loop, so we again hit the working else branch.
