Description
I am working with the OpenAI serving engines from the current main branch (Python 3.10). When I try to instantiate an `OpenAIServingChat` from a coroutine, I get the error `AttributeError: 'NoneType' object has no attribute 'chat_template'`.
Code Example
Here is some sample code to replicate the problem:
```python
from vllm import AsyncEngineArgs
from vllm.engine.async_llm_engine import AsyncLLMEngine
from vllm.entrypoints.openai.serving_chat import OpenAIServingChat
import asyncio

async def main():
    model = "microsoft/phi-2"
    engine_args = AsyncEngineArgs(model=model)
    engine = AsyncLLMEngine.from_engine_args(engine_args)
    serving_chat = OpenAIServingChat(
        engine,
        served_model=model,
        response_role="assistant",
        chat_template=None,
    )

if __name__ == "__main__":
    asyncio.run(main())
```
If I turn the `main` coroutine into a regular function (just removing the `async`) and call it directly (without `asyncio.run`), everything works as expected.
Problem Investigation
From what I can tell, the problem is as follows: in the `__init__` of `OpenAIServing` (link), lines 25ff read:
```python
try:
    event_loop = asyncio.get_running_loop()
except RuntimeError:
    event_loop = None

if event_loop is not None and event_loop.is_running(
):  # If the current is instanced by Ray Serve, there is already a running event loop
    event_loop.create_task(self._post_init())
else:  # When using single vLLM without engine_use_ray
    asyncio.run(self._post_init())
```
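The branch condition can be checked in isolation. This standalone sketch (not vLLM code) mirrors the loop-detection logic quoted above:

```python
import asyncio

def detect_loop():
    """Mirror of the loop-detection logic above (standalone sketch)."""
    try:
        return asyncio.get_running_loop()
    except RuntimeError:
        return None

# Outside any coroutine there is no running loop:
assert detect_loop() is None

async def main():
    # Inside a coroutine, the running loop is returned:
    loop = detect_loop()
    assert loop is not None and loop.is_running()

asyncio.run(main())
print("ok")
```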
Synchronous Case
In the case of a synchronous `main` function we enter the `else` branch at the bottom. `asyncio.run` then starts a new event loop, runs `self._post_init()` in it (which loads the tokenizer), and only returns once that has completed. The tokenizer is therefore available when `OpenAIServingChat` calls `self._load_chat_template()` (link) in its `__init__`.
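For contrast, a minimal standalone script (independent of vLLM; the names are stand-ins) shows that `asyncio.run` drives its coroutine to completion before returning, which is why the synchronous path works:

```python
import asyncio

done = False

async def post_init():
    # Stands in for the tokenizer loading done by _post_init().
    global done
    done = True

# asyncio.run starts a fresh event loop, runs the coroutine to
# completion, and only then returns -- so the side effect is visible.
asyncio.run(post_init())
assert done is True
print("post_init finished")
```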
Asynchronous Case
In the case of an asynchronous `main` coroutine there already is a running event loop. Consequently `event_loop.create_task(self._post_init())` is called, which merely schedules the tokenizer loading for some point in the future. However, we do not hit an `await` before `OpenAIServingChat` calls `self._load_chat_template()`, so the loop never gets a chance to actually load the tokenizer, and it is not there when `self._load_chat_template()` tries to access it.
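This scheduling gap can be reproduced without vLLM. The sketch below (hypothetical names) shows that a task created via `create_task` does not run until the creating coroutine reaches an `await`:

```python
import asyncio

ran = False

async def post_init():
    # Stands in for the deferred tokenizer loading.
    global ran
    ran = True

async def main():
    asyncio.get_running_loop().create_task(post_init())
    # The task is only scheduled; without an await it has not run yet.
    assert ran is False
    await asyncio.sleep(0)  # yield once so the loop can run the task
    assert ran is True

asyncio.run(main())
print("task ran only after the await")
```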
Possible solutions
I am not an expert in asyncio programming, so the only solution I have found so far is to make `_load_chat_template` in `OpenAIServingChat` async as well and replicate the whole event-loop/`create_task` logic from `OpenAIServing`'s `__init__` for the chat-template loading in the `__init__` of `OpenAIServingChat`. Experimentally that seems to work; however, it doesn't seem like a good solution, since as far as I know there is no guarantee on the order in which tasks are run by the event loop, so there could still be scenarios in which the error is triggered.
Edit: This does seem to be the only workable solution. To ensure things run in the correct order, `_load_chat_template` has to wait until the tokenizer is available, e.g.

```python
async def _load_chat_template(self, chat_template):
    while self.tokenizer is None:
        await asyncio.sleep(.01)
    ...
```
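Instead of polling, an `asyncio.Event` set at the end of `_post_init` would avoid the fixed sleep interval. The sketch below is a hypothetical stand-in, not vLLM code; `ServingSketch` and `tokenizer_ready` are invented names:

```python
import asyncio

class ServingSketch:
    """Hypothetical stand-in for OpenAIServing (not vLLM code)."""

    def __init__(self):
        self.tokenizer = None
        self.tokenizer_ready = asyncio.Event()  # invented attribute
        asyncio.get_running_loop().create_task(self._post_init())

    async def _post_init(self):
        self.tokenizer = object()   # stands in for loading the tokenizer
        self.tokenizer_ready.set()  # wake up any waiters

    async def _load_chat_template(self, chat_template):
        await self.tokenizer_ready.wait()  # no polling interval needed
        assert self.tokenizer is not None

async def main():
    serving = ServingSketch()
    await serving._load_chat_template(None)
    print("template loaded after tokenizer was ready")

asyncio.run(main())
```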
Additional Observation
Interestingly, the error is not triggered when using `engine_use_ray=True` or `workers_use_ray=True` with a synchronous `main` function. It appears that at the time `__init__` is called there is not yet a running event loop, so we again hit the working `else` branch.