Skip to content

create event loop before predictor setup #1366

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Nov 29, 2023

Conversation

technillogue
Copy link
Contributor

@technillogue technillogue commented Nov 7, 2023

The current implementation (prior to #1350) throws an error:

Traceback (most recent call last):
File "/usr/local/lib/python3.11/site-packages/cog/server/worker.py", line 226, in _predict
for r in result:
...
File "/src/src/download.py", line 118, in sync_download_file
return self.loop.run_until_complete(self.download_file(url))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/asyncio/base_events.py", line 629, in run_until_complete
self._check_running()
File "/usr/local/lib/python3.11/asyncio/base_events.py", line 590, in _check_running
raise RuntimeError(
RuntimeError: Cannot run the event loop while another loop is running

The problem is that llama's setup() creates a Downloader, which can't find an event loop and creates one, then tries to use that loop in predict, and by that time _ChildWorker.run has used asyncio.run

This solves that specific problem, and since we didn't see any other errors besides llama maybe few other models use an event loop. However, we may want/need to consider moving the check where we inspect if predict() is async earlier, and have more separate code paths so that we don't need to be async if the user isn't using the async feature. On the other hand, doing that could get messy and involve a lot of duplicated code, so if we're sure our users use asyncio.get_running_loop and not asyncio.create_event_loop, it would be best to not do that.

@technillogue technillogue force-pushed the syl/fix-event-loop-for-setup branch 6 times, most recently from 6a5f147 to e6f3e43 Compare November 14, 2023 02:29
@nickstenning
Copy link
Contributor

nickstenning commented Nov 14, 2023

IIUC this is the fix for cog v0.9.0b9 being broken for some models? If so would you have a look at what's broken the tests here so we can get this merged? It's not great for main to be be unreleasable.

@technillogue technillogue force-pushed the syl/fix-event-loop-for-setup branch 4 times, most recently from 27c8835 to 722720f Compare November 14, 2023 18:56
def _setup(self) -> None:
done = Done()
try:
self._predictor = load_predictor_from_ref(self._predictor_ref)
if is_async_predictor(self._predictor):
Copy link
Contributor

@nickstenning nickstenning Nov 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fact that is_async_predictor checks both setup and predict makes this a little hard to read IMO.

I think it would be clearer if the setup path did:

self._predictor = load_predictor_from_ref(self._predictor_ref)
# Could be a function or a class
if hasattr(self._predictor, "setup"):
    if is_async(self._predictor.setup):
        get_loop().run_until_complete(run_setup_async(self._predictor))
    else:
        run_setup(self._predictor)

and then similarly, later on

def _loop(self) -> None:
    if is_async(get_predict(self._predictor)):
        return get_loop().run_until_complete(self._loop_async())
    return self._loop_sync()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the rationale here is that is predict is async, the user has opted into the async world entirely, and we want to create an event loop before running setup, so get_running_loop in the users' setup code will find the same event loop we're going to use to run _loop_async. that may not be the right approach though

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, ok, that makes sense. That wasn't clear to me, but the comment you added is very helpful.

Copy link
Contributor Author

@technillogue technillogue Nov 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually now that I think about it just using new_event_loop doesn't mean that get_running_loop will work, I might be completely wrong...

the crux of the issue is def setup() creating a Downloader, which does get_running_loop/new_event_loop, creates a ClientSession implicitly attached to that loop, uses sync_download_files_to_disk, and then (in the future), async def predict() might use await downloader.download_file() to download a finetune, and ClientSession will error because it's not the same event loop as when it was created. my specific downloader code tries to guard against this but a different user might not do that

worse, after setup() finishes doing new_event_loop().run_until_complete() and returns, our subsequent get_event_loop() won't find anything because the loop from setup() isn't running anymore

maybe if predict is async we should just always wrap setup() in a coroutine so that any get_running_loop will do the right thing?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what you're doing here should work.

In [24]: async def who_is_running():
    ...:     print(id(asyncio.get_running_loop()))
    ...:

In [25]: await who_is_running()
4338068384

In [26]: loop = asyncio.new_event_loop()

In [27]: id(loop)
Out[27]: 4342403856

In [28]: loop.run_until_complete(who_is_running())
4342403856

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's more like this

def get_loop():
    try:
        return asyncio.get_running_loop()
    except RuntimeError:
        return asyncio.new_event_loop()


async def who_is_running(msg):
    print(msg, id(asyncio.get_running_loop()))


def setup():
    user_loop = get_loop()
    user_loop.run_until_complete(who_is_running("user setup loop"))


async def predict():
    await who_is_running("predict loop")


loop = get_loop()
setup()
loop.run_until_complete(predict())

user setup loop 139844438211888
predict loop 139844449272592

if that's IPython with autoawait you can get confusing results

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have an existing contract on async def setup functions, so we're free to break people who are spawning their own event loops there. They'll discover and fix that during testing.

We do have an established behavior of allowing event loops in non-async def setup or predict functions, disallowing that now will probably break user code.
Maybe that means they should run in another thread so they're free to run their own loop?
(note that we should only call 1 at a time)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think #1366 (comment) addresses the motivation and the current code in this PR doesn't break people who spawn event loops in setup; running it in a thread shouldn't be needed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if neither setup or predict is async, no event loop is created; if predict is async but setup isn't, then an event loop is created but not running, so asyncio.run will work, get_running_loop won't, and people will fall through to new_event_loop. this will be a different event loop than the one that will run an async predict, but that's tolerable

Copy link
Contributor

@nickstenning nickstenning left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple of mostly superficial requests to make things easier to follow. Mostly it looks good.

@technillogue technillogue force-pushed the syl/fix-event-loop-for-setup branch from 6145a85 to 021208a Compare November 14, 2023 20:00
# Could be a function or a class
if hasattr(self._predictor, "setup"):
run_setup(self._predictor)
if inspect.iscoroutinefunction(self._predictor.setup):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think we should probably use is_async here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

setup shouldn't be allowed to be a generator

self._setup()
asyncio.run(self._loop())
self._loop()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another option that might (?) make things less icky, if I've finally understood the problem here.

def run(self) -> None:
    ...
 
    self._stream_redirector.start()
    asyncio.run(self._run())
    self._stream_redirector.shutdown()

async def _run(self) -> Awaitable[None]:
    await self._setup()
    await self._loop()

async def _setup(self) -> Awaitable[None]:
    ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't know if we're async or not until we load_predictor_from_ref, which might throw an error, that should be handled in the same block as errors from calling setup

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could duplicate the setup error handling with a similar pattern as _handle_predict_error and then this could work, I was just a little confused what was happening with Done and _cancelable

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I'm missing something, but why does the predictor need to be async for these methods to be async?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the predictor isn't async, they haven't opted into the async world, and if we make these methods async we would get the same problems we were having before with there being a running event loop that the user isn't expecting, breaking unsuspecting code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could have a kind of silly check like "async" in open("predict.py").read()

@technillogue
Copy link
Contributor Author

technillogue commented Nov 15, 2023

after reflecting on it, I think the current code has the correct behavior, but not for the reasons I initially thought. here are some some scenarios:

  1. you use the same event loop from sync setup and from sync predict. nothing changes for you.
  2. you have sync setup that uses asyncio.run, and an async predict. your setup and predict will use different event loops, but if setup is only concerned with downloading and predict is only concerned with the inference engine (no downloading), that's fine, and we'd rather not make people rewrite everything here. in this case there is no difference if we new_event_loop before the setup code or not.
  3. you have both async setup and async predict. you can't use asyncio.run/run_until_complete, but you will be in the same running loop
  4. if you really want, you can have async setup and sync predict, which is silly but won't break anything

for comparison, the other approach we paired on is here d68a147

@technillogue technillogue force-pushed the syl/fix-event-loop-for-setup branch 4 times, most recently from 6d43d8d to f7c3236 Compare November 15, 2023 16:46
@technillogue technillogue force-pushed the syl/fix-event-loop-for-setup branch from 1cdf8ab to eab2e31 Compare November 28, 2023 20:39
… in setup

running the worker tests individually is always fine, but sometimes when running the full suite the same loop test throws "Unreliable test timings! On an initial run, this test took 652.54ms, which exceeded the deadline of 500.00ms, but on a subsequent run it took 346.94 ms"

Signed-off-by: technillogue <technillogue@gmail.com>
@technillogue technillogue force-pushed the syl/fix-event-loop-for-setup branch from eab2e31 to 0ca5521 Compare November 28, 2023 20:55
@technillogue technillogue merged commit 08e77bf into main Nov 29, 2023
@technillogue technillogue deleted the syl/fix-event-loop-for-setup branch November 29, 2023 18:40
technillogue added a commit that referenced this pull request Jan 15, 2024
Revert "add test cases for shared loop across setup and predict + asyncio.run in setup"
Revert "don't use async for predict loop if predict is not async"
Revert "conditionally create the event loop if predictor is async, and add a path for hypothetical async setup"
Revert "Revert "Revert "support async predict functions  (#1350)" (#1373)""

This reverts commit 8b8bbac.
This reverts commit 0ca5521.
This reverts commit dcf5ac4.
This reverts commit 47f3fa5.
technillogue added a commit that referenced this pull request Jan 15, 2024
Revert "add test cases for shared loop across setup and predict + asyncio.run in setup"
Revert "don't use async for predict loop if predict is not async"
Revert "conditionally create the event loop if predictor is async, and add a path for hypothetical async setup"
Revert "Revert "Revert "support async predict functions  (#1350)" (#1373)""

This reverts commit 8b8bbac.
This reverts commit 0ca5521.
This reverts commit dcf5ac4.
This reverts commit 47f3fa5.

Signed-off-by: technillogue <technillogue@gmail.com>
technillogue added a commit that referenced this pull request Jan 16, 2024
Revert "add test cases for shared loop across setup and predict + asyncio.run in setup"
Revert "don't use async for predict loop if predict is not async"
Revert "conditionally create the event loop if predictor is async, and add a path for hypothetical async setup"
Revert "Revert "Revert "support async predict functions  (#1350)" (#1373)""

This reverts commit 8b8bbac.
This reverts commit 0ca5521.
This reverts commit dcf5ac4.
This reverts commit 47f3fa5.

Signed-off-by: technillogue <technillogue@gmail.com>
technillogue added a commit that referenced this pull request Jan 16, 2024
Revert "add test cases for shared loop across setup and predict + asyncio.run in setup"
Revert "don't use async for predict loop if predict is not async"
Revert "conditionally create the event loop if predictor is async, and add a path for hypothetical async setup"
Revert "Revert "Revert "support async predict functions  (#1350)" (#1373)""

This reverts commit 8b8bbac.
This reverts commit 0ca5521.
This reverts commit dcf5ac4.
This reverts commit 47f3fa5.

Signed-off-by: technillogue <technillogue@gmail.com>
technillogue added a commit that referenced this pull request Jan 16, 2024
technillogue added a commit that referenced this pull request Jan 22, 2024
technillogue added a commit that referenced this pull request Jan 22, 2024
technillogue added a commit that referenced this pull request Feb 13, 2024
* support async predict functions (#1350, #1373)
* conditionally create the event loop if predictor is async, and add a path for hypothetical async setup
* don't use async for predict loop if predict is not async
* add test cases for shared loop across setup and predict + asyncio.run in setup

(reverts commit b533c6b)
technillogue added a commit that referenced this pull request Feb 13, 2024
* conditionally create the event loop if predictor is async, and add a path for hypothetical async setup
* don't use async for predict loop if predict is not async
* add test cases for shared loop across setup and predict + asyncio.run in setup

(reverts commit b533c6b)
technillogue added a commit that referenced this pull request Feb 13, 2024
* conditionally create the event loop if predictor is async, and add a path for hypothetical async setup
* don't use async for predict loop if predict is not async
* add test cases for shared loop across setup and predict + asyncio.run in setup

(reverts commit b533c6b)
technillogue added a commit that referenced this pull request Feb 13, 2024
* conditionally create the event loop if predictor is async, and add a path for hypothetical async setup
* don't use async for predict loop if predict is not async
* add test cases for shared loop across setup and predict + asyncio.run in setup

(reverts commit b533c6b)
technillogue added a commit that referenced this pull request Feb 21, 2024
* conditionally create the event loop if predictor is async, and add a path for hypothetical async setup
* don't use async for predict loop if predict is not async
* add test cases for shared loop across setup and predict + asyncio.run in setup

(reverts commit b533c6b)

Signed-off-by: technillogue <technillogue@gmail.com>
technillogue added a commit that referenced this pull request Feb 21, 2024
* conditionally create the event loop if predictor is async, and add a path for hypothetical async setup
* don't use async for predict loop if predict is not async
* add test cases for shared loop across setup and predict + asyncio.run in setup

(reverts commit b533c6b)

Signed-off-by: technillogue <technillogue@gmail.com>
technillogue added a commit that referenced this pull request Jun 19, 2024
* conditionally create the event loop if predictor is async, and add a path for hypothetical async setup
* don't use async for predict loop if predict is not async
* add test cases for shared loop across setup and predict + asyncio.run in setup

(reverts commit b533c6b)

* lints

Signed-off-by: technillogue <technillogue@gmail.com>
technillogue added a commit that referenced this pull request Jun 19, 2024
* conditionally create the event loop if predictor is async, and add a path for hypothetical async setup
* don't use async for predict loop if predict is not async
* add test cases for shared loop across setup and predict + asyncio.run in setup

(reverts commit b533c6b)

* lints

Signed-off-by: technillogue <technillogue@gmail.com>
technillogue added a commit that referenced this pull request Jul 18, 2024
* conditionally create the event loop if predictor is async, and add a path for hypothetical async setup
* don't use async for predict loop if predict is not async
* add test cases for shared loop across setup and predict + asyncio.run in setup

(reverts commit b533c6b)

* lints

Signed-off-by: technillogue <technillogue@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants