-
Notifications
You must be signed in to change notification settings - Fork 617
create event loop before predictor setup #1366
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
6a5f147
to
e6f3e43
Compare
IIUC this is the fix for cog v0.9.0b9 being broken for some models? If so would you have a look at what's broken the tests here so we can get this merged? It's not great for |
27c8835
to
722720f
Compare
def _setup(self) -> None: | ||
done = Done() | ||
try: | ||
self._predictor = load_predictor_from_ref(self._predictor_ref) | ||
if is_async_predictor(self._predictor): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fact that is_async_predictor
checks both setup and predict makes this a little hard to read IMO.
I think it would be clearer if the setup path did:
self._predictor = load_predictor_from_ref(self._predictor_ref)
# Could be a function or a class
if hasattr(self._predictor, "setup"):
if is_async(self._predictor.setup):
get_loop().run_until_complete(run_setup_async(self._predictor))
else:
run_setup(self._predictor)
and then similarly, later on
def _loop(self) -> None:
if is_async(get_predict(self._predictor)):
return get_loop().run_until_complete(self._loop_async())
return self._loop_sync()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the rationale here is that is predict is async, the user has opted into the async world entirely, and we want to create an event loop before running setup, so get_running_loop in the users' setup code will find the same event loop we're going to use to run _loop_async. that may not be the right approach though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, ok, that makes sense. That wasn't clear to me, but the comment you added is very helpful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually now that I think about it just using new_event_loop doesn't mean that get_running_loop will work, I might be completely wrong...
the crux of the issue is def setup()
creating a Downloader, which does get_running_loop/new_event_loop, creates a ClientSession implicitly attached to that loop, uses sync_download_files_to_disk, and then (in the future), async def predict()
might use await downloader.download_file()
to download a finetune, and ClientSession will error because it's not the same event loop as when it was created. my specific downloader code tries to guard against this but a different user might not do that
worse, after setup()
finishes doing new_event_loop().run_until_complete()
and returns, our subsequent get_event_loop()
won't find anything because the loop from setup()
isn't running anymore
maybe if predict is async we should just always wrap setup() in a coroutine so that any get_running_loop will do the right thing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think what you're doing here should work.
In [24]: async def who_is_running():
...: print(id(asyncio.get_running_loop()))
...:
In [25]: await who_is_running()
4338068384
In [26]: loop = asyncio.new_event_loop()
In [27]: id(loop)
Out[27]: 4342403856
In [28]: loop.run_until_complete(who_is_running())
4342403856
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's more like this
def get_loop():
try:
return asyncio.get_running_loop()
except RuntimeError:
return asyncio.new_event_loop()
async def who_is_running(msg):
print(msg, id(asyncio.get_running_loop()))
def setup():
user_loop = get_loop()
user_loop.run_until_complete(who_is_running("user setup loop"))
async def predict():
await who_is_running("predict loop")
loop = get_loop()
setup()
loop.run_until_complete(predict())
user setup loop 139844438211888
predict loop 139844449272592
if that's IPython with autoawait you can get confusing results
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't have an existing contract on async def setup
functions, so we're free to break people who are spawning their own event loops there. They'll discover and fix that during testing.
We do have an established behavior of allowing event loops in non-async def setup
or predict functions, disallowing that now will probably break user code.
Maybe that means they should run in another thread so they're free to run their own loop?
(note that we should only call 1 at a time)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think #1366 (comment) addresses the motivation and the current code in this PR doesn't break people who spawn event loops in setup; running it in a thread shouldn't be needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if neither setup or predict is async, no event loop is created; if predict is async but setup isn't, then an event loop is created but not running, so asyncio.run will work, get_running_loop won't, and people will fall through to new_event_loop. this will be a different event loop than the one that will run an async predict, but that's tolerable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a couple of mostly superficial requests to make things easier to follow. Mostly it looks good.
6145a85
to
021208a
Compare
# Could be a function or a class | ||
if hasattr(self._predictor, "setup"): | ||
run_setup(self._predictor) | ||
if inspect.iscoroutinefunction(self._predictor.setup): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think we should probably use is_async
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
setup shouldn't be allowed to be a generator
self._setup() | ||
asyncio.run(self._loop()) | ||
self._loop() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another option that might (?) make things less icky, if I've finally understood the problem here.
def run(self) -> None:
...
self._stream_redirector.start()
asyncio.run(self._run())
self._stream_redirector.shutdown()
async def _run(self) -> Awaitable[None]:
await self._setup()
await self._loop()
async def _setup(self) -> Awaitable[None]:
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't know if we're async or not until we load_predictor_from_ref, which might throw an error, that should be handled in the same block as errors from calling setup
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could duplicate the setup error handling with a similar pattern as _handle_predict_error and then this could work, I was just a little confused what was happening with Done and _cancelable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I'm missing something, but why does the predictor need to be async for these methods to be async?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the predictor isn't async, they haven't opted into the async world, and if we make these methods async we would get the same problems we were having before with there being a running event loop that the user isn't expecting, breaking unsuspecting code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could have a kind of silly check like "async" in open("predict.py").read()
after reflecting on it, I think the current code has the correct behavior, but not for the reasons I initially thought. here are some some scenarios:
for comparison, the other approach we paired on is here d68a147 |
6d43d8d
to
f7c3236
Compare
1cdf8ab
to
eab2e31
Compare
… in setup running the worker tests individually is always fine, but sometimes when running the full suite the same loop test throws "Unreliable test timings! On an initial run, this test took 652.54ms, which exceeded the deadline of 500.00ms, but on a subsequent run it took 346.94 ms" Signed-off-by: technillogue <technillogue@gmail.com>
eab2e31
to
0ca5521
Compare
Revert "add test cases for shared loop across setup and predict + asyncio.run in setup" Revert "don't use async for predict loop if predict is not async" Revert "conditionally create the event loop if predictor is async, and add a path for hypothetical async setup" Revert "Revert "Revert "support async predict functions (#1350)" (#1373)"" This reverts commit 8b8bbac. This reverts commit 0ca5521. This reverts commit dcf5ac4. This reverts commit 47f3fa5.
Revert "add test cases for shared loop across setup and predict + asyncio.run in setup" Revert "don't use async for predict loop if predict is not async" Revert "conditionally create the event loop if predictor is async, and add a path for hypothetical async setup" Revert "Revert "Revert "support async predict functions (#1350)" (#1373)"" This reverts commit 8b8bbac. This reverts commit 0ca5521. This reverts commit dcf5ac4. This reverts commit 47f3fa5. Signed-off-by: technillogue <technillogue@gmail.com>
Revert "add test cases for shared loop across setup and predict + asyncio.run in setup" Revert "don't use async for predict loop if predict is not async" Revert "conditionally create the event loop if predictor is async, and add a path for hypothetical async setup" Revert "Revert "Revert "support async predict functions (#1350)" (#1373)"" This reverts commit 8b8bbac. This reverts commit 0ca5521. This reverts commit dcf5ac4. This reverts commit 47f3fa5. Signed-off-by: technillogue <technillogue@gmail.com>
Revert "add test cases for shared loop across setup and predict + asyncio.run in setup" Revert "don't use async for predict loop if predict is not async" Revert "conditionally create the event loop if predictor is async, and add a path for hypothetical async setup" Revert "Revert "Revert "support async predict functions (#1350)" (#1373)"" This reverts commit 8b8bbac. This reverts commit 0ca5521. This reverts commit dcf5ac4. This reverts commit 47f3fa5. Signed-off-by: technillogue <technillogue@gmail.com>
This reverts commit b533c6b.
This reverts commit b533c6b.
This reverts commit b533c6b.
* support async predict functions (#1350, #1373) * conditionally create the event loop if predictor is async, and add a path for hypothetical async setup * don't use async for predict loop if predict is not async * add test cases for shared loop across setup and predict + asyncio.run in setup (reverts commit b533c6b)
* conditionally create the event loop if predictor is async, and add a path for hypothetical async setup * don't use async for predict loop if predict is not async * add test cases for shared loop across setup and predict + asyncio.run in setup (reverts commit b533c6b)
* conditionally create the event loop if predictor is async, and add a path for hypothetical async setup * don't use async for predict loop if predict is not async * add test cases for shared loop across setup and predict + asyncio.run in setup (reverts commit b533c6b)
* conditionally create the event loop if predictor is async, and add a path for hypothetical async setup * don't use async for predict loop if predict is not async * add test cases for shared loop across setup and predict + asyncio.run in setup (reverts commit b533c6b)
* conditionally create the event loop if predictor is async, and add a path for hypothetical async setup * don't use async for predict loop if predict is not async * add test cases for shared loop across setup and predict + asyncio.run in setup (reverts commit b533c6b) Signed-off-by: technillogue <technillogue@gmail.com>
* conditionally create the event loop if predictor is async, and add a path for hypothetical async setup * don't use async for predict loop if predict is not async * add test cases for shared loop across setup and predict + asyncio.run in setup (reverts commit b533c6b) Signed-off-by: technillogue <technillogue@gmail.com>
* conditionally create the event loop if predictor is async, and add a path for hypothetical async setup * don't use async for predict loop if predict is not async * add test cases for shared loop across setup and predict + asyncio.run in setup (reverts commit b533c6b) * lints Signed-off-by: technillogue <technillogue@gmail.com>
* conditionally create the event loop if predictor is async, and add a path for hypothetical async setup * don't use async for predict loop if predict is not async * add test cases for shared loop across setup and predict + asyncio.run in setup (reverts commit b533c6b) * lints Signed-off-by: technillogue <technillogue@gmail.com>
* conditionally create the event loop if predictor is async, and add a path for hypothetical async setup * don't use async for predict loop if predict is not async * add test cases for shared loop across setup and predict + asyncio.run in setup (reverts commit b533c6b) * lints Signed-off-by: technillogue <technillogue@gmail.com>
The current implementation (prior to #1350) throws an error:
The problem is that llama's setup() creates a Downloader, which can't find an event loop and creates one, then tries to use that loop in predict, and by that time
_ChildWorker.run
has usedasyncio.run
This solves that specific problem, and since we didn't see any other errors besides llama maybe few other models use an event loop. However, we may want/need to consider moving the check where we inspect if
predict()
is async earlier, and have more separate code paths so that we don't need to be async if the user isn't using the async feature. On the other hand, doing that could get messy and involve a lot of duplicated code, so if we're sure our users useasyncio.get_running_loop
and notasyncio.create_event_loop
, it would be best to not do that.