Make the remote inference engine runnable in Jupyter notebooks. #565
Conversation
OPE-320 FR: Update `lema.infer` to support batch inference
Any potential downsides or gotchas to this approach that you can think of? I assume the thread-creation overhead is minimal?
There are definitely tradeoffs.
def safe_asyncio_run(main: Awaitable[T]) -> T:
    """Run an Awaitable in a new thread. Blocks until the thread is finished."""
Is it "in a new thread", or "in a new Python process", given that multiprocessing is used?
Looks like it's actually the former (a thread pool): https://docs.python.org/3/library/multiprocessing.html#module-multiprocessing.dummy
This page has the following recommendation: "Users should generally prefer to use concurrent.futures.ThreadPoolExecutor, which has a simpler interface that was designed around threads from the start, and which returns concurrent.futures.Future instances that are compatible with many other libraries, including asyncio."
Would it make sense to switch to ThreadPoolExecutor here?
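For context, the suggestion above amounts to something like the following minimal sketch (the helper name `run_in_thread` is hypothetical, not from the PR):

```python
from concurrent.futures import ThreadPoolExecutor


def run_in_thread(fn, *args):
    # Run fn(*args) on a single worker thread and block on its Future.
    # ThreadPoolExecutor's Future integrates cleanly with other libraries,
    # including asyncio, which is the advantage noted in the Python docs.
    with ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(fn, *args).result()


print(run_in_thread(sum, [1, 2, 3]))  # prints 6
```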
Sure, I don't see a problem with using ThreadPoolExecutor here. Updated.
To address your original question: my intent here is to run the event loop in a separate thread.
See the many examples on GitHub with similar approaches: https://github.com/search?q=asyncio.set_event_loop+thread&type=code This is typically done for code executing in an environment where an asyncio loop is already running (Jupyter Notebook, IPython, Discord bots).
Asyncio is an invasive dependency. If code running under an asyncio loop (such as a Jupyter Notebook) calls code that creates its own asyncio loop, asyncio raises a RuntimeError.
We get around this by creating any asyncio loops in new threads that we block on. Asyncio allows at most one running loop per thread, and this approach preserves that invariant.
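The approach described above can be sketched as follows; this is an illustration of the technique (run the event loop on a dedicated worker thread and block on it), not necessarily the exact code merged in the PR:

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor
from typing import Awaitable, TypeVar

T = TypeVar("T")


def safe_asyncio_run(main: Awaitable[T]) -> T:
    """Run an Awaitable in a new thread. Blocks until the thread is finished."""
    # asyncio.run() starts a fresh event loop. Calling it on a worker thread
    # avoids "RuntimeError: asyncio.run() cannot be called from a running
    # event loop" when the caller (e.g. a Jupyter notebook) already has a
    # loop running on the main thread. Since each thread gets its own loop,
    # the one-loop-per-thread invariant is preserved.
    with ThreadPoolExecutor(max_workers=1) as executor:
        return executor.submit(asyncio.run, main).result()
```

Usage from inside a notebook then looks like `result = safe_asyncio_run(my_coroutine())`, with no interference from the notebook's own event loop.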
Towards OPE-320