
Make the remote inference engine runnable in Jupyter notebooks. #565

Merged: 7 commits into main, Oct 1, 2024

Conversation

@taenin (Collaborator) commented Sep 30, 2024

Asyncio is an invasive dependency. If code already running inside an asyncio event loop (such as a Jupyter Notebook cell) calls code that tries to start its own loop, asyncio raises a RuntimeError.
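A minimal repro of the failure mode (illustrative only, not code from this PR; the `outer` coroutine stands in for a Jupyter cell that already has a loop running):

```python
import asyncio


async def outer() -> str:
    # While this coroutine runs, an event loop is already active in this
    # thread, just like code executed in a Jupyter Notebook cell.
    coro = asyncio.sleep(0)
    try:
        asyncio.run(coro)  # attempt to start a second, nested loop
    except RuntimeError as err:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return str(err)
    return "no error"


print(asyncio.run(outer()))
# -> asyncio.run() cannot be called from a running event loop
```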

We get around this issue by creating any asyncio event loops in new threads, which we block on. Asyncio allows at most one running event loop per thread, and this approach preserves that invariant.
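A minimal sketch of this workaround; the `safe_asyncio_run` signature matches the one shown later in this review, but the body here is a simplified illustration rather than the exact code in the PR:

```python
import asyncio
import threading
from typing import Any, Awaitable, TypeVar

T = TypeVar("T")


def safe_asyncio_run(main: Awaitable[T]) -> T:
    """Run an Awaitable in a new thread. Blocks until the thread is finished."""
    box: dict[str, Any] = {}

    def _worker() -> None:
        # A brand-new thread has no event loop, so asyncio.run() cannot
        # collide with an already-running loop (e.g. Jupyter's).
        try:
            box["result"] = asyncio.run(main)
        except BaseException as err:  # propagate errors to the caller
            box["error"] = err

    thread = threading.Thread(target=_worker)
    thread.start()
    thread.join()  # block until the coroutine completes
    if "error" in box:
        raise box["error"]
    return box["result"]


async def double(x: int) -> int:
    return 2 * x


print(safe_asyncio_run(double(21)))  # 42
```

Because the caller blocks on `thread.join()`, this behaves like a synchronous function from the outside even though a fresh event loop runs inside.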

Towards OPE-320

linear bot commented Sep 30, 2024

OPE-320 FR: Update `lema.infer` to support batch inference

  1. Read prompts from files
  2. Write responses to files

@taenin taenin marked this pull request as ready for review September 30, 2024 19:41
@wizeng23 (Contributor) left a comment


Any potential downsides or gotchas to this approach you can think of? I assume the thread creation overhead is minimal?

@taenin (Collaborator, Author) commented Sep 30, 2024

> Any potential downsides or gotchas to this approach you can think of? I assume the thread creation overhead is minimal?

There are definitely tradeoffs.

  • As you mentioned, there is overhead to creating a thread. As long as the running job is somewhat lengthy (as it should be in our case), this concern is minimal.
  • The real drawback is the scenario where an upstream caller invokes us asynchronously. Because our async code runs in a dedicated thread, all of our async jobs make progress only while that thread is active. If more than one non-blocking thread is competing for time, our async jobs will see a significant slowdown.

@nikg4 nikg4 requested a review from jgreer013 September 30, 2024 21:43


def safe_asyncio_run(main: Awaitable[T]) -> T:
"""Run an Awaitable in a new thread. Blocks until the thread is finished.
@nikg4 (Collaborator) commented Sep 30, 2024


Is it "in a new thread", or "in a new Python process", given that multiprocessing is used?
Looks like it's actually the former (a thread pool): https://docs.python.org/3/library/multiprocessing.html#module-multiprocessing.dummy
That page has this recommendation: "Users should generally prefer to use concurrent.futures.ThreadPoolExecutor, which has a simpler interface that was designed around threads from the start, and which returns concurrent.futures.Future instances that are compatible with many other libraries, including asyncio."

Would it make sense to switch to ThreadPoolExecutor here?

@taenin (Collaborator, Author)


Sure, I don't see a problem with using ThreadPoolExecutor here. Updated.
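The ThreadPoolExecutor variant might look roughly like this (a sketch of the suggested change, not necessarily the exact code merged in this PR):

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor
from typing import Awaitable, TypeVar

T = TypeVar("T")


def safe_asyncio_run(main: Awaitable[T]) -> T:
    """Run an Awaitable in a worker thread. Blocks until it finishes."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        # submit() returns a concurrent.futures.Future; result() blocks
        # until completion and re-raises any exception from the worker.
        return pool.submit(asyncio.run, main).result()


async def greet(name: str) -> str:
    return f"hello, {name}"


print(safe_asyncio_run(greet("world")))  # hello, world
```

One reason the Python docs recommend this over multiprocessing.dummy is that `Future.result()` gives exception propagation and result plumbing for free, instead of requiring hand-rolled bookkeeping.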

@taenin (Collaborator, Author)


To address your original question: my intent here is to run the event loop in a separate thread.

@taenin (Collaborator, Author) commented Oct 1, 2024

See a plethora of examples on GitHub with similar approaches: https://github.com/search?q=asyncio.set_event_loop+thread&type=code

This is typically done for code that is executing in an environment with an asyncio loop already running (Jupyter Notebook, IPython, Discord bots).

@taenin merged commit f3a0793 into main on Oct 1, 2024
1 check passed
@taenin deleted the taenin/write branch on October 1, 2024 at 23:29
4 participants