Realtime Inference Writes #558

Closed · wants to merge 7 commits
Conversation

@jgreer013 (Contributor) commented Sep 26, 2024

Updated the inference engines to write each conversation to file as soon as it is available. Writing occurs in a separate thread so inference does not stall.

Towards OPE-320
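The general idea, sketched below. This is illustrative only: the helper names and the queue-plus-writer-thread design are assumptions, not necessarily the PR's exact implementation.

```python
import json
import queue
import threading

# Illustrative sketch: a single background writer thread drains a queue of
# finished conversations and appends them to a JSONL file, so the inference
# loop never blocks on disk I/O. All names here are hypothetical.
_write_queue: queue.Queue = queue.Queue()
_STOP = object()  # sentinel used to shut the writer down

def _writer_loop(output_filepath: str) -> None:
    while True:
        record = _write_queue.get()
        if record is _STOP:
            break
        with open(output_filepath, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")

def start_writer(output_filepath: str) -> threading.Thread:
    thread = threading.Thread(target=_writer_loop, args=(output_filepath,), daemon=True)
    thread.start()
    return thread

def save_conversation_nonblocking(conversation_dict: dict) -> None:
    # Called from the inference loop; enqueues and returns immediately.
    _write_queue.put(conversation_dict)
```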


linear bot commented Sep 26, 2024

OPE-320 FR: Update `lema.infer` to support batch inference

  1. Read prompts from files
  2. Write responses to files

@jgreer013 jgreer013 changed the title Updated inference engines to write conversations to file once they have them. Writing occurs in a separate thread so inference does not stall. Realtime Async Inference Writes Sep 26, 2024
@jgreer013 jgreer013 requested a review from a team September 26, 2024 18:05
@@ -184,6 +185,11 @@ async def _query_api(
response_json, conversation
)
await asyncio.sleep(remote_params.politeness_policy)
if generation_config.output_filepath:
Collaborator:
You can write to the file before sleeping here.

One additional note: this approach will not preserve ordering as each request may complete at different times. Likely OK for now since we're writing the entire conversation (including the input), but something we should keep in mind as a potential fix for later.

Contributor Author:

Done.

Contributor:
@jgreer013 it would be good to open an issue and log a comment with the OPE number. I think saving the results out of order will cause a lot of confusion, so it would be good to fix it.

@taenin even if we log the conversation, it's not trivial to build a unique id from the conversation content and join it with any metadata the user has. It's a decently big burden on the user.

@taenin taenin requested a review from oelachqar September 26, 2024 18:21
@jgreer013 jgreer013 requested a review from taenin September 26, 2024 21:24
@taenin taenin requested a review from nikg4 September 26, 2024 22:38
@jgreer013 jgreer013 changed the title Realtime Async Inference Writes Realtime Inference Writes Sep 27, 2024
conversation: A single conversation to save.
output_filepath: The filepath to where the conversation should be saved.
"""
return self._save_conversation(conversation, output_filepath)
Collaborator:
(Not an expert on this.) Found asyncio.to_thread() (https://docs.python.org/3/library/asyncio-task.html#running-in-threads), which can be used to make IO-bound functions non-blocking.

It's only available in Python 3.9+ though, while we also support 3.8. There exist workarounds like this: https://github.com/playht/pyht/blob/c8cc319d9d6df818154f1337fa64e0c8385aea9f/pyht/async_client.py#L33

Also, I don't have context on the importance of supporting v3.8 in oumi and whether it can be dropped.

Contributor Author:
I actually tried using asyncio.create_task, which is hypothetically what we'd want, but I kept running into errors when calling it from non-async functions (complaining that no event loop was running).

Then when I tried creating my own event loop, I ran into other issues related to the parent caller being on a different loop from the child task (even when I explicitly told it to use the same one).

While I believe there's a way to do it, I suspect it'll wind up violating best practices / being an anti-pattern compared to simply making all the engines async.
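For illustration, a rough sketch of the failure mode described above (the coroutine and function names here are hypothetical):

```python
import asyncio

async def _save_conversation_async(conversation: dict, output_filepath: str) -> None:
    ...  # write the conversation to disk

def save_from_sync_code(conversation: dict, output_filepath: str) -> None:
    # asyncio.create_task only works inside a running event loop; calling it
    # from plain synchronous code raises "RuntimeError: no running event loop".
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running here, so create_task would fail. Starting a
        # fresh loop with asyncio.run works in isolation, but nesting it
        # under a caller that already owns a loop raises errors too.
        asyncio.run(_save_conversation_async(conversation, output_filepath))
        return
    asyncio.create_task(_save_conversation_async(conversation, output_filepath))
```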

Collaborator:
Any luck with wrapping _save_conversation() in asyncio.to_thread(...) from the _async method? E.g., asyncio.to_thread(_save_conversation, conversation, output_filepath).
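A minimal sketch of that suggestion (class and parameter names are assumptions; asyncio.to_thread requires Python 3.9+, with loop.run_in_executor as the 3.8-compatible fallback):

```python
import asyncio

class EngineSketch:
    def _save_conversation(self, conversation: dict, output_filepath: str) -> None:
        ...  # blocking jsonlines append happens here

    async def _query_api(self, conversation: dict, output_filepath: str) -> None:
        # Offload the blocking file write to a worker thread so the event
        # loop keeps servicing other in-flight requests (Python 3.9+).
        await asyncio.to_thread(self._save_conversation, conversation, output_filepath)
        # Python 3.8 fallback:
        # await asyncio.get_running_loop().run_in_executor(
        #     None, self._save_conversation, conversation, output_filepath
        # )
```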

@@ -68,22 +68,30 @@ def _read_conversations(self, input_filepath: str) -> List[Conversation]:
conversations.append(conversation)
return conversations

def _save_conversations(
self, conversations: List[Conversation], output_filepath: str
def _save_conversation(
Contributor Author:
This can actually be problematic - asyncio.run should only be used at the top level, and nested calls wind up throwing errors. Embedding this into the class winds up breaking things higher up the hierarchy.

Collaborator:
We should probably wrap top-level entry points (e.g., main functions) in asyncio.run: https://docs.python.org/3/library/asyncio-task.html#coroutines

@taenin WDYT? (You're the most familiar with the inference implementation.) This is probably out of scope for this PR, but it would be good to set it up properly.
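Roughly what that could look like (sketch only; the function names are placeholders rather than oumi's actual entry points):

```python
import asyncio

async def _infer_async(config) -> None:
    ...  # drive the async inference engine end to end

def main() -> None:
    config = ...  # parse CLI args / load the inference config
    # asyncio.run owns the event loop for the whole program, so nothing
    # deeper in the call stack needs to create or nest its own loop.
    asyncio.run(_infer_async(config))

if __name__ == "__main__":
    main()
```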

Collaborator:
I need to give this some thought over the weekend

for conversation in conversations:
json_obj = conversation.model_dump()
writer.write(json_obj)
with jsonlines.open(output_filepath, mode="a") as writer:
Contributor:
Can multiple threads use this function? How do you prevent them from concurrently writing to this file?

for conversation in conversations:
json_obj = conversation.model_dump()
writer.write(json_obj)
with jsonlines.open(output_filepath, mode="a") as writer:
Contributor:
Opening the file + seek to the end for every conversation is not ideal...
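One way both concerns could be addressed, sketched with assumed names: keep a single jsonlines writer open for the lifetime of the run and serialize appends with a lock.

```python
import threading
import jsonlines

class ConversationWriter:
    """Hypothetical helper: one persistent jsonlines writer guarded by a lock."""

    def __init__(self, output_filepath: str):
        self._writer = jsonlines.open(output_filepath, mode="a")
        self._lock = threading.Lock()

    def write(self, conversation) -> None:
        # Serializing appends prevents concurrent threads from interleaving
        # records, and reusing the open writer avoids reopening the file and
        # seeking to the end for every conversation.
        with self._lock:
            self._writer.write(conversation.model_dump())

    def close(self) -> None:
        self._writer.close()
```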


taenin added a commit that referenced this pull request Oct 3, 2024
This PR is based on #558

This PR updates how writes are done for inference.

- If all requests are successful, the final written file will have all responses in the same order as the provided input for all engines.
- During inference, all requests are written to a `/scratch` directory containing a file with the same name. There is no guarantee of order on the values written to this file (this truly only matters for the InferenceEngines that leverage parallelism).

Non-parallel engines will write to disk in-line (blocking). I've benchmarked this for medium sized files (100s of MB): appending a line of text to these files takes on average 1.788e-05 seconds.
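For illustration, a hedged sketch of how the ordered final file could be produced from the unordered scratch file, assuming each scratch record is tagged with the index of its input prompt (that field is an assumption, not something stated in the PR):

```python
import jsonlines

def finalize_output(scratch_filepath: str, output_filepath: str) -> None:
    # Read the unordered scratch records, sort them by the input index they
    # were tagged with, and write the final file in input order.
    with jsonlines.open(scratch_filepath, mode="r") as reader:
        records = list(reader)
    records.sort(key=lambda record: record["input_index"])  # assumed field
    with jsonlines.open(output_filepath, mode="w") as writer:
        for record in records:
            writer.write(record)
```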

taenin commented Oct 3, 2024

Closing this as the related PR #574 has been submitted.

@taenin taenin closed this Oct 3, 2024