
Inference Engine async writes #574


Merged
merged 5 commits into from
Oct 3, 2024

Conversation

taenin
Collaborator

@taenin taenin commented Oct 2, 2024

This PR is based on #558

This PR updates how writes are done for inference.

  • If all requests are successful, the final written file will contain all responses in the same order as the provided input, for all engines.
  • During inference, all requests are written to a scratch directory, to a file with the same name as the output file. The order of values written to this file is not guaranteed (this only matters for the InferenceEngines that leverage parallelism).

Non-parallel engines write to disk in-line (blocking). I've benchmarked this for medium-sized files (hundreds of MB): appending a line of text to these files takes on average 1.788e-05 seconds.
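A minimal sketch of that two-phase flow, with hypothetical names (append_result and finalize are not from the PR, and responses are assumed to be single-line):

```python
from pathlib import Path

def append_result(scratch_path: Path, request_id: int, response: str) -> None:
    # During inference, results are appended in completion order,
    # which may differ from input order for parallel engines.
    with open(scratch_path, "a") as f:
        f.write(f"{request_id}\t{response}\n")

def finalize(scratch_path: Path, output_path: Path, num_requests: int) -> None:
    # Once every request has succeeded, rewrite the final file with
    # responses in the same order as the provided input.
    by_id = {}
    with open(scratch_path) as f:
        for line in f:
            request_id, response = line.rstrip("\n").split("\t", 1)
            by_id[int(request_id)] = response
    with open(output_path, "w") as f:
        for i in range(num_requests):
            f.write(by_id[i] + "\n")
```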

@taenin taenin marked this pull request as ready for review October 2, 2024 22:50
original_filepath = Path(output_filepath)
return str(original_filepath.parent / "scratch" / original_filepath.name)

async def _save_conversation_async(
Contributor

Is there a lock on the file writer? If so, at what level is it defined?

Collaborator Author

Since this comment I've removed the _save_conversation_async method, as I found it's actually slower for writes (async file I/O mostly benefits reads).

That said, with my new code we get strong guarantees about no concurrent access from asyncio itself. In asyncio code, a context switch can only happen when you hit an await expression. Since I'm now performing the write synchronously, with no await in between, only one write can ever be in progress at a time.

This is distinct from threads, where context switches are far less predictable.
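A standalone illustration of the asyncio guarantee described above (not code from the PR): two tasks interleave only at await points, so the paired appends below are never split up mid-pair.

```python
import asyncio

results = []

async def worker(name: str) -> None:
    for i in range(3):
        # No await between these two appends, so asyncio cannot
        # switch tasks in the middle: the pair always lands together.
        results.append(f"{name}-start-{i}")
        results.append(f"{name}-end-{i}")
        # A context switch can only happen here, at the await point.
        await asyncio.sleep(0)

async def main() -> None:
    await asyncio.gather(worker("a"), worker("b"))

asyncio.run(main())

# Every "start" entry is immediately followed by its matching "end",
# even though tasks a and b interleave between await points.
assert all(
    results[i + 1] == results[i].replace("start", "end")
    for i in range(0, len(results), 2)
)
```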

@taenin taenin changed the title Taenin/async writes Inference Engine async writes Oct 2, 2024
# Write what we have so far to our scratch directory.
self._save_conversation(
    new_conversation,
    self._get_scratch_filepath(generation_config.output_filepath),
Collaborator

@nikg4 nikg4 Oct 2, 2024

This is saving conversations into the scratch directory, not the originally configured one. Wouldn't that be a problem?

Collaborator Author
This is a fair point. I've updated my approach to save to a scratch directory only for async code; sync paths simply append to the target file. Updated!
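A hedged sketch of that split (write_conversation and is_async_engine are hypothetical names, not from the PR; the scratch-path helper mirrors the one shown earlier in the diff):

```python
from pathlib import Path

def _get_scratch_filepath(output_filepath: str) -> str:
    # Mirror the target filename under a sibling "scratch" directory.
    original = Path(output_filepath)
    return str(original.parent / "scratch" / original.name)

def write_conversation(serialized: str, output_filepath: str, is_async_engine: bool) -> None:
    # Async engines write partial results under scratch/; sync engines
    # append directly to the configured target file.
    target = Path(
        _get_scratch_filepath(output_filepath) if is_async_engine else output_filepath
    )
    target.parent.mkdir(parents=True, exist_ok=True)
    with open(target, "a") as f:
        f.write(serialized + "\n")
```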

@taenin taenin merged commit 1524cfd into main Oct 3, 2024
1 check passed
@taenin taenin deleted the taenin/async_writes branch October 3, 2024 16:14
@taenin taenin mentioned this pull request Oct 3, 2024
3 participants