-
Notifications
You must be signed in to change notification settings - Fork 990
fix: allow AsyncFileWriter
to be not Send
#7613
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
fix: allow AsyncFileWriter
to be not Send
#7613
Conversation
Let me know how y'all feel about this. I think ideally actually we would get rid of trait AsyncFileReader: AsyncRead + AsyncSeek {
fn poll_metadata(
self: Pin<&mut Self>,
cx: &mut Context<'_>,
options: Option<&'a ArrowReaderOptions>
) -> Poll<Result<Arc<ParquetMetaData>>> {
// default implementation
}
} and then implement the appropriate futures. That way we don't need to worry about the specifics of Send and things |
Hi, does |
yes! same with |
The motivation behind this trait is to enable zero-copy data writing. Almost all Arrow users generate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As written this will make the functions within the writer !Send
which is not something we can really do. One option might be to switch the trait to using GATs instead of boxed futures, as this will allow the machinery to work at least in theory, although in practice the compiler can get confused...
Another option might be to add a regular associated type, and the implementer can choose to use BoxFuture or LocalBoxFuture
b636a82
to
e306c7c
Compare
Ooh gotcha, makes a lot of sense |
I've gone with an associated type because GATs do not allow for dyn |
Yeah, the lifetimes kind of turn into a bit of a mess... I'd need to sit down and check but I'm fairly confident constraining the lifetime on the trait will cause very hard to debug issues down the line. Ultimately the AsyncArrowWriter isn't really doing all that much, its basically just a shim around the sync writer that checks its buffer after every batch and flushes it if it has gotten large. I wonder if the simplest option is just for you to do this yourself... |
Yeah, I agree. It's a little ugly. I think alternatively we could implement the trait with a poll function and implement a future on top of it. It's harder to implement but its easiest for the compiler to pick up. I'll implement it and you can tell me what you think |
I'm going to assume this probably won't be a desirable solution for you @tustvold, but I'll link it for completeness main...jpopesculian:arrow-rs:fix-async-arrow-writer-no-send-2 |
Maybe the solution is to put a feature flag on it @tustvold? Something like |
497d3ca
to
c1ea3ff
Compare
I updated with a feature flag. I think its the cleanest solution (: |
b3b49cc
to
fcc91db
Compare
Any suggestions on this one @tustvold ? |
fcc91db
to
bbf2a8f
Compare
I believe we will support a similar usecase for the reader here:
Basically we will provide a parquet decoder that is not tied to the particular IO I think you can do something similar already for writing using https://docs.rs/parquet/latest/parquet/arrow/arrow_writer/struct.ArrowColumnWriter.html thought that may require buffering in a writer / using channels Maybe we should look into making the split between IO and CPU clearer in the writer too |
Which issue does this PR close?
Closes #7612.
What changes are included in this PR?
Changes
BoxFuture
toLocalBoxFuture
. I tried a few implementations like usingasync
in the trait or using associated types for the futures, but I find this one to be the easiest to manage, although it unfortunately does not allow for the future to be send if the user so desires. I think generally, using poll and context would make this more flexible, but I didn't want to overcomplicate this PR as generally I think people implement this trait just for use withAsyncArrowWriter
Are there any user-facing changes?
This is a breaking change as it changes the trait implementation and would be fixed wherever a user implements that trait. It's an easy fix however because
LocalBoxFuture
is always compatible toBoxFuture
so the user would just have to callboxed_local
instead ofboxed
.