-
Couldn't load subscription status.
- Fork 3.9k
ARROW-11790: [Rust][DataFusion] Change builder signatures to take impl Interator<Item=Expr> rather than &[Expr] #9703
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
rust/benchmarks/src/bin/tpch.rs
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a pretty good example -- the answer_schema was created and immediately dropped after calling df.select(). Now the exprs don't need to be copied
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I initially found the impl IntoIterator<Item .... syntax hard to grok, I figured I would add an example to LogicalPlanBuilder showing that it lets you use vec![] and the trait business can be somewhat ignored
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it was even worse here -- two copies of each input expr were made -- one copy to pass to validate_unique_names and once to put on the LogicalPlanAggregate.
Admittedly, removing the second copy doesn't require changing the signature of this function
Yeah, it's a pain but Related: is there a reason that |
@jorgecarleitao might know the answer to that as he introduced the |
1ac88f2 to
200fa6f
Compare
|
@alamb , I do not know the reason. I did not use it on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks a lot, @alamb .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be even sweater once IntoIter is supported for arrays: [col()] vs vec![col()], see rust-lang/rust#65819
I can explain :-) My motivation was extensibility. I wanted users to be able to use the DataFusion DataFrame for local (in-process) execution but then be able to run distributed by having a mechanism to tell the context which DataFrame impl to use (configuration change rather than code change). I have a much better idea now though of the requirements so plan on writing up a design once Ballista is in this repo. |
|
And just to be clear, I am fine with removing the trait. |
|
I am going to rebase this PR and merge it once CI passes |
Thanks @andygrove -- I have no particular need to remove the |
|
I wrote up some very brief notes in https://issues.apache.org/jira/browse/ARROW-12064 |
|
Got. Then fwiw, I think that we can build that with this trait, we just need a bit more work / investigation: I am not 100% sure, but I think that the path for libraries that want to expose a different
For the case of Ballista, I think we need the opposite: Ballista has an (This is a hand-waving idea) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very nice refactor
…l Interator<Item=Expr> rather than &[Expr]
200fa6f to
e5a0b52
Compare
NOTE:
Since is a fairly major backwards incompatible change (many callsites need to be updated, though mostly mechanically); I gathered some feedback on this approach in #9692 and this is the PR I propose for merge.
I'll leave this open for several days and also send a note to the mailing lists for additional comment
It is part of my overall plan to make the DataFusion optimizer more idiomatic and do much less copying ARROW-11689
Rationale:
All callsites currently need an owned
Vec(or equivalent) so they can pass in&[Expr]and then Datafusion copies all theExprs. Many times the originalVec<Expr>is discarded immediately after use (I'll point out where this happens in a few places below). Thus I it would better (more idiomatic and often less copy/faster) to take something that could produce an iterator over ExprChanges
Dataframeso it takesVec<Expr>rather than&[Expr]LogicalPlanBuilderso it takesimpl Iterator<Item=Expr>rather than&[Expr]I couldn't figure out how to allow the
DataframeAPI (which is a Trait) to take animpl Iterator<Item=Expr>