-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Encapsulate PhysicalSortRequrement and add more doc comments
#5863
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| options, | ||
| } | ||
| } else { | ||
| PhysicalSortExpr { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved into PhysicalSortExpr::into_sort_expr() where I think it is easier to find and understand (turns out converting back / forth from PhysicalSortExpr and PhysicalSortRequirement is common
|
|
||
| fn required_input_ordering(&self) -> Vec<Option<Vec<PhysicalSortRequirement>>> { | ||
| vec![Some(make_sort_requirements_from_exprs(&self.expr))] | ||
| vec![Some(PhysicalSortRequirement::exact_from_exprs(&self.expr))] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am imagining that when a user upgrades DataFusion they will and see that the type signature changed from PhysicalSortExpr to PhysicalSortRequirement. Then when they look at PhysicalSortRequirement they can find a function to convert the existing PhysicalSortExpr to PhysicalSortRequirement -- PhysicalSortRequirement::exact_from_exprs
make_sort_requirements_from_exprs was both not documented as well as lost in the list of functions
|
This LGTM. However, let's discuss timing. We have a type alias PR that is waiting for #5661. IMO we should do this after the alias PR as de-conflicting that may be much harder than de-conflicting this. |
I agree -- this one is a nice to have and has no particular urgency. Lets get your PRs merged first |
9f111b5 to
0602ae4
Compare
0602ae4 to
fc77409
Compare
|
The alias PR is #5867. I think apart from a simple name change of |
ozankabak
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few minor comments but LGTM in general
| /// Option to specify how the given column should be sorted. | ||
| /// If unspecified, there is no constraint on sort options. | ||
| pub options: Option<SortOptions>, | ||
| /// If unspecified, there are constraints on sort options. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment seems incorrect, did you mean to start with "If specified, ..."?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch. I will fix it (restore to the original)
Update: fixed in a5cc4c2
| /// Creates a new `exact` requirement, which must match the | ||
| /// required options and expression. See | ||
| /// [`PhysicalSortRequirement`] for examples. | ||
| pub fn new_exact(expr: Arc<dyn PhysicalExpr>, options: SortOptions) -> Self { | ||
| Self { | ||
| expr, | ||
| options: Some(options), | ||
| } | ||
| } | ||
|
|
||
| /// Creates a new `expr_only` requirement, which must match the | ||
| /// required expression. See [`PhysicalSortRequirement`] for | ||
| /// examples. | ||
| pub fn new_expr_only(expr: Arc<dyn PhysicalExpr>) -> Self { | ||
| Self { | ||
| expr, | ||
| options: None, | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could have a single new and the caller can pass the second argument as None if they don't want to constrain sort options. I get the sense that you have two news because you think it is more readable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes - that is precisely the reason.
For example, I thought it was clearer that the goal was to add a relaxed sort requirement in datafusion/core/src/physical_plan/windows/mod.rs with:
for partition_by in partition_by_exprs {
sort_reqs.push(PhysicalSortRequirement::new_expr_only(partition_by.clone()))
}Rather than
for partition_by in partition_by_exprs {
sort_reqs.push(PhysicalSortRequirement {
expr: partition_by.clone(),
options: None,
});
}Of course they both do the same thing eventually and I agree it is a preference thing about what makes for more readable code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The alternative I had in mind was
for partition_by in partition_by_exprs {
sort_reqs.push(PhysicalSortRequirement::new(partition_by.clone(), None));
// sort_reqs.push(PhysicalSortRequirement::new(partition_by.clone(), Some(sort_options)));
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can make that change if you prefer it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed in 8a073c4
| pub fn into_sort_expr(self) -> PhysicalSortExpr { | ||
| let Self { expr, options } = self; | ||
|
|
||
| let options = options.unwrap_or(SortOptions { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest unwrap_or_else to explicitly avoid constructing a potentially unused object. The compiler will likely take care of this, but I find making intent visible in code structure is good practice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed -- I agree with you. I originally had unwrap_or_else for precisely the reasons you mentioned, but clippy told me to use unwrap_or instead 😭
error: unnecessary closure used to substitute value for `Option::None`
--> datafusion/physical-expr/src/sort_expr.rs:185:23
|
185 | let options = options.unwrap_or_else(|| SortOptions {
| _______________________^
186 | | descending: false,
187 | | nulls_first: false,
188 | | });
| |__________^
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#unnecessary_lazy_evaluations
= note: `-D clippy::unnecessary-lazy-evaluations` implied by `-D warnings`
help: use `unwrap_or(..)` instead
|
185 ~ let options = options.unwrap_or(SortOptions {
186 + descending: false,
187 + nulls_first: false,
188 ~ });
|
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the review @ozankabak
Not sure if either @mustafasrepo or @mingmwang are interested in this PR either, given they have been working in this area
| pub fn to_sort_exprs( | ||
| requirements: impl IntoIterator<Item = PhysicalSortRequirement>, | ||
| ) -> Vec<PhysicalSortExpr> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can use
pub fn to_sort_exprs<'a>(
requirements: impl IntoIterator<Item = &'a PhysicalSortRequirement>,
) -> Vec<PhysicalSortExpr> as signature for to_sort_exprs as in from_sort_exprs. This would help remove a couple clones in the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My rationale for taking PhysicalSortRequirement rather than &PhysicalSortRequirement was actually to avoid a clone in the cases where the caller already had an owned PhysicalSortRequirement. Basically the clone() is moved from inside this function to the callsite
Specifically in datafusion/core/src/physical_optimizer/sort_enforcement.rs
// Make sure we preserve the ordering requirements:
update_child_to_remove_unnecessary_sort(child, sort_onwards, &plan)?;
let sort_expr =
PhysicalSortRequirement::to_sort_exprs(required_ordering);
add_sort_above(child, sort_expr)?;
if is_sort(child) {This signature can avoid a clone.
If the signature looks like
pub fn to_sort_exprs<'a>(
requirements: impl IntoIterator<Item = &'a PhysicalSortRequirement>,
) -> Vec<PhysicalSortExpr> I think it would require always clone()ing the expr and options internally to create an owned PhysicalSortExpr in the output
Of course, we are talking about cloning an Arc and some SortOptions which presumably isn't all that expensive , so maybe it doesn't matter 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a feeling there may be a way to do this with a Borrow trick to implement a single function that can deal with both owned and referenced arguments without duplicating code 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alamb, it seems like using
pub fn to_sort_exprs<T: Borrow<PhysicalSortRequirement>>(
requirements: impl IntoIterator<Item = T>,
) -> Vec<PhysicalSortExpr> {
requirements
.into_iter()
.map(|e| PhysicalSortExpr::from(e.borrow()))
.collect()
}
pub fn from_sort_exprs<T: Borrow<PhysicalSortExpr>>(
ordering: impl IntoIterator<Item = T>,
) -> Vec<PhysicalSortRequirement> {
ordering
.into_iter()
.map(|e| PhysicalSortRequirement::from(e.borrow()))
.collect()
}lets us get rid of a bunch of clones and results in simpler call site code. AFAIK, this is a zero-cost mechanism. If you concur, we can use this pattern.
I created #11 on your repo so you can easily check it out and incorporate (assuming I'm not overlooking something and this is indeed zero cost obviously).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess my point was that I think this call:
PhysicalSortRequirement::from(e.borrow()Invokes this From impl:
impl From<&PhysicalSortRequirement> for PhysicalSortExpr {
fn from(value: &PhysicalSortRequirement) -> Self {
value.clone().into_sort_expr()
}
}
Which has a clone of the value
I have removed the From<&PhysicalSortRequirement> traits in 12cecdb because it is obscuring the locations of the cloning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. I thought you wanted to make to_sort_exprs "smart" in the sense that it can take work with an owned value or a reference, using the owned or reference From impl depending on the argument -- making a clone only when necessary.
Anyway I got curious and tried to verify whether the borrow mechanism was zero-cost for this purpose; it turns out it is not. It always calls the reference path. I also found how to do this -- here is an example for future reference.
For posterity, had we wanted to go this route, the smart to_sort_exprs would be:
pub fn to_sort_exprs<T>(
requirements: impl IntoIterator<Item = T>,
) -> Vec<PhysicalSortExpr>
where
PhysicalSortExpr: From<T>,
{
requirements
.into_iter()
.map(|e| PhysicalSortExpr::from(e))
.collect()
}which would route to the right PhysicalSortExpr::from depending on the argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very cool -- I did not know that. If you think that is a worthwhile change, I would be happy to make a follow on PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it'd be a good change, I can already see some use cases for it in some upcoming code (e.g. streaming group bys).
I updated the mini-PR I created for experimentation so you can get the changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| /// If the requirement is *exact* | ||
| /// ([`PhysicalSortRequirement::new_exact`]), then the sort | ||
| /// requirement will only be satisfied if it matches both the | ||
| /// expression *and* the sort options. | ||
| /// | ||
| /// If the requirement is *`expr_only` | ||
| /// ([`PhysicalSortRequirement::new_expr_only`]) then only the expr | ||
| /// must match, to satisfy the requirement. | ||
| /// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that we now use PhysicalSortRequirement::new for initialization for both cases. These comments seem remainder for old API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 that is a good spot -- fixed in 1db636a
|
I have made some suggestions. Besides these suggestions, LGTM!. Thanks @alamb. |
PhysicalSortRequrement and add more doc commentPhysicalSortRequrement and add more doc comments
|
Thank you for the reviews and comments @mustafasrepo and @ozankabak |
Which issue does this PR close?
re #5772
Rationale for this change
As I mentioned in #5772 (review) , I found the cognitive load of dealing with
Vec<Option<Vec<PhysicalSortRequirements>>>>challenging and the fact thatPhysicalSortRequirementshad anotherOptioninside it was also challenging when reading the code .Also, it appears that the
make_sort_requirements_from_exprsfunction was not public outside of thedatafusion-physical-exprcrate (so I could not used it while updating IOx -- see https://github.com/influxdata/influxdb_iox/pull/7440). It isn't hard to rewrite but I wanted to help others potentially upgrading by adding more comments and hints of how to create this type of structure.I also think having additional documentation (and method names that are self documenting) adding methods makes the intent clearer when reading.
What changes are included in this PR?
PhysicalSortRequirementprivateFromimplsAre these changes tested?
Covered by existing tests
Are there any user-facing changes?