Skip to content

Conversation

@izveigor
Copy link
Contributor

@izveigor izveigor commented Jul 7, 2023

Which issue does this PR close?

Closes #6884

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@izveigor izveigor marked this pull request as draft July 7, 2023 16:09
@github-actions github-actions bot added sql SQL Planner logical-expr Logical plan and expressions physical-expr Changes to the physical-expr crates core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) labels Jul 7, 2023
@github-actions github-actions bot added the substrait Changes to the substrait crate label Jul 27, 2023

## array containment operator

# array containment operator with scalars #1 (at arrow)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The feature supports now only scalars as input, but in the production users prefer to work with columns.
I created the ticket dedicated to that issue: #7119.

@izveigor
Copy link
Contributor Author

@alamb @jayzhan211 Can you review this little feature if you have time.

right,
} => {
// Note the order that we push the entries to the stack
// is important. We want to visit the left node first.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why should we visit the left node first?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a copy/paste comment from above.

I believe the rationale is that this code is using an explicit stack (rather than local variables on the call stack) to avoid stack overflows with deeply nested queries.

To be consistent the planner did a depth first search (always following the first / left child first) and thus to replicate that behavior the left child must be put on the stack

Copy link
Contributor

@jayzhan211 jayzhan211 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@izveigor
Copy link
Contributor Author

@alamb can you review that change if you have time?

@izveigor izveigor marked this pull request as ready for review July 30, 2023 15:57
@alamb
Copy link
Contributor

alamb commented Jul 31, 2023

On my list

@alamb alamb changed the title feat: array containment operator feat: array containment operator @> and <@ Jul 31, 2023
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me -- thanks @izveigor - I am happy to merge this as is of can wait for you to address any comments you would like

It would also be great to add these new operators to the SQL documentation, but given a quick look at the user guide suggests we don't have any docs for supported operators yet. https://arrow.apache.org/datafusion/user-guide/sql/index.html

I will file a follow on ticket to add some

}
Operator::AtArrow
| Operator::ArrowAt => {
arrow_coercion(lhs, rhs).map(Signature::uniform).ok_or_else(|| {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder why the name arrow_ (and not, array_) here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, with array_ it would be more consistent.

right,
} => {
// Note the order that we push the entries to the stack
// is important. We want to visit the left node first.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a copy/paste comment from above.

I believe the rationale is that this code is using an explicit stack (rather than local variables on the call stack) to avoid stack overflows with deeply nested queries.

To be consistent the planner did a depth first search (always following the first / left child first) and thus to replicate that behavior the left child must be put on the stack

@alamb
Copy link
Contributor

alamb commented Jul 31, 2023

Filed #7153 for documentation

@izveigor
Copy link
Contributor Author

izveigor commented Jul 31, 2023

@alamb I fixed all nitpicks which you mentioned in your comments.

Thank you for review, @alamb and @jayzhan211!

@alamb alamb merged commit 371ef46 into apache:main Jul 31, 2023
@alamb
Copy link
Contributor

alamb commented Jul 31, 2023

Thanks again @izveigor

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate logical-expr Logical plan and expressions physical-expr Changes to the physical-expr crates sql SQL Planner sqllogictest SQL Logic Tests (.slt) substrait Changes to the substrait crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Array containment operator

3 participants