-
Notifications
You must be signed in to change notification settings - Fork 1.4k
[Substrait to Velox] Support selecting a subfield from struct #2451
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
✅ Deploy Preview for meta-velox canceled.
|
25e1651 to
3ab653c
Compare
3ab653c to
0931c31
Compare
|
Hi, @mbasmanova Could you please take a look at this PR? |
0931c31 to
7db0780
Compare
mbasmanova
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rui-mo @zhejiangxiaomai Some questions.
velox/substrait/tests/VeloxSubstraitRoundTripPlanConverterTest.cpp
Outdated
Show resolved
Hide resolved
velox/substrait/tests/VeloxSubstraitRoundTripPlanConverterTest.cpp
Outdated
Show resolved
Hide resolved
7db0780 to
1bc8458
Compare
|
Hi, @mbasmanova sorry for late reply. Could you please review again when you are free? |
1bc8458 to
b0de7e3
Compare
|
@zhejiangxiaomai It has been a long time. CircleCI is failing. Would you take a look? |
Yes, I am fixing it. The reason why we didn't reply was that we had a seven day holiday called National Day. |
b0de7e3 to
1e10e31
Compare
mbasmanova
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zhejiangxiaomai Some comments.
velox/substrait/tests/VeloxSubstraitRoundTripPlanConverterTest.cpp
Outdated
Show resolved
Hide resolved
velox/substrait/tests/VeloxSubstraitRoundTripPlanConverterTest.cpp
Outdated
Show resolved
Hide resolved
velox/substrait/tests/VeloxSubstraitRoundTripPlanConverterTest.cpp
Outdated
Show resolved
Hide resolved
mbasmanova
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are open comments.
79d5418 to
155b45a
Compare
|
Hi @zhejiangxiaomai , Feel free to pull from latest and address open comments if still interested . Thanks ! |
Thank you for reminding me. I am fixing some comments in this PR. I will pull latest commit for this PR. |
363b974 to
e2e5481
Compare
|
Hi @mbasmanova, I passed the UT for nested struct. Would you review this PR again? It's been a long time. |
cb2ba77 to
1ee7455
Compare
|
@zhejiangxiaomai It has been a very long time indeed. I have things I need to work on today and tomorrow. I'll try to get to this PR sometime later this week. |
Just warm reminder. @mbasmanova |
mbasmanova
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some questions.
| } | ||
|
|
||
| inputColumnType = asRowType(inputColumnType->childAt(idx)); | ||
| tmp = &tmp->child().struct_field(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is tmp->child() guaranteed to be a direct reference? Where can I read about direct_reference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- struct_field() will return class
StructField. StructField link - StructField child() will return
ReferenceSegmentstruct_field.child() link - ReferenceSegment will represent one type of
MapKeyStructFieldListElement. ReferenceSegment define - direct_reference is also ReferenceSegment. direct_reference link
| // for a leaf in an expression, find idx from child by exprName. | ||
| // for a dereference expression, find idx from every child by exprName. | ||
| if (fieldExpr->isInputColumn() || | ||
| std::dynamic_pointer_cast<const core::FieldAccessTypedExpr>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need special processing for the case when input of FieldAccessTypedExpr is not FieldAccessTypedExpr? I expect we should process all cases where input is not null the same:
1- Convert input to substrait expression.
2- Wrap the converted expression in substrait dereference expression.
Do we have tests for both code paths ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
---- Do we have tests for both code paths ?
Yes, we have tests for both code paths.
- case
(cast(row_constructor(a, b) as row(a bigint, b bigint))).ais the case when input of FieldAccessTypedExpr is not FieldAccessTypedExpr. - case
"(abc).ab.a"is the normal case.
Here is some context why I add std::dynamic_pointer_cast<const core::FieldAccessTypedExpr>(fieldExpr->inputs()[0]) == nullptr. I also think it is not normative.
There are two very similar cases.
First one is use row_constructor to create a row before ab.a. Second one is put cast and fieldAcess together.
## First case.
auto plan =
PlanBuilder()
.values({data})
.project(
{"cast(row_constructor(a, b) as row(a bigint, b bigint)) as ab"})
.project(
{"ab.a"})
.planNode();
## Second case.
auto plan =
PlanBuilder()
.values({data})
.project(
{"(cast(row_constructor(a, b) as row(a bigint, b bigint))).a"})
.planNode();
And their substrait plan are the same.
"direct_reference": {
"struct_field": {
"field": 0,
"child": {
"struct_field": {
"field": 0
}
}
}
But there is a difference for two cases that is the inputType for function getFieldIdForIntermediateNode.
First case inputType->toString() is ROW<ab:ROW<a:BIGINT,b:BIGINT>>.
But for the second case, I inputType->toString() is ROW<a:BIGINT,b:INTEGER,c:DOUBLE>. So I will got a null point core in getFieldIdForIntermediateNode function.
So, I think maybe the substrait plan of second case should like this.
"direct_reference": {
"struct_field": {
"field": 0,
}
}
which means we can ignore "cast" here. because we just want to find a from ROW<a:BIGINT,b:INTEGER,c:DOUBLE>. Do you think this is okay? if Ok, we should keep std::dynamic_pointer_cast<const core::FieldAccessTypedExpr>(fieldExpr->inputs()[0]) == nullptr logic. @mbasmanova
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now I found a better way to fix this condition. here
Because only input of FieldAccessTypedExpr is not FieldAccessTypedExpr will return struct_field. other condition will not. So it can be consider as find exprName from input directly. Do you think this is okay?@mbasmanova
|
-- Some questions. |
1b2aeac to
e19fc41
Compare
|
@mbasmanova Just fixed all comments. |
e19fc41 to
8b8b886
Compare
8b8b886 to
18fae9d
Compare
|
Just warm reminder. @mbasmanova |
|
Hi, @mbasmanova do you have any suggestion? feel free to leave comments. 👍 |
562a8d8 to
c2d372a
Compare
|
I'm short on bandwidth and confused by the changes in the PR. @vibhatha Would you help review? |
|
Hi, @vibhatha do you have any suggestion? |
@mbasmanova I will definitely take a look. |
I will keep you posted. @zhejiangxiaomai |
Hi, @vibhatha please ping me. If you have any question. I will very happy to fix. |
Still looking into the code. Keep you posted as soon as possible. |
|
Hello,@vibhatha it's been a long time and I was wondering if you have any other suggestions? we already use this pr in our own repo. |
2. support nested struct. like(abc.ab.a) The idea is use nested struct field to represet FieldAccessTypedExpr.
6decec8 to
0954625
Compare
|
@mbasmanova I'm very sorry to bother you, do you have any other suggestions? It seems like @vibhatha is too busy to review this PR. |
|
@zhejiangxiaomai Substrait PRs are very confusing to me. My understanding is that Substrait is designed to represent a logical plan (an alternative to SQL) and is not designed to represent optimized physical plan. It feels a bit hacky to use Substrait for representing optimized physical plan. Why not generate Velox plan directly instead? Velox plans now support serialization. |
Thanks, I will discuss with rui-mo. |
cc:@FelixYBW |
|
@rui-mo Let's close this one since we moved this part to Gluten. |
This PR supports selecting a subfield from struct based on
FieldAccessTypedExpr. Added a round-trip test.