Skip to content

Conversation

@rui-mo
Copy link
Collaborator

@rui-mo rui-mo commented Sep 2, 2022

This PR supports selecting a subfield from struct based on FieldAccessTypedExpr. Added a round-trip test.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Sep 2, 2022
@netlify
Copy link

netlify bot commented Sep 2, 2022

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 0954625
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/64a38726154f4a0008fd9783

@rui-mo rui-mo marked this pull request as draft September 2, 2022 08:30
@rui-mo rui-mo marked this pull request as ready for review September 26, 2022 02:47
@zhejiangxiaomai
Copy link
Contributor

Hi, @mbasmanova Could you please take a look at this PR?

Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rui-mo @zhejiangxiaomai Some questions.

@zhejiangxiaomai
Copy link
Contributor

Hi, @mbasmanova sorry for late reply. Could you please review again when you are free?

@mbasmanova
Copy link
Contributor

@zhejiangxiaomai It has been a long time. CircleCI is failing. Would you take a look?

@zhejiangxiaomai
Copy link
Contributor

zhejiangxiaomai commented Oct 13, 2022

@zhejiangxiaomai It has been a long time. CircleCI is failing. Would you take a look?

Yes, I am fixing it. The reason why we didn't reply was that we had a seven day holiday called National Day.

Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhejiangxiaomai Some comments.

Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are open comments.

@zhejiangxiaomai zhejiangxiaomai force-pushed the sel_from_struct branch 2 times, most recently from 79d5418 to 155b45a Compare October 17, 2022 05:11
@kgpai
Copy link
Contributor

kgpai commented Jan 3, 2023

Hi @zhejiangxiaomai , Feel free to pull from latest and address open comments if still interested . Thanks !

@zhejiangxiaomai
Copy link
Contributor

Hi @zhejiangxiaomai , Feel free to pull from latest and address open comments if still interested . Thanks !

Thank you for reminding me. I am fixing some comments in this PR. I will pull latest commit for this PR.

@zhejiangxiaomai zhejiangxiaomai force-pushed the sel_from_struct branch 4 times, most recently from 363b974 to e2e5481 Compare January 9, 2023 09:24
@zhejiangxiaomai
Copy link
Contributor

Hi @mbasmanova, I passed the UT for nested struct. Would you review this PR again? It's been a long time.
The main idea of this PR is that use nested StructField to represent fieldExpr. StructField need record the index of each layer.
For example,
{ "field": 0, "child": { "struct_field": { "field": 0, "child": { "struct_field": { "field": 1 } } } } }
will represent ROW["abc"]["ab"]["b"].

@zhejiangxiaomai zhejiangxiaomai force-pushed the sel_from_struct branch 2 times, most recently from cb2ba77 to 1ee7455 Compare January 9, 2023 10:45
@mbasmanova
Copy link
Contributor

@zhejiangxiaomai It has been a very long time indeed. I have things I need to work on today and tomorrow. I'll try to get to this PR sometime later this week.

@zhejiangxiaomai
Copy link
Contributor

@zhejiangxiaomai Thank you. I have a few things I need to work on this week. I'll take a look early next week. Please, ping me if you don't hear from me by Tue morning.

Just warm reminder. @mbasmanova

Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some questions.

}

inputColumnType = asRowType(inputColumnType->childAt(idx));
tmp = &tmp->child().struct_field();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is tmp->child() guaranteed to be a direct reference? Where can I read about direct_reference?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. struct_field() will return class StructField. StructField link
  2. StructField child() will return ReferenceSegment struct_field.child() link
  3. ReferenceSegment will represent one type of MapKey StructField ListElement. ReferenceSegment define
  4. direct_reference is also ReferenceSegment. direct_reference link

// for a leaf in an expression, find idx from child by exprName.
// for a dereference expression, find idx from every child by exprName.
if (fieldExpr->isInputColumn() ||
std::dynamic_pointer_cast<const core::FieldAccessTypedExpr>(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need special processing for the case when input of FieldAccessTypedExpr is not FieldAccessTypedExpr? I expect we should process all cases where input is not null the same:

1- Convert input to substrait expression.
2- Wrap the converted expression in substrait dereference expression.

Do we have tests for both code paths ?

Copy link
Contributor

@zhejiangxiaomai zhejiangxiaomai Apr 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

---- Do we have tests for both code paths ?
Yes, we have tests for both code paths.

  • case (cast(row_constructor(a, b) as row(a bigint, b bigint))).a is the case when input of FieldAccessTypedExpr is not FieldAccessTypedExpr.
  • case "(abc).ab.a" is the normal case.

Here is some context why I add std::dynamic_pointer_cast<const core::FieldAccessTypedExpr>(fieldExpr->inputs()[0]) == nullptr. I also think it is not normative.

There are two very similar cases.
First one is use row_constructor to create a row before ab.a. Second one is put cast and fieldAcess together.

  ## First case.
  auto plan =
      PlanBuilder()
          .values({data})
          .project(
              {"cast(row_constructor(a, b) as row(a bigint, b bigint)) as ab"})
          .project(
              {"ab.a"})
          .planNode();

  ## Second case.
  auto plan =
      PlanBuilder()
          .values({data})
          .project(
              {"(cast(row_constructor(a, b) as row(a bigint, b bigint))).a"})
          .planNode();

And their substrait plan are the same.

 "direct_reference": {
  "struct_field": {
   "field": 0,
   "child": {
    "struct_field": {
     "field": 0
    }
   }
  }

But there is a difference for two cases that is the inputType for function getFieldIdForIntermediateNode.
First case inputType->toString() is ROW<ab:ROW<a:BIGINT,b:BIGINT>>.
But for the second case, I inputType->toString() is ROW<a:BIGINT,b:INTEGER,c:DOUBLE>. So I will got a null point core in getFieldIdForIntermediateNode function.

So, I think maybe the substrait plan of second case should like this.

 "direct_reference": {
  "struct_field": {
   "field": 0,
   }
  }

which means we can ignore "cast" here. because we just want to find a from ROW<a:BIGINT,b:INTEGER,c:DOUBLE>. Do you think this is okay? if Ok, we should keep std::dynamic_pointer_cast<const core::FieldAccessTypedExpr>(fieldExpr->inputs()[0]) == nullptr logic. @mbasmanova

Copy link
Contributor

@zhejiangxiaomai zhejiangxiaomai Apr 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I found a better way to fix this condition. here

Because only input of FieldAccessTypedExpr is not FieldAccessTypedExpr will return struct_field. other condition will not. So it can be consider as find exprName from input directly. Do you think this is okay?@mbasmanova

@zhejiangxiaomai
Copy link
Contributor

-- Some questions.
Thanks, I'll give some serious thought to these comments tomorrow.

@zhejiangxiaomai
Copy link
Contributor

zhejiangxiaomai commented Apr 20, 2023

@mbasmanova Just fixed all comments.

@zhejiangxiaomai
Copy link
Contributor

Just warm reminder. @mbasmanova

@zhejiangxiaomai
Copy link
Contributor

Hi, @mbasmanova do you have any suggestion? feel free to leave comments. 👍

@mbasmanova
Copy link
Contributor

I'm short on bandwidth and confused by the changes in the PR.

@vibhatha Would you help review?

@zhejiangxiaomai
Copy link
Contributor

Hi, @vibhatha do you have any suggestion?

@vibhatha
Copy link
Contributor

vibhatha commented May 4, 2023

I'm short on bandwidth and confused by the changes in the PR.

@vibhatha Would you help review?

@mbasmanova I will definitely take a look.

@vibhatha
Copy link
Contributor

vibhatha commented May 4, 2023

Hi, @vibhatha do you have any suggestion?

I will keep you posted. @zhejiangxiaomai

@zhejiangxiaomai
Copy link
Contributor

Hi, @vibhatha do you have any suggestion?

I will keep you posted. @zhejiangxiaomai

Hi, @vibhatha please ping me. If you have any question. I will very happy to fix.

@vibhatha
Copy link
Contributor

Hi, @vibhatha do you have any suggestion?

I will keep you posted. @zhejiangxiaomai

Hi, @vibhatha please ping me. If you have any question. I will very happy to fix.

Still looking into the code. Keep you posted as soon as possible.

@zhejiangxiaomai
Copy link
Contributor

Hello,@vibhatha it's been a long time and I was wondering if you have any other suggestions? we already use this pr in our own repo.

2. support nested struct. like(abc.ab.a)
The idea is use nested struct field to represet FieldAccessTypedExpr.
@zhejiangxiaomai
Copy link
Contributor

@mbasmanova I'm very sorry to bother you, do you have any other suggestions? It seems like @vibhatha is too busy to review this PR.

@mbasmanova
Copy link
Contributor

@zhejiangxiaomai Substrait PRs are very confusing to me. My understanding is that Substrait is designed to represent a logical plan (an alternative to SQL) and is not designed to represent optimized physical plan. It feels a bit hacky to use Substrait for representing optimized physical plan. Why not generate Velox plan directly instead? Velox plans now support serialization.

@zhejiangxiaomai
Copy link
Contributor

@zhejiangxiaomai Substrait PRs are very confusing to me. My understanding is that Substrait is designed to represent a logical plan (an alternative to SQL) and is not designed to represent optimized physical plan. It feels a bit hacky to use Substrait for representing optimized physical plan. Why not generate Velox plan directly instead? Velox plans now support serialization.

Thanks, I will discuss with rui-mo.

@zhejiangxiaomai
Copy link
Contributor

@zhejiangxiaomai Substrait PRs are very confusing to me. My understanding is that Substrait is designed to represent a logical plan (an alternative to SQL) and is not designed to represent optimized physical plan. It feels a bit hacky to use Substrait for representing optimized physical plan. Why not generate Velox plan directly instead? Velox plans now support serialization.

cc:@FelixYBW

@FelixYBW
Copy link

FelixYBW commented Aug 2, 2023

@rui-mo Let's close this one since we moved this part to Gluten.

@rui-mo rui-mo closed this Aug 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants