Skip to content

[BUG] Incorrect results when doing join with aggregation and special project #4540

@qianheng-aws

Description

@qianheng-aws

What is the bug?
Incorrect results when doing join with aggregation, while the join key is not the first group by key. And we also need a special project like group by keys + agg metrics before join to trigger this bug.

How can one reproduce the bug?

PUT /test_bool_join
{
  "mappings": {
    "properties": {
      "category": { "type": "keyword" },
      "has_flag": { "type": "boolean" },
      "value": { "type": "integer" }
    }
  }
}
POST test_bool_join/_bulk
{"index":{"_id":"1"}}
{"category":"A","has_flag":true,"value":10}
{"index":{"_id":"2"}}
{"category":"B","has_flag":true,"value":20}
{"index":{"_id":"3"}}
{"category":"C","has_flag":true,"value":30}
{"index":{"_id":"4"}}
{"category":"D","has_flag":false,"value":40}
{"index":{"_id":"5"}}
{"category":"E","has_flag":false,"value":50}
{"index":{"_id":"6"}}
{"category":"F","has_flag":false,"value":60}


POST _plugins/_ppl/_explain
{
  "query": """
source=test_bool_join 
| stats COUNT() as cnt by category, has_flag
| fields category, has_flag, cnt 
| join left=L right=R ON L.has_flag = R.has_flag 
  [source=test_bool_join | stats COUNT() as overall_cnt by has_flag]
  """
}

It get incorrect results of 3 rows while the correct results is 6 rows.

What is the expected behavior?
Get correct results with 6 rows.

What is your host/environment?

  • OS: macos
  • Version 3.3.0-snapshot
  • Plugins

Do you have any screenshots?
If applicable, add screenshots to help explain your problem.

Do you have any additional context?
This query without the special project could produce correct results.

source=test_join 
| stats COUNT() as cnt by category, has_flag
| join left=L right=R ON L.has_flag = R.has_flag 
  [source=test_join | stats COUNT() as overall_cnt by has_flag]

And single query having this special project but without join could produce correct results.

source=test_bool_join 
| stats COUNT() as cnt by category, has_flag
| fields category, has_flag, cnt
| sort has_flag

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't workingcalcitecalcite migration releatedpushdownpushdown related issues

    Type

    No type

    Projects

    Status

    Done

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions