[Bug-fix] Fix wrong data distribution judgment #6029

Merged
merged 5 commits into from
Jun 18, 2021

Conversation

@EmmyMiao87 EmmyMiao87 commented Jun 15, 2021

Proposed changes

The fragment that contains an OlapScanNode has two possible data distributions.

  1. RANDOM: The OlapScanNode scans a multi-partition table.
  2. HASH_PARTITIONED: The scanned table belongs to a colocate group.

For a multi-partition table, the data within each individual partition is distributed by the bucketing column,
but rows with the same bucketing column value in different partitions are not necessarily stored on the same BE.
So the data distribution is RANDOM.

If Doris wrongly plans RANDOM as HASH_PARTITIONED, the planner produces an incorrect colocate agg node,
and the query result is incorrect.
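
To make the failure mode concrete, here is a minimal Python sketch (not Doris code) of a table with two partitions whose buckets sit on different BEs. The placement map, the modulo bucketing function, the sample rows, and all helper names are hypothetical and chosen only for illustration. It compares aggregating on each BE with no merge step, which is only valid under HASH_PARTITIONED, against aggregating with a final merge by key.

```python
from collections import defaultdict

NUM_BUCKETS = 2

# Hypothetical placement: (partition, bucket) -> BE.
# Bucket 0 of p1 and bucket 0 of p2 sit on different BEs, which can
# happen when the table is not in a colocate group.
PLACEMENT = {
    ("p1", 0): "be1", ("p1", 1): "be2",
    ("p2", 0): "be2", ("p2", 1): "be1",
}

# Sample rows as (partition, bucketing key, value); illustrative only.
ROWS = [("p1", 0, 10), ("p2", 0, 5), ("p1", 1, 7)]


def bucket_of(key):
    # Stand-in for the real bucketing hash (assumption).
    return key % NUM_BUCKETS


def rows_per_be(rows):
    """Group (key, value) pairs by the BE that stores them."""
    per_be = defaultdict(list)
    for partition, key, value in rows:
        per_be[PLACEMENT[(partition, bucket_of(key))]].append((key, value))
    return per_be


def local_sum(rows):
    """SUM(value) GROUP BY key over the rows of a single BE."""
    acc = defaultdict(int)
    for key, value in rows:
        acc[key] += value
    return dict(acc)


def agg_without_merge(rows):
    """Aggregate on each BE and concatenate the results.

    Correct only if every key lives on exactly one BE (HASH_PARTITIONED).
    """
    per_be = rows_per_be(rows)
    out = []
    for be in sorted(per_be):
        out.extend(sorted(local_sum(per_be[be]).items()))
    return out


def agg_with_merge(rows):
    """Aggregate on each BE, then merge partial sums by key (safe for RANDOM)."""
    merged = defaultdict(int)
    for be_rows in rows_per_be(rows).values():
        for key, partial in local_sum(be_rows).items():
            merged[key] += partial
    return dict(merged)


print(agg_without_merge(ROWS))
print(agg_with_merge(ROWS))
```

With these sample rows, the first call returns key 0 twice (once from each BE), while the second returns a single merged sum per key. This mirrors the duplicated groups reported in the linked issue.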

Fixed #6028

Types of changes

What types of changes does your code introduce to Doris?
Put an x in the boxes that apply

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation Update (if none of the other choices apply)
  • Code refactor (Modify the code structure, format the code, etc...)

Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code.

  • I have created an issue ([Bug] Duplicate results in 'Group By' query #6028) and described the bug/feature there in detail
  • Compiling and unit tests pass locally with my changes
  • I have added tests that prove my fix is effective or that my feature works
  • If these changes need document changes, I have updated the document
  • Any dependent changes have been merged

The fragment that contains an OlapScanNode has three possible data distributions.
1. UNPARTITIONED: The scan range of the OlapScanNode covers only one instance (BE).
2. RANDOM: The OlapScanNode scans a multi-partition table.
3. HASH_PARTITIONED: The scanned table belongs to a colocate group.
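
The three cases above can be sketched as a small decision function. This is an illustrative Python model, not the Java planner code in Doris: `ScanInfo`, its fields, `judge_distribution`, and the order of the checks are all assumptions. Cases the list does not cover (for example, a single-partition table outside a colocate group) fall back to RANDOM here purely as a conservative choice, since RANDOM never claims a co-location that may not hold.

```python
from dataclasses import dataclass
from enum import Enum


class DataPartition(Enum):
    UNPARTITIONED = "UNPARTITIONED"
    RANDOM = "RANDOM"
    HASH_PARTITIONED = "HASH_PARTITIONED"


@dataclass
class ScanInfo:
    """Hypothetical summary of what an OlapScanNode reads."""
    num_backends: int        # number of BEs covered by the scan range
    in_colocate_group: bool  # whether the scanned table is in a colocate group


def judge_distribution(scan: ScanInfo) -> DataPartition:
    # Case 1: all scanned data sits on one BE.
    if scan.num_backends == 1:
        return DataPartition.UNPARTITIONED
    # Case 3: a colocate group keeps equal bucketing values on the same BE.
    if scan.in_colocate_group:
        return DataPartition.HASH_PARTITIONED
    # Case 2, plus anything the description leaves unspecified (assumption).
    return DataPartition.RANDOM


print(judge_distribution(ScanInfo(num_backends=3, in_colocate_group=False)))
```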

Fixed apache#6028
@EmmyMiao87 EmmyMiao87 added the area/colocated, area/planner, and kind/fix labels Jun 15, 2021
morningman previously approved these changes Jun 16, 2021
@morningman morningman left a comment

LGTM

@morningman morningman added the approved label Jun 17, 2021
@yangzhg yangzhg merged commit 99d8110 into apache:master Jun 18, 2021
Labels
approved Indicates a PR has been approved by one committer. area/colocated Issues or PRs related to colocated tables area/planner Issues or PRs related to the query planner kind/fix Categorizes issue or PR as related to a bug.
Development

Successfully merging this pull request may close these issues.

[Bug] Duplicate results in 'Group By' query