Skip to content

Conversation

@zabetak
Copy link
Member

@zabetak zabetak commented Oct 7, 2025

What changes were proposed in this pull request?

  1. Use introduceDerivedTable(RelNode) instead of introduceDerivedTable(RelNode,RelNode) to avoid the exception when the (left) input is already modified by the shuttle.
  2. Manually link the derived table with the join in SelfJoinHandler.
  3. Create new join (copy) only if one of the input(s) is modified.

Why are the changes needed?

The problem occurs when we need to introduce a derived table over the left input of the join but the input is already modified (newL) by the shuttle. Looking for the modified input in the join's children fails and raises an exception. Since we know which input needs to be replaced we can avoid the lookup mechanism in introduceDerivedTable(RelNode, RelNode) and build directly the new join operator via copy.

To avoid unnecessary object creation bubling further up via the HiveRelShuttle mechansim we perform the copy only when one of the inputs is modified.

Does this PR introduce any user-facing change?

No

How was this patch tested?

mvn test -Dtest=TestMiniLlapLocalCliDriver -Dqfile_regex=.*cte.* -Dtest.output.overwrite
mvn test -Dtest=TestMiniLlapLocalCliDriver -Dqfile_regex=.*view.* -Dtest.output.overwrite

@sonarqubecloud
Copy link

sonarqubecloud bot commented Oct 7, 2025

…edTable for queries with self joins

The problem occurs when we need to introduce a derived table over the left input of the join but the input is already modified (newL) by the shuttle.
Looking for the modified input in the join's children fails and raises an exception. Since we know which input needs to be replaced we can avoid the lookup mechanism in
`introduceDerivedTable(RelNode, RelNode)` and build directly the new join operator via copy.

To avoid unnecessary object creation bubling further up via the HiveRelShuttle mechansim we perform the copy *only* when one of the inputs is modified.
@sonarqubecloud
Copy link

@zabetak
Copy link
Member Author

zabetak commented Nov 14, 2025

@kasakrisz This is a follow-up from HIVE-28222 that you reviewed previously so if you have time please take a look.

@zabetak zabetak requested a review from kasakrisz November 14, 2025 09:47
Copy link
Contributor

@kasakrisz kasakrisz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@zabetak zabetak merged commit c68e7ec into apache:master Nov 17, 2025
4 checks passed
@zabetak zabetak deleted the HIVE-29249 branch November 17, 2025 13:35
@zabetak
Copy link
Member Author

zabetak commented Nov 17, 2025

Thanks for the review @kasakrisz !

deniskuzZ pushed a commit that referenced this pull request Nov 17, 2025
…edTable for queries with self joins (#6117)

The problem occurs when we need to introduce a derived table over the left input of the join but the input is already modified (newL) by the shuttle.
Looking for the modified input in the join's children fails and raises an exception. Since we know which input needs to be replaced we can avoid the lookup mechanism in
`introduceDerivedTable(RelNode, RelNode)` and build directly the new join operator via copy.

To avoid unnecessary object creation bubling further up via the HiveRelShuttle mechansim we perform the copy *only* when one of the inputs is modified.

(cherry picked from commit c68e7ec)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants