
HIVE-29368: limiting the NDV fix to the CASE clause handler only #6308

Open
konstantinb wants to merge 2 commits into apache:master from konstantinb:HIVE-29368-case-only


@konstantinb konstantinb commented Feb 9, 2026

What changes were proposed in this pull request?

HIVE-29368: perform more accurate NDV estimations for CASE/WHEN clauses with multiple constant branches

If the NDV produced by PessimisticStatCombiner is smaller than the number of constant branches, use the number of constant branches as the expression NDV.
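The rule above can be sketched roughly as follows. This is an illustration only, not the code in this PR: the class and method names (`CaseNdvSketch`, `pessimisticNdv`, `adjustedNdv`) are hypothetical, and it assumes the per-branch NDVs and the count of constant branches are already known and that the constant branches hold distinct values.

```java
import java.util.Arrays;

// Hypothetical sketch; names are illustrative and do not mirror Hive's classes.
final class CaseNdvSketch {

    // Pessimistic combination as described in the PR text: the MAX NDV over all branches.
    static long pessimisticNdv(long[] branchNdvs) {
        return Arrays.stream(branchNdvs).max().orElse(0L);
    }

    // Proposed adjustment: never report fewer distinct values than there are constant branches.
    static long adjustedNdv(long[] branchNdvs, int constBranchCount) {
        return Math.max(pessimisticNdv(branchNdvs), constBranchCount);
    }

    public static void main(String[] args) {
        // A CASE expression with 20 constant branches, each with NDV 1.
        long[] branchNdvs = new long[20];
        Arrays.fill(branchNdvs, 1L);

        System.out.println("pessimistic NDV: " + pessimisticNdv(branchNdvs));
        System.out.println("adjusted NDV:    " + adjustedNdv(branchNdvs, 20));
    }
}
```

When a non-constant branch already has a larger NDV than the number of constant branches, the adjustment leaves the pessimistic estimate unchanged.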

Why are the changes needed?

The original code uses PessimisticStatCombiner to combine the stats of all branches of a CASE/WHEN clause. Since every constant branch has a natural NDV of 1, the resulting NDV estimate of even a very large clause is also 1 (PessimisticStatCombiner takes the MAX NDV across all branches). If such a column is subsequently used in a GROUP BY, especially multiple times, this underestimation can lead to poor execution plan decisions. For example, a 20x underestimation alone might be tolerable, but when it happens three times in one query, the estimate of 1 * 1 * 1 = 1 falls short of the actual 20 * 20 * 20 = 8000, an 8000x row count underestimation.
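The compounding effect can be checked with a small calculation. The sketch below assumes, purely for illustration, that the number of GROUP BY output rows is estimated as the product of the key NDVs; the class and method names are hypothetical and the real estimator may apply additional caps.

```java
// Hypothetical sketch of how per-key NDV errors multiply across GROUP BY keys.
final class GroupByUnderestimateSketch {

    // Simplified assumption: estimated group count is the product of the key NDVs.
    static long estimatedGroups(long... keyNdvs) {
        long product = 1L;
        for (long ndv : keyNdvs) {
            product = Math.multiplyExact(product, ndv); // throws on overflow instead of wrapping
        }
        return product;
    }

    public static void main(String[] args) {
        long underestimated = estimatedGroups(1, 1, 1);  // three CASE keys, each estimated at NDV 1
        long expected = estimatedGroups(20, 20, 20);     // the same keys with 20 constant branches each

        System.out.println("estimate with NDV 1 per key:  " + underestimated);
        System.out.println("estimate with NDV 20 per key: " + expected);
        System.out.println("underestimation factor:       " + (expected / underestimated));
    }
}
```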

Does this PR introduce any user-facing change?

No

How was this patch tested?

See the .q file in the PR; also extensively tested in a private custom installation

sonarqubecloud bot commented Feb 9, 2026

@konstantinb (Contributor, Author) commented

CC @okumin
