Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regression in filtered aggregate for double columns #17455

Open
pjain1 opened this issue Nov 6, 2024 · 2 comments
Open

Regression in filtered aggregate for double columns #17455

pjain1 opened this issue Nov 6, 2024 · 2 comments

Comments

@pjain1
Copy link
Member

pjain1 commented Nov 6, 2024

Affected Version

Druid 29.0.1

Description

SELECT
  (TIME_FLOOR("__time", 'P1D', NULL, 'Asia/Calcutta')) AS "__time",
  SUM(CASE WHEN "countryIsoCode" IN ('US') THEN added  ELSE 0 END) AS "CASE_LONG",
  SUM(CASE WHEN "countryIsoCode" IN ('US') THEN CAST(added as double)  ELSE 0.0 END) AS "CASE_DOUBLE"
FROM "wikipedia"
WHERE ("__time" >= '2024-10-23T09:00:00.000Z' AND "__time" < '2024-10-30T11:00:00.000Z')
GROUP BY 1
ORDER BY "__time"
LIMIT 10001

If we check the query plan for above query, CASE_LONG aggregate gets converted to a filtered aggregator where as CASE_DOUBLE gets translated to a virtual column with case_searched expression which impacts the performance. Performance wise there is a big difference for real use case if there are multiple dimension values to check. Filtered aggregator is much faster than virtual column based aggregate with case_searched expression.

@kgyrtkirk
Copy link
Member

is this really a regression? did it worked in earlier versions - if yes in which version?

  • A way to fix this in the logical optimizer way could be to recognizing the fact that CAST can be pulled out from CASE and then the same for SUM ...but I feel like the above could only be a rare hit.
  • on the other hand it seems like AggregateCaseToFilterRule only consideres integer types ; right now I don't see any reason why that can't apply to double typed expressions

I believe in 31 case_long will also retain the case_searched due to a logic issue #17378

@pjain1
Copy link
Member Author

pjain1 commented Nov 11, 2024

I mean yeah it may not be called a regression but may be called an optimization issue. I think both cases should be converted to a filtered agg and yes as you pointed out in Druid 31 even the long case is not getting optimized, so it may be considered a regression in 31. I figured its a calcite rule issue thanks for confirming.

Can #17378 be modified to support optimizing for double type as well ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants