You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
SELECT
(TIME_FLOOR("__time", 'P1D', NULL, 'Asia/Calcutta')) AS "__time",
SUM(CASE WHEN "countryIsoCode" IN ('US') THEN added ELSE 0 END) AS "CASE_LONG",
SUM(CASE WHEN "countryIsoCode" IN ('US') THEN CAST(added as double) ELSE 0.0 END) AS "CASE_DOUBLE"
FROM "wikipedia"
WHERE ("__time" >= '2024-10-23T09:00:00.000Z' AND "__time" < '2024-10-30T11:00:00.000Z')
GROUP BY 1
ORDER BY "__time"
LIMIT 10001
If we check the query plan for above query, CASE_LONG aggregate gets converted to a filtered aggregator where as CASE_DOUBLE gets translated to a virtual column with case_searched expression which impacts the performance. Performance wise there is a big difference for real use case if there are multiple dimension values to check. Filtered aggregator is much faster than virtual column based aggregate with case_searched expression.
The text was updated successfully, but these errors were encountered:
is this really a regression? did it worked in earlier versions - if yes in which version?
A way to fix this in the logical optimizer way could be to recognizing the fact that CAST can be pulled out from CASE and then the same for SUM ...but I feel like the above could only be a rare hit.
on the other hand it seems like AggregateCaseToFilterRule only consideres integer types ; right now I don't see any reason why that can't apply to double typed expressions
I believe in 31 case_long will also retain the case_searched due to a logic issue #17378
I mean yeah it may not be called a regression but may be called an optimization issue. I think both cases should be converted to a filtered agg and yes as you pointed out in Druid 31 even the long case is not getting optimized, so it may be considered a regression in 31. I figured its a calcite rule issue thanks for confirming.
Can #17378 be modified to support optimizing for double type as well ?
Affected Version
Druid 29.0.1
Description
If we check the query plan for above query,
CASE_LONG
aggregate gets converted to a filtered aggregator where asCASE_DOUBLE
gets translated to a virtual column withcase_searched
expression which impacts the performance. Performance wise there is a big difference for real use case if there are multiple dimension values to check. Filtered aggregator is much faster than virtual column based aggregate withcase_searched
expression.The text was updated successfully, but these errors were encountered: