Closed
Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Queries like this (which compute distinct values for high cardinality columns) are currently relatively slow (if there are many values of UserID
):
SELECT
SUM(EngineId),
COUNT(*) AS c,
COUNT(DISTINCT "UserID")
FROM
hits
GROUP BY
"RegionID"
ORDER BY
c DESC
LIMIT 10;
Here is a specific clickbench query from the discussion on #5276
❯ SELECT "RegionID", SUM("AdvEngineID"), COUNT(*) AS c, AVG("ResolutionWidth"), COUNT(DISTINCT "UserID") FROM hits GROUP BY "RegionID" ORDER BY c DESC LIMIT 10;
Describe the solution you'd like
We could make this type of query faster. Hopefully we can collect ideas here
Describe alternatives you've considered
TBD
Additional context
There are thoughts on improving aggregate performance in general #4973
This is one area where clickhouse and duckdb are particularly strong
See #5276 and #5276 (comment)