Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better multi-column aggregation support with StringView #11794

Closed
wants to merge 1 commit into from

Conversation

XiangpengHao
Copy link
Contributor

Which issue does this PR close?

Related to #7000. Previous pr was automatically closed because the string-view2 branch is merged.

Rationale for this change

I get some time to implement the multi-column aggregation with StringView, the implementation is inspired by @jayzhan211 's #10976.

However, the performance is mixed, I see almost no performance diff with ClickBench query 18:

SELECT "UserID", extract(minute FROM to_timestamp_seconds("EventTime")) AS m, "SearchPhrase", COUNT(*) FROM hits GROUP BY "UserID", m, "SearchPhrase" ORDER BY COUNT(*) DESC LIMIT 10;

This is becuase the average size of SearchPhrase is only 4, meaning the string is always inlined to view and thus not being reused.

However, I made this query (q14, changed SearchPhrase to URL)

SELECT "SearchEngineID", "URL", COUNT(*) AS c FROM hits WHERE "URL" <> '' GROUP BY "SearchEngineID", "URL" ORDER BY c DESC LIMIT 10;

Without this patch, it takes 7.4s; with this patch, it takes 5.9s, a 20% improvement.

With this patch, the output string will reuse the string values in the buffer, so it is good.
However, when interning the values, we need to hash twice: (1) the string itself to build BytesViewMap (2) the group index - u32, which adds overhead.

What changes are included in this PR?

Things not implemented yet:

  • EmtiTo::First(n): I need to think a bit more about how to efficiently reuse the BytesViewBuilder

Are these changes tested?

Are there any user-facing changes?

No

Copy link

github-actions bot commented Oct 6, 2024

Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the Stale PR has not had any activity for some time label Oct 6, 2024
@github-actions github-actions bot closed this Oct 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Stale PR has not had any activity for some time
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant