forked from apache/doris
-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[Optimization](String) Optimize q20 q21 q22 q23 LIKE_SUBSTRING (like …
…'%xxx%') (apache#18309) Optimize q20, q21, q22, q23 LIKE_SUBSTRING (like '%xxxx%'). Idea is from clickhouse stringsearcher: Stringsearcher is about 10%~20% faster than volnitsky algorithm when needle size is less than 10 using two chars at beginning search in SIMD . Stringsearcher is faster than volnitsky algorithm, when needle size is less than 21. The changes are as follows: Using first two chars of needle at beginning search. We can compare two chars of needle and [n:n+17) chars in haystack in SIMD in one loop. Filter efficiency will be higher. When env support SIMD, we use stringsearcher. Test result in clickbench: q20 is about 15% up. q20: SELECT COUNT(*) FROM hits WHERE URL LIKE '%google%'; q21, q22 is about 1%~5% up. q21: SELECT SearchPhrase, MIN(URL), COUNT(*) AS c FROM hits WHERE URL LIKE '%google%' AND SearchPhrase <> '' GROUP BY SearchPhrase ORDER BY c DESC LIMIT 10; q22: SELECT SearchPhrase, MIN(URL), MIN(Title), COUNT(*) AS c, COUNT(DISTINCT UserID) FROM hits WHERE Title LIKE '%Google%' AND URL NOT LIKE '%.google.%' AND SearchPhrase <> '' GROUP BY SearchPhrase ORDER BY c DESC LIMIT 10; q23 is about 30%~40% up and not stable. q23: SELECT * FROM hits WHERE URL LIKE '%google%' ORDER BY EventTime LIMIT 10;
- Loading branch information
1 parent
eb6dbc0
commit b627088
Showing
2 changed files
with
61 additions
and
11 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters