Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] add hash function for xx_hash3_64 #28910

Merged
merged 4 commits into from
Aug 17, 2023

Conversation

goldenbean
Copy link
Contributor

xx_hash3_64 has much better performance than murmur_hash3_32 by using AVX2 instruction according their benchmark, and has state-of-art hash quality which is broadly integrated in many software.

Reference: https://github.com/Cyan4973/xxHash

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr will affect users' behaviors
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto-backported to the target branch
    • 3.1
    • 3.0
    • 2.5
    • 2.4

@goldenbean goldenbean requested a review from a team as a code owner August 9, 2023 03:46
@wanpengfei-git wanpengfei-git added the documentation Improvements or additions to documentation label Aug 9, 2023
@imay
Copy link
Contributor

imay commented Aug 9, 2023

Did you test the performance based on this PR compared to murmur_hash3_32?

@goldenbean
Copy link
Contributor Author

Did you test the performance based on this PR compared to murmur_hash3_32?

Since the implementation is quite simple and straight forward, I'm referred to the official benchmarks among the popular algo in the code "be/src/util/xxhash.h" as following:

  • Benchmarks

  • The reference system uses an Intel i7-9700K CPU, and runs Ubuntu x64 20.04.
  • The open source benchmark program is compiled with clang v10.0 using -O3 flag.
  • | Hash Name | ISA ext | Width | Large Data Speed | Small Data Velocity |
  • | -------------------- | ------- | ----: | ---------------: | ------------------: |
  • | XXH3_64bits() | @b AVX2 | 64 | 59.4 GB/s | 133.1 |
  • | MeowHash | AES-NI | 128 | 58.2 GB/s | 52.5 |
  • | XXH3_128bits() | @b AVX2 | 128 | 57.9 GB/s | 118.1 |
  • | CLHash | PCLMUL | 64 | 37.1 GB/s | 58.1 |
  • | XXH3_64bits() | @b SSE2 | 64 | 31.5 GB/s | 133.1 |
  • | XXH3_128bits() | @b SSE2 | 128 | 29.6 GB/s | 118.1 |
  • | RAM sequential read | | N/A | 28.0 GB/s | N/A |
  • | ahash | AES-NI | 64 | 22.5 GB/s | 107.2 |
  • | City64 | | 64 | 22.0 GB/s | 76.6 |
  • | T1ha2 | | 64 | 22.0 GB/s | 99.0 |
  • | City128 | | 128 | 21.7 GB/s | 57.7 |
  • | FarmHash | AES-NI | 64 | 21.3 GB/s | 71.9 |
  • | XXH64() | | 64 | 19.4 GB/s | 71.0 |
  • | SpookyHash | | 64 | 19.3 GB/s | 53.2 |
  • | Mum | | 64 | 18.0 GB/s | 67.0 |
  • | CRC32C | SSE4.2 | 32 | 13.0 GB/s | 57.9 |
  • | XXH32() | | 32 | 9.7 GB/s | 71.9 |
  • | City32 | | 32 | 9.1 GB/s | 66.0 |
  • | Blake3* | @b AVX2 | 256 | 4.4 GB/s | 8.1 |
  • | Murmur3 | | 32 | 3.9 GB/s | 56.1 |
  • | SipHash* | | 64 | 3.0 GB/s | 43.2 |
  • | Blake3* | @b SSE2 | 256 | 2.4 GB/s | 8.1 |
  • | HighwayHash | | 64 | 1.4 GB/s | 6.0 |
  • | FNV64 | | 64 | 1.2 GB/s | 62.7 |
  • | Blake2* | | 256 | 1.1 GB/s | 5.1 |
  • | SHA1* | | 160 | 0.8 GB/s | 5.6 |
  • | MD5* | | 128 | 0.6 GB/s | 7.8 |

@imay
Copy link
Contributor

imay commented Aug 9, 2023

I have seen the benchmark result. It is better to do tests based on this PR to make sure better performance can even exist in starrocks.

@goldenbean
Copy link
Contributor Author

I have seen the benchmark result. It is better to do tests based on this PR to make sure better performance can even exist in starrocks.

Got you

@goldenbean goldenbean force-pushed the add-xxhash-func branch 2 times, most recently from 35ad058 to f6bb8c0 Compare August 11, 2023 01:19
@imay
Copy link
Contributor

imay commented Aug 11, 2023

@goldenbean can you share the performance results of this new function and murmur_hash3_32?

@CLAassistant
Copy link

CLAassistant commented Aug 13, 2023

CLA assistant check
All committers have signed the CLA.

@goldenbean
Copy link
Contributor Author

goldenbean commented Aug 13, 2023

UPDATED


Benchmark Time CPU Iterations

BM_HashFunctions_Eval/10/1/iterations:10000 910 ns 907 ns 10000
BM_HashFunctions_Eval/10/0/iterations:10000 376 ns 376 ns 10000
BM_HashFunctions_Eval/100/1/iterations:10000 2704 ns 2698 ns 10000
BM_HashFunctions_Eval/100/0/iterations:10000 2400 ns 2398 ns 10000
BM_HashFunctions_Eval/10000/1/iterations:10000 279077 ns 278694 ns 10000
BM_HashFunctions_Eval/10000/0/iterations:10000 128846 ns 128677 ns 10000
BM_HashFunctions_Eval/1000000/1/iterations:10000 28255555 ns 28208820 ns 10000
BM_HashFunctions_Eval/1000000/0/iterations:10000 15829049 ns 15800055 ns 10000

Hardware:
CPU: AMD Ryzen 5 PRO 4650U with Radeon Graphics
MEM: 24G

Run on (12 X 2100 MHz CPU s)
CPU Caches:
L1 Data 32 KiB (x6)
L1 Instruction 32 KiB (x6)
L2 Unified 512 KiB (x6)
L3 Unified 4096 KiB (x2)
Load Average: 3.21, 5.49, 7.78
WARNING CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.

Signed-off-by: beans <jing.gao@outlook.com>
Signed-off-by: beans <jing.gao@outlook.com>
@imay
Copy link
Contributor

imay commented Aug 13, 2023

Form the benchmark, we can see the improvement. But the speed seems not fast enough as Reference: https://github.com/Cyan4973/xxHash. do you know the reason?

@goldenbean
Copy link
Contributor Author

@imay please let me do more tests on commercial server on Monday.

Signed-off-by: beans <jing.gao@outlook.com>
@goldenbean
Copy link
Contributor Author


Benchmark Time CPU Iterations

BM_HashFunctions_Eval/10/1/iterations:10000 722 ns 722 ns 10000
BM_HashFunctions_Eval/10/0/iterations:10000 609 ns 609 ns 10000
BM_HashFunctions_Eval/100/1/iterations:10000 3720 ns 3720 ns 10000
BM_HashFunctions_Eval/100/0/iterations:10000 2534 ns 2534 ns 10000
BM_HashFunctions_Eval/10000/1/iterations:10000 411227 ns 411221 ns 10000
BM_HashFunctions_Eval/10000/0/iterations:10000 220210 ns 220207 ns 10000
BM_HashFunctions_Eval/1000000/1/iterations:10000 41261672 ns 41261079 ns 10000
BM_HashFunctions_Eval/1000000/0/iterations:10000 23734210 ns 23733623 ns 10000

Hardware:
CPU: 64 core Intel(R) Xeon(R) CPU E5-2683 v4 @ 2.10GHz
MEM: 256 GB

Run on (64 X 2593.98 MHz CPU s)
CPU Caches:
L1 Data 32 KiB (x32)
L1 Instruction 32 KiB (x32)
L2 Unified 256 KiB (x32)
L3 Unified 40960 KiB (x2)
Load Average: 0.00, 0.02, 0.00


size_t size = columns[0]->size();
ColumnBuilder<TYPE_BIGINT> builder(size);
for (int row = 0; row < size; ++row) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggest to transform this row-priority two-layer loops into column-priority loops, compiler shall try best to auto vectorize column-priority implement.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice catch ! I will do it soon

@imay
Copy link
Contributor

imay commented Aug 16, 2023

@goldenbean better to rename it as xx_hash3_64 which will be consistent with murmur_hash3_32

@goldenbean goldenbean changed the title [Feature] add hash function for xxhash3_64 [Feature] add hash function for xx_hash3_64 Aug 16, 2023
Signed-off-by: beans <jing.gao@outlook.com>
@goldenbean
Copy link
Contributor Author

@goldenbean better to rename it as xx_hash3_64 which will be consistent with murmur_hash3_32

ok

@sonarqubecloud
Copy link

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
No Duplication information No Duplication information

warning The version of Java (11.0.20) you have used to run this analysis is deprecated and we will stop accepting it soon. Please update to at least Java 17.
Read more here

@wanpengfei-git
Copy link
Collaborator

[FE PR Coverage Check]

😍 pass : 0 / 0 (0%)

@imay imay merged commit 058e08b into StarRocks:main Aug 17, 2023
@goldenbean
Copy link
Contributor Author

@imay @satanson Thanks guys ! I will make more contributions around bitmap or some other area ~~

Jay-ju pushed a commit to Jay-ju/starrocks that referenced this pull request Sep 7, 2023
xx_hash3_64 has much better performance than murmur_hash3_32 by using AVX2 instruction according their benchmark, and has state-of-art hash quality which is broadly integrated in many software.

Reference: https://github.com/Cyan4973/xxHash
---------

Signed-off-by: beans <jing.gao@outlook.com>
@rcauble
Copy link

rcauble commented Oct 6, 2023

Hi, can this be backported to 3.1 branch, please?

@imay
Copy link
Contributor

imay commented Oct 6, 2023

@mergify backport branch-3.1

@mergify
Copy link
Contributor

mergify bot commented Oct 6, 2023

backport branch-3.1

✅ Backports have been created

mergify bot pushed a commit that referenced this pull request Oct 6, 2023
xx_hash3_64 has much better performance than murmur_hash3_32 by using AVX2 instruction according their benchmark, and has state-of-art hash quality which is broadly integrated in many software.

Reference: https://github.com/Cyan4973/xxHash
---------

Signed-off-by: beans <jing.gao@outlook.com>
(cherry picked from commit 058e08b)
wanpengfei-git pushed a commit that referenced this pull request Oct 6, 2023
xx_hash3_64 has much better performance than murmur_hash3_32 by using AVX2 instruction according their benchmark, and has state-of-art hash quality which is broadly integrated in many software.

Reference: https://github.com/Cyan4973/xxHash
---------

Signed-off-by: beans <jing.gao@outlook.com>
(cherry picked from commit 058e08b)
@goldenbean goldenbean deleted the add-xxhash-func branch April 3, 2024 06:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants