MDEV-31669: use XXH3 as a digest instead of md5#4573
MDEV-31669: use XXH3 as a digest instead of md5#4573MohamedM216 wants to merge 1 commit intoMariaDB:mainfrom
Conversation
gkodinov
left a comment
There was a problem hiding this comment.
This is a preliminary review. Right now there's no strong opposition against removing MD5 from this. But the choice of a replacement hash function is yet to be determined. I can see the logic in Sergei's argument that using a crypto hash for a non-crypto purpose is kind of a mismatch. But I will leave that discussion for the final review now.
First of all: please make sure the change compiles and runs all buildbot test in a satisfactory way.
899f3b3 to
3e7b460
Compare
gkodinov
left a comment
There was a problem hiding this comment.
Please switch this to XXH3, as suggested by Sergei. I will then do a full review of the result once this is done.
3e7b460 to
241cb0c
Compare
241cb0c to
ea765b3
Compare
|
Apologies, I accidentally closed this PR. |
gkodinov
left a comment
There was a problem hiding this comment.
Thanks for switching to XXhash.
OK to push from me, given that you add a commit message to the commit that follows CODING_STANDARDS.md.
Please wait for the final review.
ea765b3 to
233436e
Compare
|
Hi @gkodinov , I've updated the commit message. Thanks! |
vuvova
left a comment
There was a problem hiding this comment.
the main comment here: I've looked at xxhash.h that you're using and noticed that it has 128-bit hash. This is the same width as MD5 so less changes and also same chance of collison, 64-bit hash has it much higher.
could you try with XXH3_128bit please?
mysql-test/include/check_digest.inc
Outdated
|
|
||
| --disable_query_log | ||
| create table test._digests(d varchar(32) primary key); | ||
| create table test._digests(d varchar(16) primary key); |
There was a problem hiding this comment.
let's keep it at 32 here. It'd be good to know that existing applications (that query and, perhaps, temporarily store these digests) will continue to work
sql/sql_digest.cc
Outdated
| compute_md5_hash(md5, | ||
| (const char *) digest_storage->m_token_array, | ||
| digest_storage->m_byte_count); | ||
| XXH64_hash_t res = XXH3_64bits(digest_storage->m_token_array, |
There was a problem hiding this comment.
I wonder, perhaps we should use XXH3_128bits? it exists in the same file, so as easy to use. but has the same width as md5, so less changes and more importantly same chance of collisions as before.
|
OK, I'll work on that. Thanks for the feedback! |
8b3c053 to
bc14604
Compare
- Following MySQL implementation as mentioned in the PR description - XXH is significantly faster than both MD5 and SHA-256 - Here we use the 128-bit XXH3 - Keep DIGEST_HASH_SIZE at 16 bytes (same as MD5) - Reduce COL_DIGEST_SIZE to 32 chars - Keep DIGEST columns as VARCHAR(32)
bc14604 to
8ad7bd6
Compare
Jira Issue number for this PR: MDEV-31669
I followed mysql implementation as suggested in jira.