
get rid of std::hash #11591

@lll-phill-lll


As @vladl2802 noted in #11416, values are distributed unevenly between buckets during spilling.

The root cause is that bucket selection relies on a hash function here:

auto bucketId = hash % SpilledBucketCount;

and that hash turns out to be std::hash, which (at least for integers) just returns the value itself: https://godbolt.org/z/es8dxMGeY

The hash function is set here:

return std::hash<T>()(value.Get<T>());

As a temporary measure, we changed the bucket selection algorithm from hash%128 to XXHASH(hash)%128 (PR: #11471).

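Also, relying on std::hash can cause compatibility issues when the MKQL_RUNTIME version changes: its results are implementation-defined and are only guaranteed to stay the same within a single run of a program.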

So, this task proposes replacing std::hash with another hash function. Hash functions to consider:
rh hash:

ui64 bucket = ((SelfHash ^ hash) * 11400714819323198485llu) >> capacityShift;

xxHash: https://github.com/Cyan4973/xxHash. We already use it in GraceJoin:
XXH64_hash_t hash = XXH64(TempTuple.data() + NullsBitmapSize_, (TempTuple.size() - NullsBitmapSize_) * sizeof(ui64), 0);
