-
-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Description
The rate of inserting uint64
s into a hash set varies wildly with the order and the range of integers inserted
Example
# slow_set.nim
import sets
import times
when isMainModule:
var hs1: HashSet[uint64]
var hs2: HashSet[uint64]
var hs3: HashSet[uint64]
# insert 0..200k
var time = cpuTime()
for i in 0..200_000:
let k1 = uint64(i)
hs1.incl(k1)
echo "time ", (cpuTime() - time)
# interleave insert 0..100k and 100k..200k
time = cpuTime()
for i in 0..100_000:
let k1 = uint64(i)
let k2 = uint64(i + 100_000)
hs2.incl(k1)
hs2.incl(k2)
echo "time ", (cpuTime() - time)
# interleave insert 0..100k and 1.0M..1.1M
time = cpuTime()
for i in 0..100_000:
let k1 = uint64(i)
let k2 = uint64(i + 1_000_000)
hs3.incl(k1)
hs3.incl(k2)
echo "time ", (cpuTime() - time)
Current Output
Compiled with nim c -d:release -r slow_set.nim
.
time 0.016778
time 1.01831
time 14.043752
Expected Output
I would expect the three different insertion loops to take roughly the same amount of time.
They are all inserting the same amount of unique values.
In the second loop, I just interleave the insertion order, and in the third loop, I insert some larger numbers.
Possible Solution
Additional Information
This issues #10097 talks about integer hashing and collisions, but all of my uint64s are well below the max uint32.
This also happens with hashTable keys, presumably for similar reasons?
In my actual project I am deduplicating an edge list for a graph where the source and destination vertex for a node are sort of similar to loop 3, so the performance is not good.
$ nim -v
Nim Compiler Version 0.20.2 [MacOSX: amd64]
Compiled at 2019-07-17
Copyright (c) 2006-2019 by Andreas Rumpf
git hash: 88a0edba4b1a3d535b54336fd589746add54e937
active boot switches: -d:release