Description
We need to discuss the "full station name hash" requirement.
In most of my entries I use the "perfect hash" trick, i.e. I only compare the 32 bits of the hash to identify a given station name. With a good enough hash function (e.g. crc32c), it works perfectly fine with our current dataset of 10K stations, and gives the correct results. BUT someone could add a line to the dataset with a forged name that triggers a hash collision. The two stations would then be merged into one, and the results would be inaccurate...
In the original 1BRC challenge, this trick was disallowed, and they rejected any solution not explicitly comparing the station names char by char.
gunnarmorling/1brc#495 (reply in thread)
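To make the difference concrete, here is a minimal Python sketch of the two lookups. It is not the code of my entry. The function names are hypothetical, and `toy_hash` is a deliberately weak stand-in for crc32c (a plain byte sum, so anagrams collide), chosen only so that a forged name is easy to build by hand.

```python
from collections import defaultdict


def toy_hash(name: bytes) -> int:
    # ASSUMPTION: weak stand-in for crc32c. A byte sum makes any two
    # anagrams collide, which keeps the forged-name example trivial.
    return sum(name) & 0xFFFFFFFF


def aggregate_hash_only(rows):
    # "Perfect hash" trick: the 32-bit hash alone identifies a station.
    # The first name seen for a given hash silently absorbs any collider.
    slots = {}
    for name, temp in rows:
        slot = slots.setdefault(toy_hash(name), [name, 0, 0.0])
        slot[1] += 1
        slot[2] += temp
    return {s[0]: (s[1], s[2]) for s in slots.values()}


def aggregate_full_compare(rows):
    # The hash only selects a bucket; the name is then compared in full,
    # so colliding names keep separate counters.
    buckets = defaultdict(list)
    for name, temp in rows:
        bucket = buckets[toy_hash(name)]
        for slot in bucket:
            if slot[0] == name:
                break
        else:
            slot = [name, 0, 0.0]
            bucket.append(slot)
        slot[1] += 1
        slot[2] += temp
    return {s[0]: (s[1], s[2]) for b in buckets.values() for s in b}


# b"Olso" is the forged name: same bytes as b"Oslo", hence the same toy hash.
rows = [(b"Oslo", 10.0), (b"Olso", 30.0)]
hash_only = aggregate_hash_only(rows)
full = aggregate_full_compare(rows)
```

With these two rows, the hash-only version reports a single station holding both measurements, while the full comparison keeps them apart. The extra name comparison on every row is the price of that safety.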
So in my entry, I made both code paths available, and we can compare plain ./abouchez
and ./abouchez -f
- the latter making a full name comparison, but slower (1.96s vs 1.10s on my Intel PC).
To be fair with respect to the original challenge, I would recommend requiring a full station name comparison.
It makes the timings slower, but IMHO it better reflects what we expect from real-world work.