Full Station Name Hash Requirement? #118

We need to discuss the "full station name hash" requirement.

In (most of) my entries I use the "perfect hash" trick, i.e. I only compare the 32-bit hash value to identify a given station name, without looking at the name itself. With a good enough hash function (e.g. crc32c), it works perfectly fine with our current dataset of 10K stations, and gives the correct output. BUT someone could add a line to the dataset with a forged name that triggers a hash collision. The two stations would then be merged into one, and the results would be inaccurate...
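To make the risk concrete, here is a minimal Python sketch (not the code of my entry). It assumes `zlib.crc32` as a stand-in for crc32c, and it truncates the hash to 16 bits (the hypothetical `HASH_BITS` constant) only so that a brute-force forgery finishes quickly; with the full 32 bits the principle is the same, the search just takes longer. The names `name_hash`, `forge_collision` and `aggregate` are made up for this illustration.

```python
import zlib
from itertools import count

HASH_BITS = 16                      # assumption: truncated so the demo forgery is fast
MASK = (1 << HASH_BITS) - 1


def name_hash(name: bytes) -> int:
    """CRC-32 from zlib, used here as a stand-in for crc32c."""
    return zlib.crc32(name) & MASK


def forge_collision(target: bytes) -> bytes:
    """Brute-force a different name with the same (truncated) hash."""
    wanted = name_hash(target)
    for i in count():
        candidate = b"Forged%d" % i
        if candidate != target and name_hash(candidate) == wanted:
            return candidate


def aggregate(rows, full_compare: bool) -> dict:
    """Sum temperatures per station in a hash-indexed table.

    full_compare=False trusts the hash alone (the "perfect hash" trick).
    full_compare=True also compares the stored name and probes the next
    slot on a mismatch.
    """
    slots = {}                      # slot index -> [name, total]
    for name, temp in rows:
        h = name_hash(name)
        while True:
            entry = slots.get(h)
            if entry is None:
                slots[h] = [name, temp]
                break
            if not full_compare or entry[0] == name:
                entry[1] += temp
                break
            h = (h + 1) & MASK      # linear probing on a real collision
    return {entry[0]: entry[1] for entry in slots.values()}


forged = forge_collision(b"Paris")
rows = [(b"Paris", 10.0), (forged, 30.0), (b"Paris", 20.0)]

merged = aggregate(rows, full_compare=False)    # forged row is counted as Paris
separate = aggregate(rows, full_compare=True)   # two distinct stations
```

With the hash-only lookup, the forged station silently lands in the "Paris" slot and its value is added to the Paris total; with the name comparison, it gets its own entry.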

In the original 1BRC challenge, this trick was disallowed, and they rejected any solution not explicitly comparing the station names char by char.
https://github.com/gunnarmorling/1brc/discussions/495#discussioncomment-8189362

So in my entry, I made the full comparison available as an option, and we can compare plain `./abouchez` with `./abouchez -f`: the latter makes a full name comparison, but is slower (1.96s vs 1.10s on my Intel PC).

To be fair to the original challenge, I would recommend that we *require* a full station name comparison.
It makes the numbers slower, but is IMHO closer to what we expect in real-world work.
