

@williamjallen (Collaborator)

Modern versions of Postgres have good hash index support, which lets us efficiently perform equality comparisons on large text columns. Our current crc32-based hashing system suffers from hash collisions and adds unnecessary, easily broken logic. This PR replaces the `crc32` column with a hash index and, on insertion, deduplicates by explicitly comparing all fields.
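The principle behind the change can be sketched as a toy in-memory model in Python. The hash only narrows the search to a bucket of candidate rows, which is the job a hash index does, and a row is reused only if every field compares equal, so two different values that happen to share a hash are stored separately. This is not the project's actual code: `Note`, `NoteStore`, `find_or_create`, and the `name`/`text` fields are hypothetical. In Postgres itself, the index would be created with something like `CREATE INDEX ... ON <table> USING hash (<column>)`, and the exact comparison would be an ordinary `WHERE` clause.

```python
from typing import Callable, NamedTuple


class Note(NamedTuple):
    # Hypothetical row shape; the real table's columns are not shown in the PR.
    name: str
    text: str


class NoteStore:
    """Toy in-memory stand-in for a table with a hash index.

    The hash only picks a bucket of candidate rows (the job a hash index does);
    a row is reused only if every field compares equal.
    """

    def __init__(self, hash_fn: Callable[[str], int] = hash) -> None:
        self.hash_fn = hash_fn
        self.rows: list[Note] = []
        self.buckets: dict[int, list[int]] = {}

    def find_or_create(self, name: str, text: str) -> int:
        candidates = self.buckets.setdefault(self.hash_fn(text), [])
        for row_id in candidates:
            row = self.rows[row_id]
            # Explicit equality on all fields: sharing a hash is never enough.
            if row.name == name and row.text == text:
                return row_id
        # No exact match: insert a new row and register it in its bucket.
        self.rows.append(Note(name, text))
        row_id = len(self.rows) - 1
        candidates.append(row_id)
        return row_id


# Identical input is deduplicated to a single row.
store = NoteStore()
a = store.find_or_create("build.log", "configure succeeded")
b = store.find_or_create("build.log", "configure succeeded")

# Force every value into one bucket to simulate a hash collision:
# distinct values still get distinct rows.
colliding = NoteStore(hash_fn=lambda s: 0)
x = colliding.find_or_create("a", "one")
y = colliding.find_or_create("a", "two")
```

A lookup keyed on the crc32 value alone would treat the two colliding values as one row and silently return the wrong note. The bucket-then-compare approach avoids this at the cost of one extra equality check per candidate.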

@josephsnyder (Member) left a comment


LGTM!

@williamjallen williamjallen added this pull request to the merge queue Jun 18, 2025
Merged via the queue into Kitware:master with commit b4617d6 Jun 18, 2025
7 checks passed
@williamjallen williamjallen deleted the note-crc32-removal branch June 18, 2025 18:35
github-merge-queue bot pushed a commit that referenced this pull request Aug 20, 2025
Following the success of #2943, this PR applies the same principle to
the `testoutput` table: remove the `crc32` column, add hash indexes, and
perform exact matching to deduplicate the table rather than using a
messy, low-entropy, application-layer hashing mechanism.
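To put a number on "low-entropy": a 32-bit hash has only 2^32 possible values. By the birthday bound, the chance that at least two of n distinct values share a CRC32 is roughly `1 - exp(-n(n-1) / 2^33)`. The short Python sketch below evaluates that approximation. The function names and row counts are illustrative only, not measurements from the `note` or `testoutput` tables.

```python
import math


def collision_probability(n: int, bits: int = 32) -> float:
    """Approximate chance that at least two of n distinct values share a hash."""
    space = 2 ** bits
    return 1 - math.exp(-n * (n - 1) / (2 * space))


def items_for_probability(p: float, bits: int = 32) -> int:
    """Approximate count of distinct values at which a collision has probability p."""
    space = 2 ** bits
    return math.ceil(math.sqrt(2 * space * math.log(1 / (1 - p))))


# Illustrative table sizes, not real row counts.
for n in (10_000, 100_000, 1_000_000):
    print(f"{n:>9,} values -> collision probability {collision_probability(n):.3f}")

print("values for a 50% chance:", items_for_probability(0.5))
```

With a 32-bit hash, a collision becomes more likely than not at roughly 77,000 distinct values. A table that accumulates output over many builds can easily pass that size.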
