
Added new function to hash any PostgreSQL data type #10


Merged (1 commit) on Jun 13, 2013


ozgune commented Jun 10, 2013

Added a new function, `hll_hash_any`, that can hash values of any PostgreSQL data type, along with corresponding regression tests.

ozgune (author) commented Jun 10, 2013

The new hashing function resolves the argument's PostgreSQL data type at run time and dispatches to the corresponding hash function.
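To illustrate the dispatch idea in general terms, here is a minimal Python sketch. It is not the extension's C implementation: the names `hash_text`, `hash_bigint`, `hash_integer`, `hash_any`, and `HASHERS` are hypothetical, BLAKE2b from `hashlib` stands in for the extension's own hash function, and the type is passed as a string rather than resolved from the call expression as PostgreSQL would.

```python
import hashlib
import struct
from typing import Any, Callable, Dict


def _hash_bytes(data: bytes) -> int:
    # Stand-in hash (assumption): an 8-byte BLAKE2b digest read as a signed
    # 64-bit integer, i.e. a value in PostgreSQL's bigint range.
    digest = hashlib.blake2b(data, digest_size=8).digest()
    return int.from_bytes(digest, "little", signed=True)


# Hypothetical type-specific hashers, analogous to per-type SQL functions.
def hash_text(value: str) -> int:
    return _hash_bytes(value.encode("utf-8"))


def hash_bigint(value: int) -> int:
    return _hash_bytes(struct.pack("<q", value))  # 8-byte little-endian


def hash_integer(value: int) -> int:
    return _hash_bytes(struct.pack("<i", value))  # 4-byte little-endian


# Dispatch table keyed by type name (hypothetical).
HASHERS: Dict[str, Callable[[Any], int]] = {
    "text": hash_text,
    "bigint": hash_bigint,
    "integer": hash_integer,
}


def hash_any(value: Any, type_name: str) -> int:
    """Look up the hasher for the value's type and call it."""
    hasher = HASHERS.get(type_name)
    if hasher is not None:
        return hasher(value)
    # Fallback for types with no dedicated hasher: hash a byte encoding of
    # the value. repr() is a hypothetical stand-in for a binary representation.
    return _hash_bytes(repr(value).encode("utf-8"))
```

In this sketch the only extra work per call, compared with calling `hash_bigint` directly, is the type lookup; whether that is what dominates in the real extension is not established here.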

I also measured the overhead of using this function on columns of the `lineitem` table from the TPC-H data warehousing benchmark (scale factor 1, about 1 GB; 6M rows). I'm including the numbers here for documentation. As a baseline, TPC-H Query 1 completes in 8 s on my machine.

| Data type | Direct hash call (s) | `hll_hash_any` call (s) |
|---|---|---|
| Text | 3.3 | 4.6 |
| Text (low cardinality) | 6.4 | 7.6 |
| Bigint | 2.1 | 3.4 |
| Integer | 2.1 | 3.4 |

`hll_hash_any` therefore adds measurable overhead: roughly 1.2 to 1.3 s over the 6M rows compared with the type-specific functions. Still, it is useful when the user doesn't know the column's type in advance or, more importantly, when no type-specific hash function supports that data type.
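To put the table in per-row terms, the short Python script below recomputes the overhead from the timings above. The only assumption is the row count, taken as exactly 6,000,000 from the "6M rows" figure; the function name `overhead` is just for illustration.

```python
from typing import Dict, Tuple

# Timings in seconds from the table: (direct hash call, hll_hash_any call).
TIMINGS: Dict[str, Tuple[float, float]] = {
    "Text": (3.3, 4.6),
    "Text (low cardinality)": (6.4, 7.6),
    "Bigint": (2.1, 3.4),
    "Integer": (2.1, 3.4),
}
ROWS = 6_000_000  # assumption: "6M rows" taken literally


def overhead(direct: float, any_call: float, rows: int = ROWS) -> Tuple[float, float, float]:
    """Return (extra seconds, relative slowdown in percent, extra ns per row)."""
    extra = any_call - direct
    return extra, 100.0 * extra / direct, extra / rows * 1e9


if __name__ == "__main__":
    for name, (direct, any_call) in TIMINGS.items():
        extra, pct, ns = overhead(direct, any_call)
        print(f"{name:24s} +{extra:.1f} s  (+{pct:.0f}%)  ~{ns:.0f} ns/row")
```

The absolute cost is nearly constant (about 200 to 220 ns per row), so the relative slowdown ranges from about 19% for low-cardinality text to about 62% for bigint and integer, whose direct hashes are the cheapest.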

ghost commented Jun 12, 2013

@ozgune I'll take a look at this today. Thanks for the PR!

ghost pushed a commit that referenced this pull request Jun 13, 2013
Added new function to hash any PostgreSQL data type. (Closes #10)
ghost merged commit e512d00 into citusdata:master on Jun 13, 2013

ghost commented Jun 13, 2013

Thanks, @ozgune!
