Skip to content

xnought/hyperloglog

Repository files navigation

hyperloglog

HyperLogLog algorithm from scratch. Count the cardinality of an array with with basically no memory usage.

You could get perfect count, but it takes a lot of memory (hashmap). Instead, leverage statistics with hyperloglog.

from hyperloglog import hyperloglog

estimated_num_elements = hyperloglog(array)

Check out the example.ipynb for a real example on a 100 million elements array with around 10 million unique elements.

Or check out the pandas_example.ipynb for the pandas example that uses the approx_count_distinct DuckDB function under the hood (a superior implementation to mine).

About

HyperLogLog algorithm from scratch

Resources

Stars

Watchers

Forks