You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The state of an HyperLogLog sketch with precision parameter $p$ requires $0.75 \cdot m = 0.75 \cdot 2^p$ bytes where $m$ denotes the number of registers.
The expected relative standard error is approximately given by $\frac{1.039}{\sqrt{m}}$,$\frac{1.037}{\sqrt{m}}$,
and $\frac{0.833}{\sqrt{m}}$ for the default, the maximum-likelihood (ML), and the martingale estimator, respectively.
This is a good approximation for all $p\geq 6$ and large distinct counts.
However, the error is significantly smaller for distinct counts that are in the order of $m$ or smaller.
The bias is always much smaller than the root-mean-square error (rmse) and can therefore be neglected.
The following charts show the empirically determined relative error as a function of the true distinct count for various precision parameters $p$ based on 100k simulation runs. Distinct counts up to 1M were simulated by generating random values as hash values. For distinct counts above 1M, a different technique is used by randomly generating only hash values at distinct counts that can actually change the state of the data structure.