Skip to content

Latest commit

 

History

History
24 lines (22 loc) · 2.17 KB

hyperloglog-estimation-error.md

File metadata and controls

24 lines (22 loc) · 2.17 KB

HyperLogLog estimation error

The state of an HyperLogLog sketch with precision parameter $p$ requires $0.75 \cdot m = 0.75 \cdot 2^p$ bytes where $m$ denotes the number of registers. The expected relative standard error is approximately given by $\frac{1.039}{\sqrt{m}}$,$\frac{1.037}{\sqrt{m}}$, and $\frac{0.833}{\sqrt{m}}$ for the default, the maximum-likelihood (ML), and the martingale estimator, respectively. This is a good approximation for all $p\geq 6$ and large distinct counts. However, the error is significantly smaller for distinct counts that are in the order of $m$ or smaller. The bias is always much smaller than the root-mean-square error (rmse) and can therefore be neglected. The following charts show the empirically determined relative error as a function of the true distinct count for various precision parameters $p$ based on 100k simulation runs. Distinct counts up to 1M were simulated by generating random values as hash values. For distinct counts above 1M, a different technique is used by randomly generating only hash values at distinct counts that can actually change the state of the data structure.