feat(math): add χ² probability and convert EntropyReport to RandomnessReport
Introduce a second randomness measure based on the chi-square (χ²)
probability, computed with unblob-native's chi_square_probability
function, which returns the chi-square distribution probability.
Chi-square tests are effective for distinguishing compressed from
encrypted data because they evaluate the uniformity of byte
distributions more rigorously than Shannon entropy.
In compressed files, bytes often cluster around certain values because
some residual patterns survive compression, resulting in a non-uniform
distribution. Encrypted data, by contrast, exhibits nearly perfect
uniformity: each byte value from 0–255 is expected to appear with
almost equal frequency, leaving few discernible patterns.
The chi-square distribution is calculated for the stream of bytes in the
chunk and expressed as an absolute number and a percentage which
indicates how frequently a truly random sequence would exceed the value
calculated. The percentage is the only value of interest from unblob's
perspective, so it is the only one we return.
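For illustration, the calculation described above can be sketched in
pure Python. This is not unblob-native's implementation (that function
lives in the native extension); the names chi_square_statistic and
upper_tail_probability are hypothetical, 255 degrees of freedom is
assumed for byte data, and returning a fraction in [0, 1] rather than a
percentage is also an assumption.

```python
import math
from collections import Counter


def chi_square_statistic(data: bytes) -> float:
    """Pearson chi-square of the byte counts against a uniform distribution."""
    if not data:
        raise ValueError("data must not be empty")
    expected = len(data) / 256  # expected count per byte value if uniform
    counts = Counter(data)
    return sum(
        (counts.get(value, 0) - expected) ** 2 / expected for value in range(256)
    )


def upper_tail_probability(chi2: float, dof: int = 255) -> float:
    """P(X >= chi2) for X ~ chi-square(dof), i.e. Q(dof / 2, chi2 / 2)."""
    a, x = dof / 2.0, chi2 / 2.0
    if x <= 0.0:
        return 1.0
    # log of x^a * e^-x / Gamma(a), kept in log space to avoid overflow
    log_prefactor = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1.0:
        # Series expansion of the lower tail P(a, x); the result is 1 - P.
        term = total = 1.0 / a
        n = a
        for _ in range(10_000):
            n += 1.0
            term *= x / n
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return max(0.0, 1.0 - total * math.exp(log_prefactor))
    # Continued fraction (modified Lentz method) for the upper tail Q(a, x).
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return min(1.0, h * math.exp(log_prefactor))
```

Calling upper_tail_probability(chi_square_statistic(chunk)) on a chunk
of bytes gives the fraction of truly random sequences that would exceed
the chunk's chi-square value; multiply by 100 for a percentage.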
According to the ent documentation [0]:
> We [can] interpret the percentage as the degree to which the
> sequence tested is suspected of being non-random. If the percentage is
> greater than 99% or less than 1%, the sequence is almost certainly not
> random. If the percentage is between 99% and 95% or between 1% and 5%,
> the sequence is suspect. Percentages between 90% and 95% and 5% and 10%
> indicate the sequence is “almost suspect”.
[0] - https://www.fourmilab.ch/random/
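The thresholds quoted above can be expressed as a small classifier. This
is only a sketch: the function name is hypothetical, the handling of the
exact boundary values (1, 5, 10, 90, 95, 99) is an assumption since the
quote does not specify it, and the final label is ours, not ent's.

```python
def interpret_chi_square_percentage(percentage: float) -> str:
    """Map a chi-square percentage (0-100) to the verdicts quoted from ent."""
    if not 0.0 <= percentage <= 100.0:
        raise ValueError("percentage must be between 0 and 100")
    if percentage > 99.0 or percentage < 1.0:
        return "almost certainly not random"
    # Assumption: boundary values fall into the more suspicious band.
    if percentage >= 95.0 or percentage <= 5.0:
        return "suspect"
    if percentage >= 90.0 or percentage <= 10.0:
        return "almost suspect"
    # Label not taken from ent; it marks values outside all quoted bands.
    return "no evidence of non-randomness"
```

Both tails matter: a percentage near 100 means the byte counts are more
even than chance would produce, which is as suspicious as a percentage
near 0.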
This randomness measure is introduced by modifying the EntropyReport class
so that it contains two RandomnessMeasurements:
- shannon: for Shannon entropy, which was already there
- chi_square: for Chi Square probability, which we introduce
EntropyReport is renamed to RandomnessReport to reflect that not all
measurements are entropy related.
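As a rough picture of the resulting shape, the report could look like
the sketch below. Only the names RandomnessReport,
RandomnessMeasurements, shannon and chi_square come from this commit
message; the fields inside RandomnessMeasurements, the from_percentages
helper and the use of dataclasses are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RandomnessMeasurements:
    # Hypothetical fields: one percentage per block, plus summary data.
    percentages: List[float]
    block_size: int
    mean: float

    @classmethod
    def from_percentages(
        cls, percentages: List[float], block_size: int
    ) -> "RandomnessMeasurements":
        """Build a measurement and derive the mean from the per-block values."""
        mean = sum(percentages) / len(percentages) if percentages else 0.0
        return cls(percentages=list(percentages), block_size=block_size, mean=mean)


@dataclass(frozen=True)
class RandomnessReport:
    shannon: RandomnessMeasurements     # Shannon entropy, already present
    chi_square: RandomnessMeasurements  # chi-square probability, newly added


# Example with made-up values for two blocks of a chunk.
report = RandomnessReport(
    shannon=RandomnessMeasurements.from_percentages([99.0, 97.0], block_size=1024),
    chi_square=RandomnessMeasurements.from_percentages([40.0, 60.0], block_size=1024),
)
```

Keeping both measures in one report lets consumers compare them per
block without running two separate passes over the chunk.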
format_entropy_plot has been adjusted to display two lines within the
entropy graph: one for Shannon, the other for chi-square.
This commit breaks the previous API by renaming entropy_depth and
entropy_plot to randomness_depth and randomness_plot in
ExtractionConfig. The '--entropy-depth' CLI option is replaced by
'--randomness-depth'.