feat(math): introduce Chi Square entropy in EntropyReport
Introduce a second entropy measure based on the Chi Square probability,
computed with unblob-native's chi_square_probability function, which
returns the Chi Square distribution probability.
Chi-square tests are effective for distinguishing compressed from
encrypted data because they are far more sensitive than Shannon entropy
to small deviations from a uniform byte distribution.
In compressed files, bytes often cluster around certain values because
some patterns survive compression (albeit in a less detectable form),
resulting in a non-uniform distribution. Encrypted data, by contrast,
exhibits nearly perfect uniformity: each byte value from 0–255 is
expected to appear with almost equal frequency, which leaves no
discernible pattern.
The chi-square distribution is calculated for the stream of bytes in the
chunk and expressed as an absolute number and a percentage, which
indicates how frequently a truly random sequence would exceed the
calculated value. The percentage is the only value of interest from
unblob's perspective, so it is the only one we return.
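
As an illustration of the calculation described above, here is a minimal
pure-Python sketch. It is not the unblob-native implementation: the
function names are hypothetical, and the percentage is obtained with the
Wilson-Hilferty normal approximation (255 degrees of freedom) rather than
the exact Chi Square distribution.

```python
import math
from collections import Counter


def chi_square_statistic(data: bytes) -> float:
    """Pearson chi-square statistic of the byte histogram vs. a uniform one."""
    if not data:
        raise ValueError("data must not be empty")
    expected = len(data) / 256  # expected count per byte value if uniform
    counts = Counter(data)
    return sum(
        (counts.get(value, 0) - expected) ** 2 / expected for value in range(256)
    )


def chi_square_percentage(data: bytes) -> float:
    """Approximate percentage of random sequences that would exceed the statistic.

    Assumption: Wilson-Hilferty approximation of the upper tail, not an
    exact evaluation of the Chi Square distribution.
    """
    dof = 255  # 256 possible byte values minus one
    statistic = chi_square_statistic(data)
    spread = 2.0 / (9.0 * dof)
    z = ((statistic / dof) ** (1.0 / 3.0) - (1.0 - spread)) / math.sqrt(spread)
    upper_tail = 0.5 * math.erfc(z / math.sqrt(2.0))
    return 100.0 * upper_tail
```

With this sketch, a chunk made only of zero bytes yields a percentage
close to 0, while a perfectly flat histogram yields a percentage close to
100. Both extremes are flagged as non-random under the ent interpretation
quoted in this message.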
According to the ent documentation [0]:
> We [can] interpret the percentage as the degree to which the
> sequence tested is suspected of being non-random. If the percentage is
> greater than 99% or less than 1%, the sequence is almost certainly not
> random. If the percentage is between 99% and 95% or between 1% and 5%,
> the sequence is suspect. Percentages between 90% and 95% and 5% and 10%
> indicate the sequence is “almost suspect”.
[0] - https://www.fourmilab.ch/random/
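
The thresholds from the ent quote can be expressed as a small helper. This
is only a sketch: the function name is hypothetical, the handling of the
exact boundary values (inclusive vs. exclusive) is an assumption, and the
last label is ours since the quote does not name that case.

```python
def classify_chi_square_percentage(percentage: float) -> str:
    """Map a chi-square percentage to the verdicts used in the ent documentation."""
    if not 0.0 <= percentage <= 100.0:
        raise ValueError("percentage must be within [0, 100]")
    if percentage > 99.0 or percentage < 1.0:
        return "almost certainly not random"
    if percentage >= 95.0 or percentage <= 5.0:
        return "suspect"
    if percentage >= 90.0 or percentage <= 10.0:
        return "almost suspect"
    # Not named in the quote; label chosen for this sketch.
    return "no evidence of non-randomness"
```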
This entropy measure is introduced by modifying the EntropyReport class
so that it contains two EntropyMeasures:
- shannon: for Shannon entropy, which was already there
- chi_square: for Chi Square entropy, which we introduce
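
A rough sketch of the resulting shape, for illustration only. The
shannon and chi_square attribute names come from this commit; the fields
inside EntropyMeasures (percentages, block_size, mean) and the use of
dataclasses are assumptions and may not match the actual code.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EntropyMeasures:
    # Assumed fields: one percentage per block, plus summary values.
    percentages: List[float]
    block_size: int
    mean: float


@dataclass(frozen=True)
class EntropyReport:
    shannon: EntropyMeasures  # Shannon entropy, which was already there
    chi_square: EntropyMeasures  # Chi Square probability, newly introduced
```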
The format_entropy_plot function has been adjusted to display two lines
within the entropy graph: one for Shannon, the other for Chi Square.