
Commit 5bec244

Merge pull request #995 from onekey-sec/feat-chisquare-entropy
Compute and expose χ² probability in EntropyReport
2 parents c26717f + 8e2e11b commit 5bec244


14 files changed (+249 / -193 lines)


docs/guide.md

Lines changed: 53 additions & 51 deletions
@@ -114,10 +114,10 @@ $ cat alpine-report.json
 ]
 ```

-### Entropy calculation
+### Randomness calculation

 If you are analyzing an unknown file format, it might be useful to know the
-entropy of the contained files, so you can quickly see for example whether the
+randomness of the contained files, so you can quickly see for example whether the
 file is **encrypted** or contains some random content.

 Let's make a file with fully random content at the start and end:
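The guide motivates measuring randomness as a quick way to spot encrypted or random content. As a rough illustration of the Shannon-entropy half of that measurement, here is a minimal Python sketch that computes entropy per block, expresses it as a percentage of the 8 bits-per-byte maximum, and summarises the blocks as highest/lowest/mean in the spirit of the debug log. The function names are hypothetical, and the 0x20000 default block size is an assumption taken from the `block_size=0x20000` field in the new log output; this is not unblob's implementation.

```python
import math
from collections import Counter


def shannon_entropy_percent(block: bytes) -> float:
    """Shannon entropy of `block` as a percentage of the 8 bits/byte maximum."""
    if not block:
        return 0.0
    total = len(block)
    entropy = 0.0
    for count in Counter(block).values():
        p = count / total
        entropy -= p * math.log2(p)  # accumulate -sum(p * log2 p)
    return entropy / 8 * 100


def block_entropy_summary(data: bytes, block_size: int = 0x20000) -> dict[str, float]:
    """Hypothetical helper: per-block entropy reduced to highest/lowest/mean."""
    if not data:
        raise ValueError("data must not be empty")
    values = [
        shannon_entropy_percent(data[offset : offset + block_size])
        for offset in range(0, len(data), block_size)
    ]
    return {
        "highest": max(values),
        "lowest": min(values),
        "mean": sum(values) / len(values),
    }
```

On bytes read from /dev/random every block should come out close to 100%, in line with the `highest=99.99 lowest=99.98` summaries in the log. Compressed data typically scores high as well, which is one reason entropy alone cannot tell encryption from compression.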
@@ -128,59 +128,61 @@ $ dd if=/dev/random of=random2.bin bs=10M count=1
 $ cat random1.bin alpine-minirootfs-3.16.1-x86_64.tar.gz random2.bin > unknown-file
 ```

-A nice ASCII entropy plot is drawn on verbose level 3:
+A nice ASCII randomness plot is drawn on verbose level 3:

 ```console
 $ unblob -vvv unknown-file | grep -C 15 "Entropy distribution"

-2022-07-30 07:58.16 [debug ] Ended searching for chunks all_chunks=[0xa00000-0xc96196] pid=19803
-2022-07-30 07:58.16 [debug ] Removed inner chunks outer_chunk_count=1 pid=19803 removed_inner_chunk_count=0
-2022-07-30 07:58.16 [warning ] Found unknown Chunks chunks=[0x0-0xa00000, 0xc96196-0x1696196] pid=19803
-2022-07-30 07:58.16 [info ] Extracting unknown chunk chunk=0x0-0xa00000 path=unknown-file_extract/0-10485760.unknown pid=19803
-2022-07-30 07:58.16 [debug ] Carving chunk path=unknown-file_extract/0-10485760.unknown pid=19803
-2022-07-30 07:58.16 [debug ] Calculating entropy for file path=unknown-file_extract/0-10485760.unknown pid=19803 size=0xa00000
-2022-07-30 07:58.16 [debug ] Entropy calculated highest=99.99 lowest=99.98 mean=99.98 pid=19803
-2022-07-30 07:58.16 [warning ] Drawing plot pid=19803
-2022-07-30 07:58.16 [debug ] Entropy chart chart=
-Entropy distribution
-┌---------------------------------------------------------------------------┐
-100┤•••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••│
-90┤ │
-80┤ │
-70┤ │
-60┤ │
-50┤ │
-40┤ │
-30┤ │
-20┤ │
-10┤ │
-0┤ │
-└┬---┬---┬---─┬--─┬--─┬--─┬--─┬--─┬--─┬--─┬--─┬--─┬--─┬--─┬--─┬--─┬--─┬--─┬--─┬┘
-1 4 7 12 16 20 24 29 33 37 41 46 50 54 59 63 67 71 76 80
-[y] entropy % [x] mB
-pid=19803
-2022-07-30 07:58.16 [info ] Extracting unknown chunk chunk=0xc96196-0x1696196 path=unknown-file_extract/13197718-23683478.unknown pid=19803
-2022-07-30 07:58.16 [debug ] Carving chunk path=unknown-file_extract/13197718-23683478.unknown pid=19803
-2022-07-30 07:58.16 [debug ] Calculating entropy for file path=unknown-file_extract/13197718-23683478.unknown pid=19803 size=0xa00000
-2022-07-30 07:58.16 [debug ] Entropy calculated highest=99.99 lowest=99.98 mean=99.98 pid=19803
-2022-07-30 07:58.16 [warning ] Drawing plot pid=19803
-2022-07-30 07:58.16 [debug ] Entropy chart chart=
-Entropy distribution
-┌---------------------------------------------------------------------------┐
-100┤•••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••│
-90┤ │
-80┤ │
-70┤ │
-60┤ │
-50┤ │
-40┤ │
-30┤ │
-20┤ │
-10┤ │
-0┤ │
-└┬---┬---┬---─┬--─┬--─┬--─┬--─┬--─┬--─┬--─┬--─┬--─┬--─┬--─┬--─┬--─┬--─┬--─┬--─┬┘
-1 4 7 12 16 20 24 29 33 37 41 46 50 54 59 63 67 71 76 80
-[y] entropy % [x] mB
+2024-10-30 10:52.03 [debug ] Calculating chunk for pattern match handler=arc pid=1963719 real_offset=0x1685f5b start_offset=0x1685f5b
+2024-10-30 10:52.03 [debug ] Header parsed header=<arc_head archive_marker=0x1a, header_type=0x1, name=b'8\xa7i&po\xc77\xd5h\x9a\x9d\xf1', size=0x26d171fa, date=0x1bfd, time=0xe03f, crc=-0x3b95, length=0x349997d5> pid=1963719
+2024-10-30 10:52.03 [debug ] Ended searching for chunks all_chunks=[0xa00000-0xc96196] pid=1963719
+2024-10-30 10:52.03 [debug ] Removed inner chunks outer_chunk_count=1 pid=1963719 removed_inner_chunk_count=0
+2024-10-30 10:52.03 [warning ] Found unknown Chunks chunks=[0x0-0xa00000, 0xc96196-0x1696196] pid=1963719
+2024-10-30 10:52.03 [info ] Extracting unknown chunk chunk=0x0-0xa00000 path=unknown-file_extract/0-10485760.unknown pid=1963719
+2024-10-30 10:52.03 [debug ] Carving chunk path=unknown-file_extract/0-10485760.unknown pid=1963719
+2024-10-30 10:52.03 [debug ] Calculating randomness for file path=unknown-file_extract/0-10485760.unknown pid=1963719 size=0xa00000
+2024-10-30 10:52.03 [debug ] Shannon entropy calculated block_size=0x20000 highest=99.99 lowest=99.98 mean=99.98 path=unknown-file_extract/0-10485760.unknown pid=1963719 size=0xa00000
+2024-10-30 10:52.03 [debug ] Chi square probability calculated block_size=0x20000 highest=97.88 lowest=3.17 mean=52.76 path=unknown-file_extract/0-10485760.unknown pid=1963719 size=0xa00000
+2024-10-30 10:52.03 [debug ] Entropy chart chart=
+Randomness distribution
+┌───────────────────────────────────────────────────────────────────────────┐
+100┤ •• Shannon entropy (%) •••••••••♰••••••••••••••••••••••••••••••••••│
+90┤ ♰♰ Chi square probability (%) ♰ ♰ ♰♰♰♰ ♰ ♰ ♰ │
+80┤♰ ♰ ♰♰ ♰♰ ♰♰ ♰ ♰ ♰♰♰♰♰♰♰♰♰ ♰ ♰♰♰♰♰♰ ♰♰ ♰♰ │
+70┤♰♰♰♰ ♰ ♰ ♰ ♰ ♰♰♰ ♰ ♰ ♰ ♰ ♰♰♰♰♰♰♰♰♰ ♰♰ ♰ ♰ ♰ ♰♰♰ ♰♰♰♰♰♰ │
+60┤♰♰♰♰ ♰♰ ♰♰ ♰ ♰♰♰♰ ♰ ♰♰ ♰ ♰ ♰ ♰♰♰♰♰♰ ♰♰ ♰ ♰ ♰♰♰♰ ♰ ♰♰♰ ♰♰♰♰♰♰♰ │
+50┤ ♰♰♰ ♰♰ ♰♰ ♰♰ ♰♰♰♰ ♰♰ ♰ ♰♰♰ ♰♰♰♰♰♰ ♰ ♰ ♰ ♰♰♰♰♰ ♰ ♰♰♰ ♰ ♰♰♰♰♰ ♰ │
+40┤ ♰♰ ♰♰ ♰ ♰♰ ♰♰♰♰ ♰♰ ♰ ♰♰♰ ♰♰♰♰♰♰ ♰♰ ♰♰ ♰♰♰♰♰♰ ♰ ♰♰♰ ♰ ♰♰♰♰ ♰♰ ♰│
+30┤ ♰ ♰♰ ♰♰ ♰♰♰♰ ♰ ♰♰ ♰♰ ♰♰ ♰ ♰♰ ♰ ♰ ♰♰♰ ♰ ♰ ♰♰ ♰ ♰♰♰ ♰♰ ♰ │
+20┤ ♰♰ ♰♰ ♰♰♰ ♰ ♰♰ ♰ ♰♰ ♰ ♰ ♰ ♰ ♰ ♰ ♰♰ │
+10┤ ♰ ♰ ♰ ♰ ♰ ♰♰ ♰ ♰ ♰♰ │
+0┤ ♰ ♰ │
+└─┬──┬─┬──┬────┬───┬──┬──┬──┬───┬───┬──┬────┬───┬────┬──┬──┬────┬──┬───┬──┬─┘
+0 2 5 7 11 16 20 23 27 30 34 38 42 47 51 56 60 63 68 71 76 79
+131072 bytes
+path=unknown-file_extract/0-10485760.unknown pid=1963719
+2024-10-30 10:52.03 [info ] Extracting unknown chunk chunk=0xc96196-0x1696196 path=unknown-file_extract/13197718-23683478.unknown pid=1963719
+2024-10-30 10:52.03 [debug ] Carving chunk path=unknown-file_extract/13197718-23683478.unknown pid=1963719
+2024-10-30 10:52.03 [debug ] Calculating randomness for file path=unknown-file_extract/13197718-23683478.unknown pid=1963719 size=0xa00000
+2024-10-30 10:52.03 [debug ] Shannon entropy calculated block_size=0x20000 highest=99.99 lowest=99.98 mean=99.98 path=unknown-file_extract/13197718-23683478.unknown pid=1963719 size=0xa00000
+2024-10-30 10:52.03 [debug ] Chi square probability calculated block_size=0x20000 highest=99.03 lowest=0.23 mean=42.62 path=unknown-file_extract/13197718-23683478.unknown pid=1963719 size=0xa00000
+2024-10-30 10:52.03 [debug ] Entropy chart chart=
+Randomness distribution
+┌───────────────────────────────────────────────────────────────────────────┐
+100┤ •• Shannon entropy (%) •••••••••••••••••••••♰••••••••••••••••••••••│
+90┤ ♰♰ Chi square probability (%) ♰ ♰♰ ♰ │
+80┤♰♰ ♰♰ ♰♰ ♰ ♰♰ ♰ ♰♰ ♰ ♰♰ │
+70┤♰ ♰ ♰ ♰ ♰ ♰ ♰ ♰ ♰ ♰ ♰♰ ♰♰ ♰♰♰ ♰ ♰♰ ♰♰ │
+60┤ ♰ ♰♰ ♰ ♰ ♰ ♰ ♰♰♰♰♰ ♰♰ ♰♰ ♰♰ ♰ ♰ ♰♰♰ ♰♰ ♰ ♰ ♰♰ ♰ │
+50┤ ♰ ♰♰♰ ♰ ♰ ♰ ♰ ♰ ♰♰♰♰ ♰ ♰♰ ♰ ♰♰♰ ♰ ♰ ♰ ♰♰♰ ♰♰ ♰ ♰ ♰♰ ♰♰ ♰ │
+40┤ ♰♰♰♰ ♰♰ ♰♰ ♰ ♰ ♰♰ ♰♰♰ ♰♰♰ ♰♰♰ ♰♰ ♰ ♰ ♰ ♰♰ ♰ ♰♰ ♰ ♰ ♰ ♰ ♰♰♰ ♰♰ │
+30┤ ♰♰♰♰ ♰♰ ♰♰ ♰♰ ♰♰ ♰♰ ♰♰♰♰♰ ♰♰ ♰ ♰ ♰ ♰♰ ♰♰♰ ♰ ♰ ♰ ♰ ♰ ♰ ♰ ♰│
+20┤ ♰♰♰ ♰ ♰ ♰♰ ♰♰ ♰♰♰♰ ♰♰ ♰ ♰ ♰ ♰♰ ♰♰ ♰ ♰♰ ♰♰ ♰ ♰ │
+10┤ ♰ ♰ ♰ ♰ ♰ ♰ ♰ ♰♰ ♰ ♰♰ ♰♰ ♰♰ ♰ ♰ ♰ │
+0┤ ♰ ♰ ♰♰ ♰ ♰♰ │
+└─┬──┬─┬──┬────┬───┬──┬──┬──┬───┬───┬──┬────┬───┬────┬──┬──┬────┬──┬───┬──┬─┘
+0 2 5 7 11 16 20 23 27 30 34 38 42 47 51 56 60 63 68 71 76 79
+131072 bytes
 ```

 ### Skip extraction with file magic
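The new log lines report a "Chi square probability" per 0x20000-byte block next to the Shannon entropy, but the computation itself is not part of this diff. The sketch below therefore rests on an assumption: that the value is the upper-tail p-value of Pearson's chi-square goodness-of-fit test against a uniform distribution of the 256 byte values (255 degrees of freedom), the interpretation popularised by Fourmilab's `ent` tool. The helper names are hypothetical, and the incomplete-gamma routine is the standard series / continued-fraction evaluation (as in Numerical Recipes' `gammq`), not code taken from unblob.

```python
import math
from collections import Counter


def regularized_upper_gamma(a: float, x: float) -> float:
    """Q(a, x) = Gamma(a, x) / Gamma(a): series for x < a + 1, continued fraction otherwise."""
    if a <= 0 or x < 0:
        raise ValueError("require a > 0 and x >= 0")
    if x == 0:
        return 1.0
    # log of e^-x * x^a / Gamma(a), shared by both branches
    log_prefactor = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1:
        # Series expansion yields the lower function P(a, x); Q = 1 - P.
        term = total = 1.0 / a
        denom = a
        for _ in range(10_000):
            denom += 1
            term *= x / denom
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return 1.0 - total * math.exp(log_prefactor)
    # Continued fraction for Q(a, x), evaluated with the modified Lentz method.
    tiny = 1e-300
    b = x + 1 - a
    c = 1 / tiny
    d = 1 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - a)
        b += 2
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1 / d
        delta = d * c
        h *= delta
        if abs(delta - 1) < 1e-15:
            break
    return math.exp(log_prefactor) * h


def chi_square_probability_percent(block: bytes) -> float:
    """Upper-tail p-value (in %) of Pearson's chi-square test that `block`
    is uniformly distributed over the byte values 0..255."""
    if not block:
        raise ValueError("block must not be empty")
    expected = len(block) / 256
    counts = Counter(block)  # missing byte values count as 0
    chi2 = sum((counts[value] - expected) ** 2 / expected for value in range(256))
    return regularized_upper_gamma(255 / 2, chi2 / 2) * 100
```

Under that assumption, a perfectly balanced block gives a statistic of 0 and therefore 100%, while a constant block scores essentially 0%. For genuinely random input the p-value is itself roughly uniform between 0 and 100%, which fits the wide spread in the log (for example `lowest=3.17` and `highest=97.88` for the first /dev/random chunk) while Shannon entropy stays pinned near 100%. A low probability across many consecutive blocks is therefore more telling than any single block.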

flake.lock

Lines changed: 15 additions & 15 deletions

fuzzing/search_chunks_fuzzer.py

Lines changed: 2 additions & 2 deletions
@@ -40,8 +40,8 @@ def test_search_chunks(data):
     config = ExtractionConfig(
         extract_root=Path("/dev/shm"), # noqa: S108
         force_extract=True,
-        entropy_depth=0,
-        entropy_plot=False,
+        randomness_depth=0,
+        randomness_plot=False,
         skip_magic=[],
         skip_extension=[],
         skip_extraction=False,
