Artifact Evaluation

This repository contains several experiments that we conducted to evaluate the artifact of MemLock (paper).

Artifact

The artifact provided by the MemLock authors contains only a subset of the experiments performed for the paper. Our concerns regarding the artifact are summarized below:

- MemLock has been evaluated against six fuzzers, while the artifact supports only one of them, AFL (in addition to MemLock itself).
- The paper states that MemLock is "implemented based on the AFL-2.52b framework", but the source code provided as part of the artifact is forked from PerfFuzz.
- The target configurations are not uniform:
  - Some targets (flex) set artificial stack size limits.
  - The ASAN_OPTIONS are target-dependent, i.e., some targets have allocator_may_return_null enabled while others do not. The same is true for the detect_leaks flag.
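Drift in sanitizer settings like this can be checked mechanically. The following Python sketch assumes the usual colon-separated ASAN_OPTIONS form. `parse_asan_options` and `option_differences` are hypothetical helpers, not part of the artifact, and the example values are illustrative rather than copied from the artifact's per-target scripts.

```python
from typing import Dict, Optional


def parse_asan_options(value: str) -> Dict[str, str]:
    """Parse a colon-separated ASAN_OPTIONS string into a dict."""
    opts: Dict[str, str] = {}
    for item in value.split(":"):
        if item.strip():
            key, _, val = item.partition("=")
            opts[key.strip()] = val.strip()
    return opts


def option_differences(configs: Dict[str, str]) -> Dict[str, Dict[str, Optional[str]]]:
    """Map every option that is not set identically for all targets to its
    per-target value (None means the option is not set, so ASAN's default applies)."""
    parsed = {target: parse_asan_options(v) for target, v in configs.items()}
    keys = set().union(*parsed.values()) if parsed else set()
    diffs: Dict[str, Dict[str, Optional[str]]] = {}
    for key in sorted(keys):
        values = {target: opts.get(key) for target, opts in parsed.items()}
        if len(set(values.values())) > 1:
            diffs[key] = values
    return diffs


# Illustrative values only (hypothetical, not taken from the artifact).
configs = {
    "readelf": "allocator_may_return_null=0:detect_leaks=0",
    "nm": "detect_leaks=1",
}
print(option_differences(configs))
```

An empty result would mean that all targets run under the same sanitizer configuration.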

Conducted Experiments

As part of a larger effort to reproduce and replicate fuzzing research, we selected MemLock for reproduction. In the following, we outline the experiments we conducted to evaluate MemLock and the claims made in the paper. We do not intend to evaluate every aspect or to reproduce all experiments presented in the paper. Instead, we thoroughly studied the artifact and the paper to compile experiments suitable for uncovering methodological flaws.

01-Artificial-Runtime-Environment

This experiment concerns flex, one of the targets used in the paper's evaluation to answer RQ1:

"How capable is MemLock in memory consumption crash detection?"

According to Table 1 of the paper, MemLock was the only fuzzer to find crashes in flex.

When studying the paper's artifact, we noticed that the configurations (AFL, MemLock) provided in the artifact contain the following line:

```
ulimit -s 2048
```

This command sets the maximum stack size (see ulimit's man page) to 2048 KiB, which is 25% of the default size of 8192 KiB. Intuitively, an input is more likely to trigger a stack overflow (one of MemLock's advertised strengths) when the stack is smaller. Deep recursion therefore exhausts the stack much sooner than it would under regular operating conditions.
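To replay inputs under the same lowered limit outside the fuzzing scripts, the shell's `ulimit -s` (values in units of 1024 bytes) can be mirrored from Python. The sketch below is a minimal version for a Linux host. `run_with_stack_limit` is a hypothetical helper, not part of the artifact.

```python
import resource
import subprocess
from typing import List, Optional


def run_with_stack_limit(cmd: List[str], stack_kib: Optional[int] = None,
                         timeout: Optional[float] = None) -> subprocess.CompletedProcess:
    """Run cmd; if stack_kib is given, first lower the child's soft stack
    limit, mirroring `ulimit -s <stack_kib>`."""

    def lower_stack_limit() -> None:
        # Runs in the child between fork() and exec(), so the parent keeps its limit.
        _soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
        new_soft = stack_kib * 1024
        if hard != resource.RLIM_INFINITY:
            new_soft = min(new_soft, hard)
        resource.setrlimit(resource.RLIMIT_STACK, (new_soft, hard))

    return subprocess.run(
        cmd,
        preexec_fn=lower_stack_limit if stack_kib is not None else None,
        capture_output=True,
        timeout=timeout,
    )
```

For example, `run_with_stack_limit(["sh", "-c", "ulimit -s"], stack_kib=2048)` reports 2048 from inside the child. Running the same target command with and without `stack_kib` reproduces the two configurations compared below. Note that `preexec_fn` is documented as unsafe in multithreaded parents.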

Because of this artificial limit, we selected flex as an experiment for reproducing MemLock's results and fuzzed it both with and without the manually lowered stack size.

Results

We briefly summarize the results of our experiment; to reproduce them, please refer to 01-Artificial-Runtime-Environment.

NOTE: The number of unique crashes depends on the target's instrumentation. The numbers below should therefore be read only as "crash found" or "no crash found", without considering their magnitude.

AFL with ulimit (results of ten independent runs):

```
/data/01-Artificial-Runtime-Environment/results# cat flex/out_AFL-afl-ulimit-*/fuzzer_stats  | grep crashes
unique_crashes    : 0
unique_crashes    : 0
unique_crashes    : 0
unique_crashes    : 2
unique_crashes    : 1
unique_crashes    : 1
unique_crashes    : 0
unique_crashes    : 1
unique_crashes    : 0
unique_crashes    : 0
```

AFL without ulimit (results of ten independent runs):

```
/data/01-Artificial-Runtime-Environment/results# cat flex/out_AFL-afl-noulimit-*/fuzzer_stats  | grep crashes
unique_crashes    : 0
unique_crashes    : 0
unique_crashes    : 0
unique_crashes    : 0
unique_crashes    : 0
unique_crashes    : 0
unique_crashes    : 0
unique_crashes    : 0
unique_crashes    : 0
unique_crashes    : 0
```

MemLock with ulimit (results of ten independent runs):

```
/data/01-Artificial-Runtime-Environment/results# cat flex/out_Mem*-ulimit-*/fuzzer_stats  | grep crashes
unique_crashes    : 53
unique_crashes    : 4
unique_crashes    : 39
unique_crashes    : 7
unique_crashes    : 10
unique_crashes    : 19
unique_crashes    : 10
unique_crashes    : 17
unique_crashes    : 28
unique_crashes    : 12
```

MemLock without ulimit (results of ten independent runs):

```
/data/01-Artificial-Runtime-Environment/results# cat flex/out_Mem*-noulimit-*/fuzzer_stats  | grep crashes
unique_crashes    : 0
unique_crashes    : 0
unique_crashes    : 0
unique_crashes    : 0
unique_crashes    : 0
unique_crashes    : 0
unique_crashes    : 0
unique_crashes    : 0
unique_crashes    : 0
unique_crashes    : 0
```
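Only the found/not-found outcome matters here, so the per-run listings can be reduced to "runs with at least one crash". The sketch below assumes the AFL 2.52b `fuzzer_stats` field name `unique_crashes`, as seen in the listings. The helper names are hypothetical.

```python
import glob
import re
from typing import Iterable, Tuple

# fuzzer_stats line as printed above, e.g. "unique_crashes    : 2"
UNIQUE_CRASHES_RE = re.compile(r"^unique_crashes[ \t]*:[ \t]*(\d+)[ \t]*$", re.MULTILINE)


def unique_crashes(stats_text: str) -> int:
    """Extract the unique_crashes counter from one fuzzer_stats file."""
    match = UNIQUE_CRASHES_RE.search(stats_text)
    if match is None:
        raise ValueError("fuzzer_stats has no unique_crashes field")
    return int(match.group(1))


def runs_with_crashes(stats_texts: Iterable[str]) -> Tuple[int, int]:
    """Return (number of runs that found at least one crash, total runs)."""
    counts = [unique_crashes(text) for text in stats_texts]
    return sum(1 for c in counts if c > 0), len(counts)


def runs_with_crashes_glob(pattern: str) -> Tuple[int, int]:
    """Like runs_with_crashes, reading every fuzzer_stats file matching pattern."""
    texts = []
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            texts.append(f.read())
    return runs_with_crashes(texts)
```

For example, `runs_with_crashes_glob("flex/out_AFL-afl-ulimit-*/fuzzer_stats")` reads the files behind the first listing. Applied to the four listings above, this yields 4 of 10 runs for AFL with ulimit, 10 of 10 for MemLock with ulimit, and 0 of 10 for both fuzzers without it.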

In summary, (1) with the manually lowered stack size, both tools find crashes (although MemLock finds more, and AFL only in four of ten runs), but (2) without the lowered stack size, neither tool finds a single crash.

We believe a fair evaluation should not lower the stack size, as this setting

- diverges from the rest of the evaluation (it is applied to a single target only),
- was not documented in the paper (we only found it by studying the source code), and
- benefited MemLock without any real-world scenario backing it. For memory corruption vulnerabilities, one can argue that they should not occur even in an altered environment; for resource exhaustion bugs, however, constraining the resources naturally leads to such crashes. We do not believe that such a crash represents a security vulnerability in itself.

Additionally, comparing MemLock against AFL is not a fair comparison, as MemLock's source code is based on PerfFuzz (the paper claims it is based on AFL, but the published version clearly is not). This makes it difficult to judge whether the increased number of unique crashes (itself a questionable metric, see below) can be attributed to MemLock or rather to its baseline, PerfFuzz.

02-Unique-Crashes

MemLock relies heavily on unique crashes as an evaluation metric and uses it as an indicator of whether it outperforms other fuzzers, as seen in Table 1 of the paper.

The fuzzing community has generally doubted the efficacy of this metric, as many of these crashes point to the same bug even though the name suggests uniqueness. To test experimentally whether this metric is suitable for comparing fuzzers, we designed an experiment in which we manually deduplicate the unique crashes found by MemLock and AFL to determine how many of them share a single root cause (for implementation details, see 02-Unique-Crashes).
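Why the metric overcounts can be seen from how AFL defines it. AFL's documentation treats a crash as unique when its execution trace contains an edge tuple not seen in earlier crashes (or lacks one that was always present before). The Python sketch below models only the first condition and ignores hit-count buckets, so it is a simplification rather than AFL's code. The edge names are made up.

```python
from typing import Iterable, Set, Tuple

Edge = Tuple[str, str]


def count_unique_crashes(crash_traces: Iterable[Set[Edge]]) -> int:
    """Simplified AFL-style criterion: a crash is 'unique' if its trace
    contains at least one edge that no earlier crash covered."""
    seen: Set[Edge] = set()
    unique = 0
    for edges in crash_traces:
        if edges - seen:
            unique += 1
        seen |= edges
    return unique


# Four inputs reach the same faulting block X, i.e. one root cause.
traces = [
    {("A", "B"), ("B", "X")},
    {("A", "C"), ("C", "X")},               # different path to X: new edges
    {("A", "B"), ("B", "C"), ("C", "X")},   # adds edge B->C: new again
    {("A", "B"), ("B", "X")},               # exact repeat: not counted
]
print(count_unique_crashes(traces))  # prints 3
```

Any new path to the same faulty code counts as a new "unique" crash, which is exactly what the deduplication below tries to undo.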

For this experiment, we selected the targets readelf, cxxfilt, and nm. Each fuzzer fuzzed each target ten times for 24 hours.

Raw result

Across all ten runs (union of all runs), the fuzzers found the following numbers of unique crashes for the chosen targets:

| Fuzzer | Target | #Crashes |
|---|---|---|
| MemLock | readelf | 1100 |
| MemLock | cxxfilt | 5321 |
| MemLock | nm | 1717 |
| AFL | readelf | 3311 |
| AFL | cxxfilt | 3684 |
| AFL | nm | 464 |

Deduplicating crashes using available patch

To identify the true number of bugs, we manually deduplicate all unique crashes. First, we replay all crashing inputs on a patched version of the respective target. The patch was released by the binutils maintainers in response to CVE-2018-18484, which the MemLock authors reported for cxxfilt.
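The replay step can be sketched in Python as follows. The sketch follows AFL's convention of substituting `@@` with the input path and feeding the input on stdin otherwise. It uses a heuristic crash check: the process was killed by a signal, or a sanitizer report appeared on stderr. The function names and any patched-binary paths passed in are hypothetical. This is not the evaluation's actual tooling.

```python
import subprocess
from typing import Iterable, List, Sequence, Tuple

CRASH_MARKERS = (b"ERROR: AddressSanitizer", b"ERROR: LeakSanitizer")


def crashed(proc: subprocess.CompletedProcess) -> bool:
    """Heuristic: killed by a signal, or a sanitizer report on stderr."""
    return proc.returncode < 0 or any(m in proc.stderr for m in CRASH_MARKERS)


def replay(cmd: Sequence[str], input_path: str, timeout: float) -> subprocess.CompletedProcess:
    """Run one input like AFL would: replace '@@' with the path, else use stdin."""
    if "@@" in cmd:
        argv = [input_path if arg == "@@" else arg for arg in cmd]
        return subprocess.run(argv, capture_output=True, timeout=timeout)
    with open(input_path, "rb") as f:
        return subprocess.run(list(cmd), stdin=f, capture_output=True, timeout=timeout)


def partition_by_patch(inputs: Iterable[str], patched_cmd: Sequence[str],
                       timeout: float = 10.0) -> Tuple[List[str], List[str], List[str]]:
    """Split crashing inputs into (fixed by the patch, still crashing, timed out)."""
    fixed: List[str] = []
    remaining: List[str] = []
    hung: List[str] = []
    for path in inputs:
        try:
            proc = replay(patched_cmd, path, timeout)
        except subprocess.TimeoutExpired:
            hung.append(path)
            continue
        (remaining if crashed(proc) else fixed).append(path)
    return fixed, remaining, hung
```

Only the `remaining` list needs further analysis. The `fixed` list is attributed to the patched bug.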

The underlying idea is that all crashing inputs that no longer crash are addressed by the bug fix and thus map to a single bug. If unique crashes were a good proxy metric for actual bugs, most crashing inputs would still crash the patched target. We obtained the following results:

| Fuzzer | Target | #Unique Crashes | #Crashes on patched target |
|---|---|---|---|
| MemLock | readelf | 1100 | 1100 |
| MemLock | cxxfilt | 5321 | 6 |
| MemLock | nm | 1717 | 13 |
| AFL | readelf | 3311 | 3311 |
| AFL | cxxfilt | 3684 | 0 |
| AFL | nm | 464 | 8 |

From these results, two interesting conclusions can be drawn:

- A considerable share of the presumably unique crashes for nm and cxxfilt can be attributed to the same bug. This indicates that unique crashes are not a reliable indicator of the number of actual bugs in a target.
- The number of crashing inputs drops not only for cxxfilt but also for nm because the applied patch targets the demangling code of the library libiberty (part of binutils), which both targets link against. This shows that special care must be taken when fuzzing targets that share the same code base.

Deduplicating via stack hashing

To analyze the remaining crashes, we opt for a (manual) analysis of stack backtraces.
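The manual backtrace inspection below can be approximated automatically with stack hashing: ASAN reports are grouped by a hash of their top frames. The sketch below makes several assumptions. It expects symbolized ASAN frames of the form `#N 0xADDR in function file:line:col`. It treats a hand-picked set of allocator and sanitizer functions as runtime frames to skip. It keys frames on function plus file basename and line, so addresses, columns, and build directories (such as `SRC_MemLock` vs. `SRC_AFL`) do not split groups. The frame depth of 3 is an arbitrary choice.

```python
import hashlib
import os
import re
from collections import defaultdict
from typing import Dict, List

# e.g. "    #8 0x5735b6 in get_program_headers /path/to/readelf.c:4761:33"
FRAME_RE = re.compile(r"^\s*#(\d+)\s+0x[0-9a-fA-F]+\s+in\s+(.+?)\s+(\S+)\s*$")
# Assumed runtime frames to skip; adjust per target.
SKIP_FUNCS = {"malloc", "calloc", "realloc", "free"}
SKIP_PREFIXES = ("__asan", "__sanitizer", "__interceptor")


def first_stack(report: str) -> List[str]:
    """Return 'function@file:line' for the first stack of an ASAN report."""
    frames: List[str] = []
    seen_frame_zero = False
    for line in report.splitlines():
        m = FRAME_RE.match(line)
        if not m:
            continue
        if m.group(1) == "0":
            if seen_frame_zero:
                break  # a second stack (e.g. "allocated by thread") begins
            seen_frame_zero = True
        func, loc = m.group(2), m.group(3).strip("()")
        if func in SKIP_FUNCS or func.startswith(SKIP_PREFIXES):
            continue
        file_and_line = ":".join(os.path.basename(loc).split(":")[:2])
        frames.append(f"{func}@{file_and_line}")
    return frames


def stack_hash(report: str, depth: int = 3) -> str:
    """Hash the top `depth` non-runtime frames of the first stack."""
    key = "|".join(first_stack(report)[:depth])
    return hashlib.sha1(key.encode()).hexdigest()[:16]


def group_by_stack(reports: Dict[str, str], depth: int = 3) -> Dict[str, List[str]]:
    """Map each stack hash to the (sorted) names of reports sharing it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, text in sorted(reports.items()):
        groups[stack_hash(text, depth)].append(name)
    return dict(groups)
```

The number of groups is then an upper bound on the number of distinct bugs. Groups that differ only in deeper frames may still share a root cause, so manual inspection remains necessary, as done below.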

`readelf`

For readelf, 1100 (MemLock) and 3311 (AFL) unique crashes remain. Notably, most of these crashes (100% for MemLock and 98.5% for AFL) are triggered only because the ASAN option allocator_may_return_null=0 was set for readelf (contrary to most other targets, where this option was not set).

Manually inspecting the backtraces of MemLock's crashes quickly reveals that all 1100 crashes are caused by the same cmalloc statement that is supplied with a user-controlled variable. To reproduce this, run:

```
# The same function call is the root for all crashes (the frames above frame #8 belong to the allocator and ASAN)
/data/02-Unique-Crashes/results/filtered_crashes/readelf# cat MemLock/* | grep "#8 0x5735b6 in get_program_headers /workdir/MemLock/evaluation/BUILD/readelf_b9913fd2/SRC_MemLock/binutils/readelf.c:4761:33" | wc -l
1100
```

Similarly, when doing the same for AFL's crashes, we find that 3260 out of 3311 crashes are attributed to the same user-controlled cmalloc statement:

```
# The same function call is the root for all crashes (the frames above frame #8 belong to the allocator and ASAN)
/data/02-Unique-Crashes/results/filtered_crashes/readelf# cat AFL/* | grep "#8 0x571cb6 in get_program_headers /workdir/MemLock/evaluation/BUILD/readelf_b9913fd2/SRC_AFL/binutils/readelf.c:4761:33" | wc -l
3260
```

The remaining 51 crashes of AFL belong to a heap buffer overflow. Effectively, this means that all unique crashes of MemLock map to a single bug, while AFL's crashes point to two different bugs.

`cxxfilt`

For AFL, applying the patch mentioned earlier resolved all 3684 cxxfilt crashes. The six crashes that remain for MemLock all belong to a single bug in the demangle_expression function, which is not protected by the stack depth counter introduced by the patch. Arguably, MemLock found a new bug, which means the patch is incomplete.
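How a depth counter that covers only some recursive paths leaves a gap can be illustrated with a toy model. This is not libiberty's code. The node kinds `T` (guarded) and `E` (an unguarded path standing in for the expression parser), `DEPTH_LIMIT`, and `demangle_toy` are all hypothetical. Python's `RecursionError` stands in for exhausting the native stack.

```python
DEPTH_LIMIT = 256  # hypothetical budget standing in for the patch's limit


class TooDeep(Exception):
    """Graceful rejection, analogous to the patched code returning failure."""


def demangle_toy(s: str, i: int = 0, depth: int = 0, guard_all: bool = False) -> int:
    """Toy recursive parser in which 'T' and 'E' each open one nesting level.

    'T' nodes always pass through the depth check; 'E' nodes only do so when
    guard_all is True. Returns the index at which parsing stopped."""
    if i >= len(s):
        return i
    c = s[i]
    if c == "T" or (c == "E" and guard_all):
        if depth + 1 > DEPTH_LIMIT:
            raise TooDeep(f"nesting deeper than {DEPTH_LIMIT} at index {i}")
        return demangle_toy(s, i + 1, depth + 1, guard_all)
    if c == "E":
        # Unguarded path: recurses without consulting the depth counter.
        return demangle_toy(s, i + 1, depth, guard_all)
    return i
```

With the partial guard, `demangle_toy("T" * 300)` is rejected cleanly with `TooDeep`, while `demangle_toy("E" * 10000)` still exhausts the stack (`RecursionError`). Only `guard_all=True` closes the gap.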

`nm`

For nm, 13 (MemLock) and 8 (AFL) crashes remain. For both fuzzers, all of them belong to the same bug as triggered in cxxfilt. This is, again, caused by both targets linking against the same library, libiberty, which contains the affected code. Interestingly, AFL found in nm the demangle_expression bug that MemLock found in cxxfilt.

Conclusion

Considering the initial number of unique crashes, our deduplication efforts paint a different picture:

| Fuzzer | Target | #Unique Crashes | #Bugs | Explanation |
|---|---|---|---|---|
| MemLock | readelf | 1100 | 1 | All crashes have been caused by the same `cmalloc` statement that received user-controlled input |
| MemLock | cxxfilt | 5321 | 2 | One bug patched by the maintainers and another one from the 6 remaining crashes (`demangle_expression`) |
| MemLock | nm | 1717 | 0 | All crashes were also triggered in `cxxfilt`, since both targets depend on `libiberty` |
| AFL | readelf | 3311 | 2 | In addition to the user-controlled `cmalloc` statement, AFL was also able to trigger a heap overflow |
| AFL | cxxfilt | 3684 | 1 | AFL found only the bug that was patched by the fix provided by the binutils authors |
| AFL | nm | 464 | 1 | While AFL did not find the bug related to `demangle_expression` in `cxxfilt`, it was able to find it in `nm` |

Overall, AFL found 4 bugs, while MemLock found 3. This is roughly three orders of magnitude fewer than the number of unique crashes reported for the same runs (7459 for AFL and 8138 for MemLock). From this, we draw the following conclusions:

- Unique crashes are not a good proxy metric for actual bugs found.
- Special attention needs to be paid when fuzzing targets that share code (as is the case for binutils, one of the most popular fuzzing targets).
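The shared-code pitfall becomes concrete when per-target bug counts are simply summed. The sketch below restates the deduplicated findings from the table above as sets. The bug labels are hypothetical names for the four root causes identified in this section. It then compares a naive per-target sum with a count of the union.

```python
from typing import Dict, Set

# Hypothetical labels for the four root causes identified above.
CMALLOC = "readelf: user-controlled cmalloc size"
HEAP_OVERFLOW = "readelf: heap buffer overflow"
PATCHED = "libiberty: recursion fixed for CVE-2018-18484"
DEMANGLE_EXPR = "libiberty: unguarded demangle_expression recursion"

# Bugs observed per fuzzer and target after deduplication.
found: Dict[str, Dict[str, Set[str]]] = {
    "MemLock": {"readelf": {CMALLOC},
                "cxxfilt": {PATCHED, DEMANGLE_EXPR},
                "nm": {PATCHED, DEMANGLE_EXPR}},
    "AFL": {"readelf": {CMALLOC, HEAP_OVERFLOW},
            "cxxfilt": {PATCHED},
            "nm": {PATCHED, DEMANGLE_EXPR}},
}


def naive_count(per_target: Dict[str, Set[str]]) -> int:
    """Summing per-target counts double-counts bugs in shared libraries."""
    return sum(len(bugs) for bugs in per_target.values())


def distinct_count(per_target: Dict[str, Set[str]]) -> int:
    """Counting the union attributes each shared-library bug only once."""
    return len(set().union(*per_target.values()))


for fuzzer, per_target in found.items():
    print(fuzzer, naive_count(per_target), distinct_count(per_target))
```

This prints `MemLock 5 3` and `AFL 5 4`. A naive sum would report a tie, while the distinct count matches the totals stated above.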

03-Reported-CVEs

MemLock found several vulnerabilities, for which its authors received 26 CVEs overall, as listed in the artifact repository (and repeated below for convenience).

| # | Vulnerability | Package | Program | Vulnerability Type |
|---|---|---|---|---|
| 1 | CVE-2020-36375 | MJS 1.20.1 | mjs | CWE-674: Uncontrolled Recursion |
| 2 | CVE-2020-36374 | MJS 1.20.1 | mjs | CWE-674: Uncontrolled Recursion |
| 3 | CVE-2020-36373 | MJS 1.20.1 | mjs | CWE-674: Uncontrolled Recursion |
| 4 | CVE-2020-36372 | MJS 1.20.1 | mjs | CWE-674: Uncontrolled Recursion |
| 5 | CVE-2020-36371 | MJS 1.20.1 | mjs | CWE-674: Uncontrolled Recursion |
| 6 | CVE-2020-36370 | MJS 1.20.1 | mjs | CWE-674: Uncontrolled Recursion |
| 7 | CVE-2020-36369 | MJS 1.20.1 | mjs | CWE-674: Uncontrolled Recursion |
| 8 | CVE-2020-36368 | MJS 1.20.1 | mjs | CWE-674: Uncontrolled Recursion |
| 9 | CVE-2020-36367 | MJS 1.20.1 | mjs | CWE-674: Uncontrolled Recursion |
| 10 | CVE-2020-36366 | MJS 1.20.1 | mjs | CWE-674: Uncontrolled Recursion |
| 11 | CVE-2020-18392 | MJS 1.20.1 | mjs | CWE-674: Uncontrolled Recursion |
| 12 | CVE-2019-6293 | Flex 2.6.4 | flex | CWE-674: Uncontrolled Recursion |
| 13 | CVE-2019-6292 | Yaml-cpp v0.6.2 | prase | CWE-674: Uncontrolled Recursion |
| 14 | CVE-2019-6291 | NASM 2.14.03rc1 | nasm | CWE-674: Uncontrolled Recursion |
| 15 | CVE-2019-6290 | NASM 2.14.03rc1 | nasm | CWE-674: Uncontrolled Recursion |
| 16 | CVE-2018-18701 | Binutils 2.31 | nm | CWE-674: Uncontrolled Recursion |
| 17 | CVE-2018-18700 | Binutils 2.31 | nm | CWE-674: Uncontrolled Recursion |
| 18 | CVE-2018-18484 | Binutils 2.31 | c++filt | CWE-674: Uncontrolled Recursion |
| 19 | CVE-2018-17985 | Binutils 2.31 | c++filt | CWE-674: Uncontrolled Recursion |
| 20 | CVE-2019-7704 | Binaryen 1.38.22 | wasm-opt | CWE-789: Uncontrolled Memory Allocation |
| 21 | CVE-2019-7698 | Bento4 v1.5.1-627 | mp4dump | CWE-789: Uncontrolled Memory Allocation |
| 22 | CVE-2019-7148 | Elfutils 0.175 | eu-ar | CWE-789: Uncontrolled Memory Allocation |
| 23 | CVE-2018-20652 | Tinyexr v0.9.5 | tinyexr | CWE-789: Uncontrolled Memory Allocation |
| 24 | CVE-2018-18483 | Binutils 2.31 | c++filt | CWE-789: Uncontrolled Memory Allocation |
| 25 | CVE-2018-20657 | Binutils 2.31 | c++filt | CWE-401: Memory Leak |
| 26 | CVE-2018-20002 | Binutils 2.31 | nm | CWE-401: Memory Leak |

To better understand the real-world impact of MemLock, we have looked into these CVEs.

CVEs reported for mJS (#1 to #11)

The mJS software is described as follows in its GitHub repository:

mJS is designed for microcontrollers with limited resources. Main design goals are: small footprint and simple C/C++ interoperability. mJS implements a strict subset of ES6 (JavaScript version 6)

mJS is an interpreter that parses JavaScript in order to execute it. Source code is typically processed as a tree and parsed top-down. For example, when an addition (+) is encountered, the parser recursively processes each side (operand) until it reaches the bottom of the tree. In this top-down parsing process, the state that needs to be carried over to the next depth naturally resides on the call stack.

Since an attacker who supplies JavaScript code can nest these tree structures arbitrarily deep, they can exploit this parsing process to exhaust the available stack memory (typically 8 MiB on Linux-based systems).
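Why no single parse_* function is to blame becomes clear from a heavily simplified recursive-descent parser. In such a parser, every nesting level passes through the whole operator-precedence chain. The level names below loosely mirror mJS's parse_* functions. The grammar (only the literal `1` and parentheses), `LEVELS`, and the `Parser` class are hypothetical. Python's `RecursionError` stands in for exhausting the native stack.

```python
# Heavily simplified precedence chain; mJS's real chain is longer.
LEVELS = ("logical_or", "logical_and", "bitwise_or", "equality",
          "plus_minus", "mul_div", "unary", "primary")


class Parser:
    """Toy recursive-descent parser for the grammar  expr := '1' | '(' expr ')'."""

    def __init__(self, src: str) -> None:
        self.src = src
        self.pos = 0
        self.depth = 0      # parse_level frames currently on the stack
        self.max_depth = 0  # deepest point reached

    def peek(self) -> str:
        return self.src[self.pos] if self.pos < len(self.src) else ""

    def expect(self, ch: str) -> None:
        if self.peek() != ch:
            raise SyntaxError(f"expected {ch!r} at offset {self.pos}")
        self.pos += 1

    def parse_level(self, level: int) -> None:
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)
        try:
            if level < len(LEVELS) - 1:
                # Operators are omitted: each level simply delegates downwards.
                self.parse_level(level + 1)
            elif self.peek() == "(":
                self.pos += 1
                self.parse_level(0)  # a nested expression restarts the whole chain
                self.expect(")")
            else:
                self.expect("1")
        finally:
            self.depth -= 1


def max_frames(src: str) -> int:
    """Deepest parse_level nesting reached while parsing src."""
    parser = Parser(src)
    parser.parse_level(0)
    return parser.max_depth


print([max_frames("(" * n + "1" + ")" * n) for n in range(3)])  # prints [8, 16, 24]
```

Each additional nesting level costs one full pass through the chain. Around 200 nested parentheses are therefore enough to hit Python's default recursion limit. Where the limit is finally reached, and thus which function name appears at the top of the trace, depends only on the input's nesting depth.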

According to the CVE descriptions, all bugs reported by the authors of MemLock are related to stack overflows (i.e., the stack's size limit is reached) in some parsing functions of the form parse_*.

Studying the CVEs, we noticed that CVE-2020-36375, CVE-2020-36374, CVE-2020-36373, CVE-2020-36372, CVE-2020-36371, and CVE-2020-36370 all refer to the same bug report 136 and differ only in the function named as the cause. These function names were picked from the stack trace that led to the resource exhaustion:

    #283 0x599c92 in parse_assignment /home/hjwang/UAF_Objects/mjs_afl_asan/mjs.c:12532:3
    #284 0x5acfb4 in parse_expr /home/hjwang/UAF_Objects/mjs_afl_asan/mjs.c:12536:10
    #285 0x5acfb4 in parse_array_literal /home/hjwang/UAF_Objects/mjs_afl_asan/mjs.c:12294
    #286 0x5a7a58 in parse_literal /home/hjwang/UAF_Objects/mjs_afl_asan/mjs.c:12354:13
    #287 0x5a7a58 in parse_call_dot_mem /home/hjwang/UAF_Objects/mjs_afl_asan/mjs.c:12380
    #288 0x5a6400 in parse_postfix /home/hjwang/UAF_Objects/mjs_afl_asan/mjs.c:12414:14
    #289 0x5a6400 in parse_unary /home/hjwang/UAF_Objects/mjs_afl_asan/mjs.c:12433
    #290 0x5a5a6e in parse_mul_div_rem /home/hjwang/UAF_Objects/mjs_afl_asan/mjs.c:12446:3
    #291 0x5a5236 in parse_plus_minus /home/hjwang/UAF_Objects/mjs_afl_asan/mjs.c:12451:3
    #292 0x5a4b00 in parse_shifts /home/hjwang/UAF_Objects/mjs_afl_asan/mjs.c:12456:3
    #293 0x5a441e in parse_comparison /home/hjwang/UAF_Objects/mjs_afl_asan/mjs.c:12460:3
    #294 0x5a3c4f in parse_equality /home/hjwang/UAF_Objects/mjs_afl_asan/mjs.c:12464:3
    #295 0x5a24ab in parse_bitwise_and /home/hjwang/UAF_Objects/mjs_afl_asan/mjs.c:12469:3
    #296 0x5a0bec in parse_bitwise_xor /home/hjwang/UAF_Objects/mjs_afl_asan/mjs.c:12474:3
    #297 0x59f1ab in parse_bitwise_or /home/hjwang/UAF_Objects/mjs_afl_asan/mjs.c:12479:3
    #298 0x59d944 in parse_logical_and /home/hjwang/UAF_Objects/mjs_afl_asan/mjs.c:12484:3

Since the issue is not bound to one specific function but is caused by too many stacked frames, it does not make sense to report a specific function as the reason for the crash.

The same problem can be observed by looking at the other CVEs assigned for mJS. The CVEs CVE-2020-36369, CVE-2020-36368 and CVE-2020-36367 all reference the same bug report 135. Similarly, we observe that CVE-2020-36366 and CVE-2020-18392 both belong to bug report 106.

In summary, the eleven CVEs assigned for bugs found in mJS correspond to only three actual bug reports; the other eight CVEs merely pick different function names from the respective backtraces for their descriptions.
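The CVE-to-report mapping described above can be tallied directly. The dictionary below restates that mapping, using the bug report numbers given in the text.

```python
from collections import Counter

# CVE -> mJS bug report, as described above.
CVE_TO_REPORT = {
    "CVE-2020-36375": 136, "CVE-2020-36374": 136, "CVE-2020-36373": 136,
    "CVE-2020-36372": 136, "CVE-2020-36371": 136, "CVE-2020-36370": 136,
    "CVE-2020-36369": 135, "CVE-2020-36368": 135, "CVE-2020-36367": 135,
    "CVE-2020-36366": 106, "CVE-2020-18392": 106,
}

per_report = Counter(CVE_TO_REPORT.values())
redundant = len(CVE_TO_REPORT) - len(per_report)
print(len(CVE_TO_REPORT), "CVEs map to", len(per_report), "bug reports:",
      dict(sorted(per_report.items())))
print("CVEs beyond one per report:", redundant)
```

This prints `11 CVEs map to 3 bug reports: {106: 2, 135: 3, 136: 6}` and `CVEs beyond one per report: 8`.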

fuzz-evaluator/MemLock-Fuzz-eval

Folders and files

Latest commit

History

Repository files navigation

Artifact Evaluation

Artifact

Conducted Experiments

01-Artificial-Runtime-Environment

Results

02-Unique-Crashes

Raw result

Deduplicating crashes using available patch

Deduplicating via stack hashing

readelf

cxxfilt

nm

Conclusion

03-Reported-CVEs

CVEs reported for mJS (#1 to #11)

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

`readelf`

`cxxfilt`

`nm`

Packages