This repository contains several experiments that we conducted to evaluate the artifact accompanying the MemLock paper.
The artifact provided by the MemLock authors only covers a subset of the experiments performed as part of the paper. Details and concerns regarding the artifact are summarized in the following:
- MemLock has been evaluated against six fuzzers, while the artifact supports only a single one (and MemLock itself)
- The paper states that MemLock is "implemented based on the AFL-2.52b framework", but the source code provided as part of the artifact is forked from PerfFuzz.
- The target configurations are not uniform:
  - Some targets (`flex`) set artificial stack size limits.
  - The set `ASAN_OPTIONS` are target-dependent, i.e., some targets have `allocator_may_return_null` enabled, while some have not. The same is true for the `detect_leaks` flag.
  - Some targets (
As part of a larger effort to reproduce and replicate fuzzing research, we have selected MemLock for reproduction. In the following, we outline the experiments we conducted to evaluate MemLock and the claims made in the paper. We do not intend to evaluate every aspect or reproduce all experiments presented in the paper. Instead, we thoroughly studied the artifact and paper to compile a set of experiments suitable for testing for methodological flaws.
This experiment is specific to the target `flex`, which the paper evaluates as part of RQ1:
"How capable is MemLock in memory consumption crash detection?"
According to Table 1 of the paper, MemLock was the only fuzzer to find crashes in `flex`:
When studying the paper's artifact, we noticed that the configurations (AFL, MemLock) provided in the artifact contain the following line:
ulimit -s 2048
This command sets the maximum stack size (see `ulimit`'s man page) to 2048 KiB, which is 25% of the default size of 8192 KiB.
Intuitively, an input is more likely to trigger a stack overflow (one of MemLock's advertised strengths) when the stack is smaller: recursive function calls exhaust the reduced stack much sooner than they would under the default limit.
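To illustrate the effect, consider the following sketch; the binary name, input file, and invocation are placeholders for illustration, not commands taken from the artifact:

```sh
# Illustrative only: run the same crashing input with the artifact's reduced
# stack limit and with the system default. `ulimit -s` only affects the current
# shell and its children, hence the subshells.
ulimit -s                                    # soft limit in KiB, typically 8192
( ulimit -s 2048; ./flex crashing_input.l )  # artifact setup: recursion overflows at ~25% of the usual depth
( ./flex crashing_input.l )                  # default setup: the same recursion depth may stay within the limit
```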
Due to this artificial limit, we select `flex` as an experiment for reproducing MemLock's results and run it both with and without the manually lowered stack size.
We briefly summarize the results of our experiment; to reproduce them, please refer to 01-Artificial-Runtime-Environment.
NOTE: The number of unique crashes depends on the instrumentation of the target; thus, the numbers below should be interpreted as "crash found" or "no crash found", without considering the magnitude of the numbers themselves.
AFL with `ulimit` (results of ten independent runs):
/data/01-Artificial-Runtime-Environment/results# cat flex/out_AFL-afl-ulimit-*/fuzzer_stats | grep crashes
unique_crashes : 0
unique_crashes : 0
unique_crashes : 0
unique_crashes : 2
unique_crashes : 1
unique_crashes : 1
unique_crashes : 0
unique_crashes : 1
unique_crashes : 0
unique_crashes : 0
AFL without `ulimit` (results of ten independent runs):
/data/01-Artificial-Runtime-Environment/results# cat flex/out_AFL-afl-noulimit-*/fuzzer_stats | grep crashes
unique_crashes : 0
unique_crashes : 0
unique_crashes : 0
unique_crashes : 0
unique_crashes : 0
unique_crashes : 0
unique_crashes : 0
unique_crashes : 0
unique_crashes : 0
unique_crashes : 0
MemLock with `ulimit` (results of ten independent runs):
/data/01-Artificial-Runtime-Environment/results# cat flex/out_Mem*-ulimit-*/fuzzer_stats | grep crashes
unique_crashes : 53
unique_crashes : 4
unique_crashes : 39
unique_crashes : 7
unique_crashes : 10
unique_crashes : 19
unique_crashes : 10
unique_crashes : 17
unique_crashes : 28
unique_crashes : 12
MemLock without `ulimit` (results of ten independent runs):
/data/01-Artificial-Runtime-Environment/results# cat flex/out_Mem*-noulimit-*/fuzzer_stats | grep crashes
unique_crashes : 0
unique_crashes : 0
unique_crashes : 0
unique_crashes : 0
unique_crashes : 0
unique_crashes : 0
unique_crashes : 0
unique_crashes : 0
unique_crashes : 0
unique_crashes : 0
In summary, (1) with the manually lowered stack size, both tools find crashes (though MemLock finds more), but (2) without the manually lowered stack size, neither tool finds a single crash.
We believe a fair evaluation should not set a lower stack size, as this behavior
- diverges from the rest of the evaluation (only applied to a single target),
- was not documented in the paper (but required studying the source code), and
- benefited MemLock without any real-world scenario backing this setup. For memory corruption vulnerabilities, one can argue that they should not occur even in an altered environment; for resource exhaustion bugs, however, artificially constraining the resources naturally leads to exhaustion, which we believe does not represent a security vulnerability in itself.
Additionally, comparing MemLock against AFL is not a fair comparison, as MemLock's source code is based on PerfFuzz (even though the paper claims it is based on AFL, the published version clearly is not). This makes it difficult to judge whether the increased number of unique crashes (in itself a questionable metric, see below) can be attributed to MemLock or rather to its baseline, PerfFuzz.
MemLock makes heavy use of unique crashes as a metric during its evaluation and uses it as an indicator of whether it outperformed other fuzzers, as seen in Table 1:
Generally, the fuzzing community has come to doubt the efficacy of this metric, as many of these crashes often point to a single bug, despite the name suggesting a sort of uniqueness. To experimentally test whether this approach is suitable for comparing different fuzzers, we designed an experiment where we manually deduplicate the unique crashes found by MemLock and AFL in order to determine how many crashes are related to a single cause (for implementation details, see 02-Unique-Crashes).
For this experiment, we select the targets `readelf`, `cxxfilt`, and `nm`. Each target was fuzzed 10 times for 24 hours by each fuzzer.
During the fuzzing runs conducted for the chosen targets, the following numbers of unique crashes were found across all 10 runs (union of all runs):
Fuzzer | Target | #Crashes |
---|---|---|
MemLock | readelf | 1100 |
MemLock | cxxfilt | 5321 |
MemLock | nm | 1717 |
AFL | readelf | 3311 |
AFL | cxxfilt | 3684 |
AFL | nm | 464 |
To identify the true number of bugs, we manually deduplicate all unique crashes as follows: We first replay all crashing inputs on a patched version of the respective target. The patch was made available by the binutils maintainers in response to CVE-2018-18484, which had been reported by the MemLock authors for `cxxfilt`.
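The replay step can be sketched as follows; the directory layout, binary name, and stdin-based invocation are assumptions for illustration and not the exact scripts from 02-Unique-Crashes:

```sh
# Replay every originally crashing input against the patched, ASAN-instrumented
# target and count how many inputs still crash (non-zero exit status).
total=0; still_crashing=0
for crash in crashes/MemLock/cxxfilt/*; do      # placeholder path
    total=$((total + 1))
    if ! ./cxxfilt_patched < "$crash" > /dev/null 2>&1; then
        still_crashing=$((still_crashing + 1))
    fi
done
echo "${still_crashing}/${total} inputs still crash the patched target"
```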
The underlying idea is that all crashing inputs that no longer crash have been addressed by the bug fix and thus map to this single bug. If unique crashes are indeed a good proxy metric for actual bugs, we would expect most crashing inputs to still crash the patched target. Our obtained results were as follows:
Fuzzer | Target | #Unique Crashes | #Crashes on Patched target |
---|---|---|---|
MemLock | readelf | 1100 | 1100 |
MemLock | cxxfilt | 5321 | 6 |
MemLock | nm | 1717 | 13 |
AFL | readelf | 3311 | 3311 |
AFL | cxxfilt | 3684 | 0 |
AFL | nm | 464 | 8 |
From these results, two interesting conclusions can be drawn:
- A considerable number of the presumably unique crashes for `nm` and `cxxfilt` can be attributed to the same bug. This indicates that unique crashes are not a reliable indicator for the number of actual bugs in a target.
- The reason we see the decreased number of crashing inputs not only for `cxxfilt` but also for `nm` is that the applied patch targets the demangling code of `libiberty` (part of binutils), which both targets are linked against. This shows that special care must be taken when fuzzing targets that share the same code base.
To analyze the remaining crashes, we opt for a (manual) analysis of stack backtraces.
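Grouping the reports by a characteristic application frame makes this analysis tractable; the following one-liner is a sketch of the idea (the report directory is a placeholder, and the frame index mirrors the greps shown below):

```sh
# Group ASAN reports by frame #8 (the first application frame below the
# allocator/ASAN frames in the reports discussed here) and count each group.
grep -h "#8 0x" reports/*.txt | sed 's/.* in //' | sort | uniq -c | sort -rn
```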
For the `readelf` target, 1100 and 3311 unique crashes remain for MemLock and AFL, respectively. It is noteworthy that most crashes (100% for MemLock and 98.5% for AFL) are only triggered because the ASAN option `allocator_may_return_null=0` was set for `readelf` (contrary to most other targets, where this option was not set).
Manually inspecting the backtraces of MemLock's crashes quickly reveals that all 1100 crashes are caused by the same `cmalloc` statement that is supplied with a user-controlled variable. To reproduce this, run:
# The same function call is the root for all crashes (the frames above frame #8 belong to the allocator and ASAN)
/data/02-Unique-Crashes/results/filtered_crashes/readelf# cat MemLock/* | grep "#8 0x5735b6 in get_program_headers /workdir/MemLock/evaluation/BUILD/readelf_b9913fd2/SRC_MemLock/binutils/readelf.c:4761:33" | wc -l
1100
Similarly, when doing the same for AFL's crashes, we find that 3260 out of 3311 crashes are attributed to the same user-controlled `cmalloc` statement:
# The same function call is the root for all crashes (the frames above frame #8 belong to the allocator and ASAN)
/data/02-Unique-Crashes/results/filtered_crashes/readelf# cat AFL/* | grep "#8 0x571cb6 in get_program_headers /workdir/MemLock/evaluation/BUILD/readelf_b9913fd2/SRC_AFL/binutils/readelf.c:4761:33" | wc -l
3260
The remaining 51 crashes of AFL belong to a heap buffer overflow. Effectively, this means that all unique crashes of MemLock map to a single bug, while AFL's crashes point to two different bugs.
In the case of AFL, all 3684 `cxxfilt` crashes were resolved by applying the patch mentioned earlier. The six remaining crashes in the case of MemLock belong to the same bug, which arises around the `demangle_expression` function that is not protected by the stack depth counter introduced by the patch. Arguably, MemLock found a new bug, meaning the patch is incomplete.
For `nm`, 13 and 8 crashes remain for MemLock and AFL, respectively. For both fuzzers, all crashes belong to the same bug as the one triggered in `cxxfilt`. This is, again, caused by the fact that both targets link against the same library, `libiberty`, which contains the affected code. Interestingly, the bug related to `demangle_expression` that MemLock found in `cxxfilt` was found by AFL in `nm`.
Considering the initial number of unique crashes, our deduplication efforts paint a different picture:
Fuzzer | Target | #Unique Crashes | #Bugs | Explanation |
---|---|---|---|---|
MemLock | readelf | 1100 | 1 | All crashes have been caused by the same `cmalloc` statement that received user-controlled input |
MemLock | cxxfilt | 5321 | 2 | One bug patched by the maintainers and another one from the 6 remaining crashes (`demangle_expression`) |
MemLock | nm | 1717 | 0 | All crashes were also triggered in `cxxfilt`, since both targets depend on `libiberty` |
AFL | readelf | 3311 | 2 | In addition to the user-controlled `cmalloc` statement, AFL was also able to trigger a heap buffer overflow |
AFL | cxxfilt | 3684 | 1 | AFL found only the bug that was patched by the fix provided by the binutils maintainers |
AFL | nm | 464 | 1 | While AFL did not find the bug related to `demangle_expression` in `cxxfilt`, it was able to find it in `nm` |
Overall, AFL found 4 bugs, while MemLock found 3. The number of bugs is orders of magnitude smaller than the number of unique crashes. From this, we can draw the following conclusions:
- Unique crashes are not a good proxy metric for actual bugs found.
- Special attention needs to be paid when fuzzing targets that share code (as is the case for binutils, one of the most popular fuzzing targets).
MemLock found several vulnerabilities and, overall, received 26 CVEs, as listed in the artifact repository (and repeated below for your convenience).
# | Vulnerability | Package | Program | Vulnerability Type |
---|---|---|---|---|
1 | CVE-2020-36375 | MJS 1.20.1 | mjs | CWE-674: Uncontrolled Recursion |
2 | CVE-2020-36374 | MJS 1.20.1 | mjs | CWE-674: Uncontrolled Recursion |
3 | CVE-2020-36373 | MJS 1.20.1 | mjs | CWE-674: Uncontrolled Recursion |
4 | CVE-2020-36372 | MJS 1.20.1 | mjs | CWE-674: Uncontrolled Recursion |
5 | CVE-2020-36371 | MJS 1.20.1 | mjs | CWE-674: Uncontrolled Recursion |
6 | CVE-2020-36370 | MJS 1.20.1 | mjs | CWE-674: Uncontrolled Recursion |
7 | CVE-2020-36369 | MJS 1.20.1 | mjs | CWE-674: Uncontrolled Recursion |
8 | CVE-2020-36368 | MJS 1.20.1 | mjs | CWE-674: Uncontrolled Recursion |
9 | CVE-2020-36367 | MJS 1.20.1 | mjs | CWE-674: Uncontrolled Recursion |
10 | CVE-2020-36366 | MJS 1.20.1 | mjs | CWE-674: Uncontrolled Recursion |
11 | CVE-2020-18392 | MJS 1.20.1 | mjs | CWE-674: Uncontrolled Recursion |
12 | CVE-2019-6293 | Flex 2.6.4 | flex | CWE-674: Uncontrolled Recursion |
13 | CVE-2019-6292 | Yaml-cpp v0.6.2 | prase | CWE-674: Uncontrolled Recursion |
14 | CVE-2019-6291 | NASM 2.14.03rc1 | nasm | CWE-674: Uncontrolled Recursion |
15 | CVE-2019-6290 | NASM 2.14.03rc1 | nasm | CWE-674: Uncontrolled Recursion |
16 | CVE-2018-18701 | Binutils 2.31 | nm | CWE-674: Uncontrolled Recursion |
17 | CVE-2018-18700 | Binutils 2.31 | nm | CWE-674: Uncontrolled Recursion |
18 | CVE-2018-18484 | Binutils 2.31 | c++filt | CWE-674: Uncontrolled Recursion |
19 | CVE-2018-17985 | Binutils 2.31 | c++filt | CWE-674: Uncontrolled Recursion |
20 | CVE-2019-7704 | Binaryen 1.38.22 | wasm-opt | CWE-789: Uncontrolled Memory Allocation |
21 | CVE-2019-7698 | Bento4 v1.5.1-627 | mp4dump | CWE-789: Uncontrolled Memory Allocation |
22 | CVE-2019-7148 | Elfutils 0.175 | eu-ar | CWE-789: Uncontrolled Memory Allocation |
23 | CVE-2018-20652 | Tinyexr v0.9.5 | tinyexr | CWE-789: Uncontrolled Memory Allocation |
24 | CVE-2018-18483 | Binutils 2.31 | c++filt | CWE-789: Uncontrolled Memory Allocation |
25 | CVE-2018-20657 | Binutils 2.31 | c++filt | CWE-401: Memory Leak |
26 | CVE-2018-20002 | Binutils 2.31 | nm | CWE-401: Memory Leak |
To better understand the real-world impact of MemLock, we have looked into these CVEs.
The `mJS` software is described as follows in its GitHub repository:
mJS is designed for microcontrollers with limited resources. Main design goals are: small footprint and simple C/C++ interoperability. mJS implements a strict subset of ES6 (JavaScript version 6)
`mJS` is an interpreter that parses JavaScript in order to execute it.
Source code is typically processed as a tree and parsed top-down. For example, if an addition (`+`) is encountered, the parser recursively processes each operand until it reaches the leaves of the tree. This top-down parsing naturally keeps the state that needs to be carried over to the next depth on the call stack.
Since an attacker supplying JavaScript code can nest these tree structures arbitrarily deeply, they can exploit this parsing process to exhaust the available stack memory (typically 8 MiB on Linux-based systems).
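This is easy to demonstrate with a synthetic input; the generated file and the interpreter invocation below are illustrative and not one of the inputs found during fuzzing:

```sh
# Each nesting level consumes several parser stack frames, so a few tens of
# thousands of levels exhaust an 8 MiB stack.
python3 -c 'depth = 50000; print("[" * depth + "1" + "]" * depth)' > deep.js
./mjs deep.js   # expected to abort with a stack overflow (invocation may differ)
```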
According to the CVE descriptions, all bugs reported by the authors of MemLock are related to stack overflows (i.e., the stack's size limit is reached) in parsing functions of the form `parse_*`.
Studying the CVEs, we noticed that CVE-2020-36375, CVE-2020-36374, CVE-2020-36373, CVE-2020-36372, CVE-2020-36371, and CVE-2020-36370 all refer to the same bug report 136 and only differ in the name of the causing function. These function names have been picked from the stack trace that led to the resource exhaustion:
#283 0x599c92 in parse_assignment /home/hjwang/UAF_Objects/mjs_afl_asan/mjs.c:12532:3
#284 0x5acfb4 in parse_expr /home/hjwang/UAF_Objects/mjs_afl_asan/mjs.c:12536:10
#285 0x5acfb4 in parse_array_literal /home/hjwang/UAF_Objects/mjs_afl_asan/mjs.c:12294
#286 0x5a7a58 in parse_literal /home/hjwang/UAF_Objects/mjs_afl_asan/mjs.c:12354:13
#287 0x5a7a58 in parse_call_dot_mem /home/hjwang/UAF_Objects/mjs_afl_asan/mjs.c:12380
#288 0x5a6400 in parse_postfix /home/hjwang/UAF_Objects/mjs_afl_asan/mjs.c:12414:14
#289 0x5a6400 in parse_unary /home/hjwang/UAF_Objects/mjs_afl_asan/mjs.c:12433
#290 0x5a5a6e in parse_mul_div_rem /home/hjwang/UAF_Objects/mjs_afl_asan/mjs.c:12446:3
#291 0x5a5236 in parse_plus_minus /home/hjwang/UAF_Objects/mjs_afl_asan/mjs.c:12451:3
#292 0x5a4b00 in parse_shifts /home/hjwang/UAF_Objects/mjs_afl_asan/mjs.c:12456:3
#293 0x5a441e in parse_comparison /home/hjwang/UAF_Objects/mjs_afl_asan/mjs.c:12460:3
#294 0x5a3c4f in parse_equality /home/hjwang/UAF_Objects/mjs_afl_asan/mjs.c:12464:3
#295 0x5a24ab in parse_bitwise_and /home/hjwang/UAF_Objects/mjs_afl_asan/mjs.c:12469:3
#296 0x5a0bec in parse_bitwise_xor /home/hjwang/UAF_Objects/mjs_afl_asan/mjs.c:12474:3
#297 0x59f1ab in parse_bitwise_or /home/hjwang/UAF_Objects/mjs_afl_asan/mjs.c:12479:3
#298 0x59d944 in parse_logical_and /home/hjwang/UAF_Objects/mjs_afl_asan/mjs.c:12484:3
Since the issue is not bound to one specific function but caused by too many stacked frames along a recursion cycle, it does not make sense to single out a specific function as the reason for the crash.
The same problem can be observed by looking at the other CVEs assigned for `mJS`. CVE-2020-36369, CVE-2020-36368, and CVE-2020-36367 all reference the same bug report 135. Similarly, we observe that CVE-2020-36366 and CVE-2020-18392 both belong to bug report 106.
In summary, the eleven CVEs assigned for bugs found in `mjs` correspond to only three actual bug reports; the other eight CVEs merely pick different function names from the respective backtraces for their descriptions.