
Conversation

@phi-go commented Sep 22, 2023

We recently published a paper describing an approach that uses mutation analysis for fuzzer benchmarking. This pull request aims to add support for this approach (Phase I only) to FuzzBench.

There are a few mismatches between FuzzBench and our framework that need solutions. I have some ideas, but I would be interested in your input as well!

My understanding of FuzzBench's measurers is as follows:
At the start of a run, all docker images are built. Specifically, for the coverage measurer, the generated.mk contains targets of the form build-coverage-{benchmark}, which build the images containing the executables used for coverage collection. These make commands are started in builder.py, and after an image is built the coverage executables are extracted and stored on the host.
The coverage executables are then used by the code in measure_manager.py to measure unmeasured_snapshots, which I would expect to be newly created fuzzer queue entries. As far as I can tell, coverage measurement happens on the host in a temporary directory, with a timeout of 900 seconds per snapshot.
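To make that concrete, here is a rough sketch of the flow as I understand it (function names, paths, and the exact invocations are my paraphrase, not the real FuzzBench code):

import subprocess

SNAPSHOT_MEASURE_TIMEOUT = 900  # seconds, per snapshot

def build_coverage_binaries(benchmark: str):
    # builder.py drives the generated.mk target that builds the docker
    # image containing the coverage executables for |benchmark|.
    subprocess.run(['make', '-f', 'docker/generated.mk',
                    f'build-coverage-{benchmark}'], check=True)
    # ...afterwards the coverage executables are extracted from the image
    # and stored on the host for the measurers to use.

def measure_snapshot(coverage_binary: str, corpus_dir: str):
    # measure_manager.py replays each unmeasured snapshot (newly created
    # fuzzer queue entries) against the coverage binary, in a temporary
    # directory on the host, bounded by the per-snapshot timeout.
    subprocess.run([coverage_binary, corpus_dir],
                   timeout=SNAPSHOT_MEASURE_TIMEOUT, check=False)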

Open points:

  • Our framework requires a single bitcode file of the intermediate representation, corresponding to the object file that would be passed to the fuzzer's compiler. We usually use wllvm/gllvm to get this bitcode file (first sketch below). There are other methods, but this one has worked quite well as long as the build scripts respect CC/CXX. Hopefully, this also works for FuzzBench.
  • In our framework we use coverage to decide which mutant binaries to compile (we use supermutants, but I would ignore this step for now). However, the continuous measurements while running FuzzBench would require compiling mutant binaries either all at the beginning or on demand, once we see that a mutation is covered. I would like to avoid the first option, as it would compile large numbers of mutants that might never be used. The second option would require keeping a builder image around for each benchmark, as well as some synchronization to avoid compiling the same mutant multiple times (second sketch below); long compilation times could also delay the measurement of snapshots. Doing the mutation-score measurements at the end of the experiment would also be a possibility, though it would require keeping all snapshots around.
  • Measuring whether a mutant is detected needs to be done individually: by the nature of mutation analysis, infinite loops, high memory usage, and crashes are expected to happen. To avoid interfering with the host system, it might also be sensible to execute the mutant executables in a container (third sketch below).
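As referenced in the first point, here is a minimal sketch of the bitcode extraction we have in mind; the target name and build command are placeholders, and this assumes gllvm is installed and the build scripts respect CC/CXX:

import os
import subprocess

# Build with the gllvm wrappers so every object file carries embedded bitcode.
env = dict(os.environ, CC='gclang', CXX='gclang++')
subprocess.run(['make'], env=env, check=True)

# get-bc (from gllvm) then links the embedded bitcode of the final
# executable into a single .bc file, e.g. fuzz_target.bc.
subprocess.run(['get-bc', 'fuzz_target'], check=True)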
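For the second point, the synchronization for on-demand compilation could look roughly like this (a sketch only; the make target and paths are hypothetical, and the real build would happen inside the per-benchmark builder image):

import fcntl
import os
import subprocess

def ensure_mutant_built(mutant_id: int, build_dir: str) -> str:
    """Compile a mutant binary on demand, at most once across measurers."""
    binary = os.path.join(build_dir, f'mutant_{mutant_id}')
    lock_path = binary + '.lock'
    with open(lock_path, 'w') as lock_file:
        # Exclusive file lock: concurrent measurers that cover the same
        # mutation block here instead of compiling the same mutant twice.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            if not os.path.exists(binary):
                subprocess.run(['make', f'mutant_{mutant_id}'],
                               cwd=build_dir, check=True)
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
    return binary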
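And for the third point, a sketch of how a single mutant execution could be isolated; the image name, limits, and result classification are illustrative assumptions, not part of this PR:

import subprocess

def run_mutant_on_input(image: str, mutant_binary: str, input_file: str,
                        timeout_secs: int = 30) -> str:
    """Run one mutant on one input inside a container and classify the result."""
    cmd = [
        'docker', 'run', '--rm',
        '--memory', '2g',      # keep runaway memory usage inside the container
        '--network', 'none',   # mutants need no network access
        '-v', f'{input_file}:/input:ro',
        image, mutant_binary, '/input',
    ]
    try:
        result = subprocess.run(cmd, capture_output=True, timeout=timeout_secs)
    except subprocess.TimeoutExpired:
        # Note: this only kills the docker client; a robust version would
        # also `docker kill` the container by name.
        return 'timeout'  # e.g. a mutation introducing an infinite loop
    if result.returncode != 0:
        return 'killed'    # the input detected the mutant
    return 'survived'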

google-cla (bot) commented Sep 22, 2023

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up-to-date status, view the checks section at the bottom of the pull request.

@phi-go (Author) commented Dec 23, 2023

Hey @alan32liu, the current commit should hopefully pass the CI checks. We are still encountering some errors in the integration tests with our setup, but I believe this is just caused by differences in the environment. We have merged the fixes for #1937, thank you for your help there. The current commit is still a work in progress regarding report generation, and currently only 10 mutant binaries are built to keep testing times low.

We also added the more granular timestamps; see the changes to the extract_corpus() and archive_corpus() functions.

If the CI checks pass, could you start a gcbrun? I think the following command should be correct:

/gcbrun run_experiment.py -a --experiment-config /opt/fuzzbench/service/experiment-config.yaml --experiment-name 2023-12-23-mua-measurer --fuzzers aflplusplus aflplusplus_407 --mutation-analysis

We also encountered two errors that we are quite confused by; maybe you have an idea?

One is part of the integration tests: experiment/test_runner.py::TestIntegrationRunner::test_integration_runner fails on the following assert:
assert len(os.listdir(output_corpus_dir)) > 5
The output_corpus_dir instead contains a corpus and a crashes dir, where the corpus dir does contain more than 5 files. The fuzzer-log.txt contains the following warnings; could these explain the issue?

[run_fuzzer] Running command: /home/pgoerz/fuzzbench/experiment/test_data/test_runner/MultipleConstraintsOnSmallInputTest -print_final_stats=1 -close_fd_mask=3 -fork=1 -ignore_ooms=1 -ignore_timeouts=1 -ignore_crashes=1 -entropic=1 -keep_seed=1 -cross_over_uniform_dist=1 -entropi>
WARNING: unrecognized flag '-fork=1'; use -help=1 to list all flags
WARNING: unrecognized flag '-ignore_ooms=1'; use -help=1 to list all flags
WARNING: unrecognized flag '-ignore_timeouts=1'; use -help=1 to list all flags
WARNING: unrecognized flag '-ignore_crashes=1'; use -help=1 to list all flags
WARNING: unrecognized flag '-entropic=1'; use -help=1 to list all flags
WARNING: unrecognized flag '-keep_seed=1'; use -help=1 to list all flags
WARNING: unrecognized flag '-cross_over_uniform_dist=1'; use -help=1 to list all flags
WARNING: unrecognized flag '-entropic_scale_per_exec_time=1'; use -help=1 to list all flags
INFO: Seed: 3421389961
INFO: Loaded 1 modules   (1645 inline 8-bit counters): 1645 [0x48ef70, 0x48f5dd),
INFO: Loaded 1 PC tables (1645 PCs): 1645 [0x472f58,0x479628),
INFO:        0 files found in /tmp/pytest-of-pgoerz/pytest-51/test_integration_runner0/corpus/corpus
INFO:        0 files found in /tmp/pytest-of-pgoerz/pytest-51/test_integration_runner0/seeds
INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes
INFO: A corpus is not provided, starting from an empty corpus
#2      INITED cov: 4 ft: 5 corp: 1/1b lim: 4 exec/s: 0 rss: 23Mb
        NEW_FUNC[1/1]: 0x450a10  (/home/pgoerz/fuzzbench/experiment/test_data/test_runner/MultipleConstraintsOnSmallInputTest+0x450a10)
==458257==WARNING: invalid path to external symbolizer!
==458257==WARNING: Failed to use and restart external symbolizer!
#2062   NEW    cov: 7 ft: 8 corp: 2/21b lim: 21 exec/s: 0 rss: 26Mb L: 20/20 MS: 5 InsertRepeatedBytes-CMP-InsertByte-InsertRepeatedBytes-InsertByte- DE: "\x01\x00\x00\x00\x00\x00\x00\x14"-
        NEW_FUNC[1/4]: 0x450aa0  (/home/pgoerz/fuzzbench/experiment/test_data/test_runner/MultipleConstraintsOnSmallInputTest+0x450aa0)
        NEW_FUNC[2/4]: 0x450b30  (/home/pgoerz/fuzzbench/experiment/test_data/test_runner/MultipleConstraintsOnSmallInputTest+0x450b30)
#2240   NEW    cov: 17 ft: 18 corp: 3/41b lim: 21 exec/s: 0 rss: 26Mb L: 20/20 MS: 3 CopyPart-CMP-CopyPart- DE: "\x01\x00"-

The other happens during report generation; we tried to debug it but are not even sure which of our changes could have caused it:

INFO:root:experiment_df:
                                   git_hash  experiment_filestore  ... fuzzer_stats crash_key
0  243d1022080f665c6755bc0aff12f8b4d43d2098  /tmp/experiment-data  ...         None      None
1  243d1022080f665c6755bc0aff12f8b4d43d2098  /tmp/experiment-data  ...         None      None
2  243d1022080f665c6755bc0aff12f8b4d43d2098  /tmp/experiment-data  ...         None      None

[3 rows x 12 columns]
WARNING:root:Filtered out invalid benchmarks: set().
                                   git_hash  experiment_filestore  ... fuzzer_stats crash_key
0  243d1022080f665c6755bc0aff12f8b4d43d2098  /tmp/experiment-data  ...         None      None
1  243d1022080f665c6755bc0aff12f8b4d43d2098  /tmp/experiment-data  ...         None      None
2  243d1022080f665c6755bc0aff12f8b4d43d2098  /tmp/experiment-data  ...         None      None

[3 rows x 12 columns]
Int64Index([0, 1, 2], dtype='int64')
                                   git_hash  experiment_filestore  ... crash_key firsts
0  243d1022080f665c6755bc0aff12f8b4d43d2098  /tmp/experiment-data  ...      None   True
1  243d1022080f665c6755bc0aff12f8b4d43d2098  /tmp/experiment-data  ...      None   True
2  243d1022080f665c6755bc0aff12f8b4d43d2098  /tmp/experiment-data  ...      None   True

[3 rows x 13 columns]
Int64Index([0, 1, 2], dtype='int64')
ERROR:root:Error generating HTML report. Extras:
    traceback: Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/pandas/core/frame.py", line 11003, in _reindex_for_setitem
    reindexed_value = value.reindex(index)._values
  File "/usr/local/lib/python3.10/site-packages/pandas/util/_decorators.py", line 324, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/pandas/core/frame.py", line 4807, in reindex
    return super().reindex(**kwargs)
  File "/usr/local/lib/python3.10/site-packages/pandas/core/generic.py", line 4966, in reindex
    return self._reindex_axes(
  File "/usr/local/lib/python3.10/site-packages/pandas/core/frame.py", line 4626, in _reindex_axes
    frame = frame._reindex_index(
  File "/usr/local/lib/python3.10/site-packages/pandas/core/frame.py", line 4642, in _reindex_index
    new_index, indexer = self.index.reindex(
  File "/usr/local/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 4237, in reindex
    target = self._wrap_reindex_result(target, indexer, preserve_names)
  File "/usr/local/lib/python3.10/site-packages/pandas/core/indexes/multi.py", line 2520, in _wrap_reindex_result
    target = MultiIndex.from_tuples(target)
  File "/usr/local/lib/python3.10/site-packages/pandas/core/indexes/multi.py", line 204, in new_meth
    return meth(self_or_cls, *args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/pandas/core/indexes/multi.py", line 559, in from_tuples
    arrays = list(lib.tuples_to_object_array(tuples).T)
  File "pandas/_libs/lib.pyx", line 2930, in pandas._libs.lib.tuples_to_object_array
ValueError: Buffer dtype mismatch, expected 'Python object' but got 'long'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/work/src/experiment/reporter.py", line 82, in output_report
    generate_report.generate_report(
  File "/work/src/analysis/generate_report.py", line 269, in generate_report
    experiment_df = data_utils.add_bugs_covered_column(experiment_df)
  File "/work/src/analysis/data_utils.py", line 170, in add_bugs_covered_column
    df['firsts'] = (
  File "/usr/local/lib/python3.10/site-packages/pandas/core/frame.py", line 3645, in __setitem__
    self._set_item_frame_value(key, value)
  File "/usr/local/lib/python3.10/site-packages/pandas/core/frame.py", line 3787, in _set_item_frame_value
    arraylike = _reindex_for_setitem(value, self.index)
  File "/usr/local/lib/python3.10/site-packages/pandas/core/frame.py", line 11010, in _reindex_for_setitem
    raise TypeError(
TypeError: incompatible index of inserted column with frame index

@DonggeLiu (Contributor) commented:

/gcbrun run_experiment.py -a --experiment-config /opt/fuzzbench/service/experiment-config.yaml --experiment-name 2023-12-23-mua-measurer --fuzzers aflplusplus aflplusplus_407 --mutation-analysis

@DonggeLiu (Contributor) commented Dec 23, 2023

We also encountered two errors that we are quite confused by; maybe you have an idea?

I will list my initial thoughts below, but I did not have a chance to read your code thoroughly and may make mistakes.
@jonathanmetzman please correct me if I am wrong.

One is part of the integration tests: experiment/test_runner.py::TestIntegrationRunner::test_integration_runner fails on the following assert: assert len(os.listdir(output_corpus_dir)) > 5. The output_corpus_dir instead contains a corpus and a crashes dir, where the corpus dir does contain more than 5 files. The fuzzer-log.txt contains the following warnings; could these explain the issue?

It seems the corpus/ was not correctly defined by your fuzzer (BTW, would it be better to change the copyright info?).
Here is an example from libfuzzer:

# Separate out corpus and crash directories as sub-directories of
# |output_corpus| to avoid conflicts when corpus directory is reloaded.
crashes_dir = os.path.join(output_corpus, 'crashes')
output_corpus = os.path.join(output_corpus, 'corpus')


The other happens during report generation; we tried to debug it but are not even sure which of our changes could have caused it:

Traceback (most recent call last):
  File "/work/src/experiment/reporter.py", line 82, in output_report
    generate_report.generate_report(
  File "/work/src/analysis/generate_report.py", line 269, in generate_report
    experiment_df = data_utils.add_bugs_covered_column(experiment_df)
  File "/work/src/analysis/data_utils.py", line 170, in add_bugs_covered_column
    df['firsts'] = (
  File "/usr/local/lib/python3.10/site-packages/pandas/core/frame.py", line 3645, in __setitem__
    self._set_item_frame_value(key, value)
  File "/usr/local/lib/python3.10/site-packages/pandas/core/frame.py", line 3787, in _set_item_frame_value
    arraylike = _reindex_for_setitem(value, self.index)
  File "/usr/local/lib/python3.10/site-packages/pandas/core/frame.py", line 11010, in _reindex_for_setitem
    raise TypeError(
TypeError: incompatible index of inserted column with frame index

The corresponding lines find the first bug covered by each fuzzer on each benchmark and add the result as a new column in the original data frame:

df['firsts'] = (
    df.groupby(grouping2, group_keys=False).apply(is_unique_crash) &
    ~df.crash_key.isna())

The error complains that the index of the new column does not align with the dataframe.
I don't remember the exact details of these lines, but to further debug this, a simple approach is to separate the complex statement into smaller steps and print the intermediate result of each.
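For instance (a sketch; grouping2 and is_unique_crash come from data_utils.py, the prints are just debugging scaffolding):

# Split the one-liner so each intermediate result can be inspected.
firsts = df.groupby(grouping2, group_keys=False).apply(is_unique_crash)
print(firsts.index)   # does this index line up with df.index?
print(df.index)
has_crash = ~df.crash_key.isna()
print(firsts.shape, has_crash.shape)  # do the lengths agree?
df['firsts'] = firsts & has_crash     # the original failing assignment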

E.g., is this a column mismatch in the result you pasted above?

INFO:root:experiment_df:
...
[3 rows x 12 columns]
WARNING:root:Filtered out invalid benchmarks: set().
...
[3 rows x 12 columns]
Int64Index([0, 1, 2], dtype='int64')
...
[3 rows x 13 columns]

Also, I would fix the 1st error and re-run the exp to check if the 2nd one still exists, just in case it is caused by the 1st.

@DonggeLiu (Contributor) commented:

/gcbrun run_experiment.py -a --experiment-config /opt/fuzzbench/service/experiment-config.yaml --experiment-name 2023-12-23-mua-measurer --fuzzers aflplusplus aflplusplus_407 --mutation-analysis

This failed because there is no fuzzer named aflplusplus_407.
Maybe it was not included in this PR?

@phi-go (Author) commented Dec 23, 2023

So the presubmit error seems to be this:

"gsutil cp /tmp/tmplzfk1uuo gs://experiment-data/test-experiment/build-logs/benchmark-freetype2_ftfuzzer-mutation_analysis.txt" returned: 1. Extras: 
    output: ServiceException: 401 Anonymous caller does not have storage.objects.list access to the Google Cloud Storage bucket. Permission 'storage.objects.list' denied on resource (or it may not exist).

gs://experiment-data should be the same bucket as in main, so I'm not sure what caused this ...

Regarding openh264, I have merged the main branch, but I see now that the fix is not in there; let me add the fix from the pull request.

This failed because there is no fuzzer named aflplusplus_407.
May be it was not included in this PR?

I just copied the command assuming those fuzzers existed in this branch. It doesn't really matter which fuzzer the experiment is run with; we can also just do one.

Thank you for the comments on the errors. We won't have time this weekend but will look into this starting Monday.

@DonggeLiu (Contributor) commented:

/gcbrun run_experiment.py -a --experiment-config /opt/fuzzbench/service/experiment-config.yaml --experiment-name 2023-12-23-mua-measurer --fuzzers aflplusplus --mutation-analysis

@DonggeLiu (Contributor) commented:

So the presubmit error seems to be this:

"gsutil cp /tmp/tmplzfk1uuo gs://experiment-data/test-experiment/build-logs/benchmark-freetype2_ftfuzzer-mutation_analysis.txt" returned: 1. Extras: 
    output: ServiceException: 401 Anonymous caller does not have storage.objects.list access to the Google Cloud Storage bucket. Permission 'storage.objects.list' denied on resource (or it may not exist).

gs://experiment-data should be the same bucket as in main, so I'm not sure what caused this ...

This is because your statement requires gsutil actions, but CI is not authenticated and hence cannot perform that action.

Could that statement be replaced by mocks?
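For example, something along these lines might work (a sketch assuming the copy goes through common.filestore_utils.cp and that new_process.ProcessResult is the return type; adjust to wherever the mua setup actually shells out):

from unittest import mock

from common import new_process


@mock.patch('common.filestore_utils.cp')
def test_mua_build_log_upload(mocked_cp):
    """With the filestore copy mocked out, no GCS authentication is needed."""
    mocked_cp.return_value = new_process.ProcessResult(0, '', False)
    # ... run the part of the mua setup that uploads the build log ...
    mocked_cp.assert_called()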

@DonggeLiu (Contributor) commented Dec 23, 2023

Experiment 2023-12-23-mua-measurer data and results will be available later at:
The experiment data.
The experiment report.

@phi-go (Author) commented Dec 23, 2023

This is because your statement requires gsutil actions, but CI is not authenticated and hence cannot perform that action.

Could that statement be replaced by mocks?

How does this test work for the coverage measurer? As far as I can see, there is no fixture setting up the data for the coverage measurer. We just added the mua commands (which require gsutil) there so that this test prepares the environment for the mua measurer correctly, but it seems no similar setup is needed for the coverage measurer. We can of course also try to use mocking instead.

@DonggeLiu (Contributor) commented:

How does this test work for the coverage measurer?

Don't know, I would have to read the code to learn : )

@phi-go (Author) commented Jan 3, 2024

Hey, the 2023-01-02-mua-xml2-2f-bug experiment actually got through that part and did some mua evaluation. You can stop the run; it's missing some performance optimizations, so it might take quite long to actually complete.

@phi-go (Author) commented Jan 3, 2024

Also, could you give it another go? I did some more performance optimizations that should have improved things:

/gcbrun run_experiment.py -a --experiment-config /opt/fuzzbench/service/experiment-config.yaml --experiment-name 2023-01-03-mua-xml2-2f --fuzzers afl libfuzzer --benchmarks libxml2_xml --mutation-analysis

@DonggeLiu (Contributor) commented:

/gcbrun run_experiment.py -a --experiment-config /opt/fuzzbench/service/experiment-config.yaml --experiment-name 2023-01-04-mua-xml2-2f --fuzzers afl libfuzzer --benchmarks libxml2_xml --mutation-analysis

@DonggeLiu (Contributor) commented:

Experiment 2023-01-04-mua-xml2-2f data and results will be available later at:
The experiment data.
The experiment report.

@phi-go (Author) commented Jan 8, 2024

Seems like 2023-01-04-mua-xml2-2f ran through, and the mua evaluation at least completed for AFL. However, the result DBs are too large. We had a local run with the same issues and have implemented fixes. We have now had some successful larger local runs, and I think we can either have one more run here or start a first test run for the competition: #1941

You might want to delete the data for 2023-01-04-mua-xml2-2f; it is a few hundred GBs.

@DonggeLiu (Contributor) commented:

You might want to delete the data for 2023-01-04-mua-xml2-2f; it is a few hundred GBs.

Done, thanks for the reminder!
