
Conversation

@phi-go commented Sep 22, 2023

We recently published a paper describing an approach that uses mutation analysis for fuzzer benchmarking. This pull request aims to add support for this approach (Phase I only) to FuzzBench.

There are a few mismatches between FuzzBench and our framework that need solutions. I have some ideas, but I would be interested in your input as well!

My understanding of FuzzBench's measurers is as follows:
At the start of a run, all docker images are built. Specifically, for the coverage measurer, the generated.mk contains targets of the form build-coverage-{benchmark}, which build the images containing the executables used for coverage collection. These make commands are started in builder.py, and after an image is built the coverage executables are extracted and stored on the host.
The coverage executables are then used by the code in measure_manager.py to measure unmeasured_snapshots, which I would expect to be newly created fuzzer queue entries. As far as I can tell, coverage measurement happens on the host in a temporary directory, with a timeout of 900 seconds per snapshot.
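To make that concrete, here is a rough sketch of the flow as I understand it (function names, paths, and the exact invocations are my paraphrase, not the real FuzzBench code):

import subprocess

SNAPSHOT_MEASURE_TIMEOUT = 900  # seconds, per snapshot

def build_coverage_binaries(benchmark: str):
    # builder.py drives the generated.mk target that builds the docker
    # image containing the coverage executables for |benchmark|.
    subprocess.run(['make', '-f', 'docker/generated.mk',
                    f'build-coverage-{benchmark}'], check=True)
    # ...afterwards the coverage executables are extracted from the image
    # and stored on the host for the measurers to use.

def measure_snapshot(coverage_binary: str, corpus_dir: str):
    # measure_manager.py replays each unmeasured snapshot (newly created
    # fuzzer queue entries) against the coverage binary, in a temporary
    # directory on the host, bounded by the per-snapshot timeout.
    subprocess.run([coverage_binary, corpus_dir],
                   timeout=SNAPSHOT_MEASURE_TIMEOUT, check=False)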

Open points:

  • Our framework requires a single bitcode file of the intermediate representation, corresponding to the object file that would be passed to the fuzzer's compiler. We usually use wllvm/gllvm to get this bitcode file (first sketch below). There are other methods, but this one has worked quite well as long as the build scripts respect CC/CXX. Hopefully, this also works for FuzzBench.
  • In our framework we use coverage to decide which mutant binaries to compile (we use supermutants, but I would ignore this step for now). However, the continuous measurements while running FuzzBench would require compiling mutant binaries either all at the beginning or on demand, once we see that a mutation is covered. I would like to avoid the first option, as it would compile large numbers of mutants that might never be used. The second option would require keeping a builder image around for each benchmark, as well as some synchronization to avoid compiling the same mutant multiple times (second sketch below); long compilation times could also delay the measurement of snapshots. Doing the mutation-score measurements at the end of the experiment would also be a possibility, though it would require keeping all snapshots around.
  • Measuring whether a mutant is detected needs to be done individually: by the nature of mutation analysis, infinite loops, high memory usage, and crashes are expected to happen. To avoid interfering with the host system, it might also be sensible to execute the mutant executables in a container (third sketch below).
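As referenced in the first point, here is a minimal sketch of the bitcode extraction we have in mind; the target name and build command are placeholders, and this assumes gllvm is installed and the build scripts respect CC/CXX:

import os
import subprocess

# Build with the gllvm wrappers so every object file carries embedded bitcode.
env = dict(os.environ, CC='gclang', CXX='gclang++')
subprocess.run(['make'], env=env, check=True)

# get-bc (from gllvm) then links the embedded bitcode of the final
# executable into a single .bc file, e.g. fuzz_target.bc.
subprocess.run(['get-bc', 'fuzz_target'], check=True)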
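For the second point, the synchronization for on-demand compilation could look roughly like this (a sketch only; the make target and paths are hypothetical, and the real build would happen inside the per-benchmark builder image):

import fcntl
import os
import subprocess

def ensure_mutant_built(mutant_id: int, build_dir: str) -> str:
    """Compile a mutant binary on demand, at most once across measurers."""
    binary = os.path.join(build_dir, f'mutant_{mutant_id}')
    lock_path = binary + '.lock'
    with open(lock_path, 'w') as lock_file:
        # Exclusive file lock: concurrent measurers that cover the same
        # mutation block here instead of compiling the same mutant twice.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            if not os.path.exists(binary):
                subprocess.run(['make', f'mutant_{mutant_id}'],
                               cwd=build_dir, check=True)
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
    return binary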
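And for the third point, a sketch of how a single mutant execution could be isolated; the image name, limits, and result classification are illustrative assumptions, not part of this PR:

import subprocess

def run_mutant_on_input(image: str, mutant_binary: str, input_file: str,
                        timeout_secs: int = 30) -> str:
    """Run one mutant on one input inside a container and classify the result."""
    cmd = [
        'docker', 'run', '--rm',
        '--memory', '2g',      # keep runaway memory usage inside the container
        '--network', 'none',   # mutants need no network access
        '-v', f'{input_file}:/input:ro',
        image, mutant_binary, '/input',
    ]
    try:
        result = subprocess.run(cmd, capture_output=True, timeout=timeout_secs)
    except subprocess.TimeoutExpired:
        # Note: this only kills the docker client; a robust version would
        # also `docker kill` the container by name.
        return 'timeout'  # e.g. a mutation introducing an infinite loop
    if result.returncode != 0:
        return 'killed'    # the input detected the mutant
    return 'survived'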

google-cla (bot) commented Sep 22, 2023

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up-to-date status, view the checks section at the bottom of the pull request.

@phi-go (Author) commented Dec 23, 2023

Hey @alan32liu, the current commit should hopefully pass the CI checks. We are still encountering some errors in the integration tests with our setup, but I believe this is just caused by differences in the environment. We have merged the fixes for #1937, thank you for your help there. The current commit is still a work in progress regarding report generation, and currently only 10 mutant binaries are built to keep testing times low.

We also added the more granular timestamps; see the changes to the extract_corpus() and archive_corpus() functions.

If the CI checks pass, could you start a gcbrun? I think the following command should be correct:

/gcbrun run_experiment.py -a --experiment-config /opt/fuzzbench/service/experiment-config.yaml --experiment-name 2023-12-23-mua-measurer --fuzzers aflplusplus aflplusplus_407 --mutation-analysis

We also encountered two errors that we are quite confused by; maybe you have an idea?

One is part of the integration tests: experiment/test_runner.py::TestIntegrationRunner::test_integration_runner fails on the following assert:
assert len(os.listdir(output_corpus_dir)) > 5
The output_corpus_dir instead contains a corpus and a crashes dir, where the corpus dir does contain more than 5 files. The fuzzer-log.txt contains the following warnings; could these explain the issue?

[run_fuzzer] Running command: /home/pgoerz/fuzzbench/experiment/test_data/test_runner/MultipleConstraintsOnSmallInputTest -print_final_stats=1 -close_fd_mask=3 -fork=1 -ignore_ooms=1 -ignore_timeouts=1 -ignore_crashes=1 -entropic=1 -keep_seed=1 -cross_over_uniform_dist=1 -entropi>
WARNING: unrecognized flag '-fork=1'; use -help=1 to list all flags
WARNING: unrecognized flag '-ignore_ooms=1'; use -help=1 to list all flags
WARNING: unrecognized flag '-ignore_timeouts=1'; use -help=1 to list all flags
WARNING: unrecognized flag '-ignore_crashes=1'; use -help=1 to list all flags
WARNING: unrecognized flag '-entropic=1'; use -help=1 to list all flags
WARNING: unrecognized flag '-keep_seed=1'; use -help=1 to list all flags
WARNING: unrecognized flag '-cross_over_uniform_dist=1'; use -help=1 to list all flags
WARNING: unrecognized flag '-entropic_scale_per_exec_time=1'; use -help=1 to list all flags
INFO: Seed: 3421389961
INFO: Loaded 1 modules   (1645 inline 8-bit counters): 1645 [0x48ef70, 0x48f5dd),
INFO: Loaded 1 PC tables (1645 PCs): 1645 [0x472f58,0x479628),
INFO:        0 files found in /tmp/pytest-of-pgoerz/pytest-51/test_integration_runner0/corpus/corpus
INFO:        0 files found in /tmp/pytest-of-pgoerz/pytest-51/test_integration_runner0/seeds
INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes
INFO: A corpus is not provided, starting from an empty corpus
#2      INITED cov: 4 ft: 5 corp: 1/1b lim: 4 exec/s: 0 rss: 23Mb
        NEW_FUNC[1/1]: 0x450a10  (/home/pgoerz/fuzzbench/experiment/test_data/test_runner/MultipleConstraintsOnSmallInputTest+0x450a10)
==458257==WARNING: invalid path to external symbolizer!
==458257==WARNING: Failed to use and restart external symbolizer!
#2062   NEW    cov: 7 ft: 8 corp: 2/21b lim: 21 exec/s: 0 rss: 26Mb L: 20/20 MS: 5 InsertRepeatedBytes-CMP-InsertByte-InsertRepeatedBytes-InsertByte- DE: "\x01\x00\x00\x00\x00\x00\x00\x14"-
        NEW_FUNC[1/4]: 0x450aa0  (/home/pgoerz/fuzzbench/experiment/test_data/test_runner/MultipleConstraintsOnSmallInputTest+0x450aa0)
        NEW_FUNC[2/4]: 0x450b30  (/home/pgoerz/fuzzbench/experiment/test_data/test_runner/MultipleConstraintsOnSmallInputTest+0x450b30)
#2240   NEW    cov: 17 ft: 18 corp: 3/41b lim: 21 exec/s: 0 rss: 26Mb L: 20/20 MS: 3 CopyPart-CMP-CopyPart- DE: "\x01\x00"-

The other happens during report generation; we tried to debug it but are not even sure which of our changes could have caused it:

INFO:root:experiment_df:
                                   git_hash  experiment_filestore  ... fuzzer_stats crash_key
0  243d1022080f665c6755bc0aff12f8b4d43d2098  /tmp/experiment-data  ...         None      None
1  243d1022080f665c6755bc0aff12f8b4d43d2098  /tmp/experiment-data  ...         None      None
2  243d1022080f665c6755bc0aff12f8b4d43d2098  /tmp/experiment-data  ...         None      None

[3 rows x 12 columns]
WARNING:root:Filtered out invalid benchmarks: set().
                                   git_hash  experiment_filestore  ... fuzzer_stats crash_key
0  243d1022080f665c6755bc0aff12f8b4d43d2098  /tmp/experiment-data  ...         None      None
1  243d1022080f665c6755bc0aff12f8b4d43d2098  /tmp/experiment-data  ...         None      None
2  243d1022080f665c6755bc0aff12f8b4d43d2098  /tmp/experiment-data  ...         None      None

[3 rows x 12 columns]
Int64Index([0, 1, 2], dtype='int64')
                                   git_hash  experiment_filestore  ... crash_key firsts
0  243d1022080f665c6755bc0aff12f8b4d43d2098  /tmp/experiment-data  ...      None   True
1  243d1022080f665c6755bc0aff12f8b4d43d2098  /tmp/experiment-data  ...      None   True
2  243d1022080f665c6755bc0aff12f8b4d43d2098  /tmp/experiment-data  ...      None   True

[3 rows x 13 columns]
Int64Index([0, 1, 2], dtype='int64')
ERROR:root:Error generating HTML report. Extras:
    traceback: Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/pandas/core/frame.py", line 11003, in _reindex_for_setitem
    reindexed_value = value.reindex(index)._values
  File "/usr/local/lib/python3.10/site-packages/pandas/util/_decorators.py", line 324, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/pandas/core/frame.py", line 4807, in reindex
    return super().reindex(**kwargs)
  File "/usr/local/lib/python3.10/site-packages/pandas/core/generic.py", line 4966, in reindex
    return self._reindex_axes(
  File "/usr/local/lib/python3.10/site-packages/pandas/core/frame.py", line 4626, in _reindex_axes
    frame = frame._reindex_index(
  File "/usr/local/lib/python3.10/site-packages/pandas/core/frame.py", line 4642, in _reindex_index
    new_index, indexer = self.index.reindex(
  File "/usr/local/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 4237, in reindex
    target = self._wrap_reindex_result(target, indexer, preserve_names)
  File "/usr/local/lib/python3.10/site-packages/pandas/core/indexes/multi.py", line 2520, in _wrap_reindex_result
    target = MultiIndex.from_tuples(target)
  File "/usr/local/lib/python3.10/site-packages/pandas/core/indexes/multi.py", line 204, in new_meth
    return meth(self_or_cls, *args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/pandas/core/indexes/multi.py", line 559, in from_tuples
    arrays = list(lib.tuples_to_object_array(tuples).T)
  File "pandas/_libs/lib.pyx", line 2930, in pandas._libs.lib.tuples_to_object_array
ValueError: Buffer dtype mismatch, expected 'Python object' but got 'long'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/work/src/experiment/reporter.py", line 82, in output_report
    generate_report.generate_report(
  File "/work/src/analysis/generate_report.py", line 269, in generate_report
    experiment_df = data_utils.add_bugs_covered_column(experiment_df)
  File "/work/src/analysis/data_utils.py", line 170, in add_bugs_covered_column
    df['firsts'] = (
  File "/usr/local/lib/python3.10/site-packages/pandas/core/frame.py", line 3645, in __setitem__
    self._set_item_frame_value(key, value)
  File "/usr/local/lib/python3.10/site-packages/pandas/core/frame.py", line 3787, in _set_item_frame_value
    arraylike = _reindex_for_setitem(value, self.index)
  File "/usr/local/lib/python3.10/site-packages/pandas/core/frame.py", line 11010, in _reindex_for_setitem
    raise TypeError(
TypeError: incompatible index of inserted column with frame index

@DonggeLiu (Contributor) commented:

/gcbrun run_experiment.py -a --experiment-config /opt/fuzzbench/service/experiment-config.yaml --experiment-name 2023-12-23-mua-measurer --fuzzers aflplusplus aflplusplus_407 --mutation-analysis

@DonggeLiu (Contributor) commented Dec 23, 2023

We also encountered two errors that we are quite confused by; maybe you have an idea?

I will list my initial thoughts below, but I did not have a chance to read your code thoroughly and may make mistakes.
@jonathanmetzman please correct me if I am wrong.

One is part of the integration tests: experiment/test_runner.py::TestIntegrationRunner::test_integration_runner fails on the following assert: assert len(os.listdir(output_corpus_dir)) > 5. The output_corpus_dir instead contains a corpus and a crashes dir, where the corpus dir does contain more than 5 files. The fuzzer-log.txt contains the following warnings; could these explain the issue?

It seems the corpus/ was not correctly defined by your fuzzer (BTW, would it be better to change the copyright info?).
Here is an example from libfuzzer:

# Separate out corpus and crash directories as sub-directories of
# |output_corpus| to avoid conflicts when corpus directory is reloaded.
crashes_dir = os.path.join(output_corpus, 'crashes')
output_corpus = os.path.join(output_corpus, 'corpus')


The other happens during report generation; we tried to debug it but are not even sure which of our changes could have caused it:

Traceback (most recent call last):
  File "/work/src/experiment/reporter.py", line 82, in output_report
    generate_report.generate_report(
  File "/work/src/analysis/generate_report.py", line 269, in generate_report
    experiment_df = data_utils.add_bugs_covered_column(experiment_df)
  File "/work/src/analysis/data_utils.py", line 170, in add_bugs_covered_column
    df['firsts'] = (
  File "/usr/local/lib/python3.10/site-packages/pandas/core/frame.py", line 3645, in __setitem__
    self._set_item_frame_value(key, value)
  File "/usr/local/lib/python3.10/site-packages/pandas/core/frame.py", line 3787, in _set_item_frame_value
    arraylike = _reindex_for_setitem(value, self.index)
  File "/usr/local/lib/python3.10/site-packages/pandas/core/frame.py", line 11010, in _reindex_for_setitem
    raise TypeError(
TypeError: incompatible index of inserted column with frame index

The corresponding lines find the first bug covered by each fuzzer on each benchmark and add the result as a new column in the original data frame:

df['firsts'] = (
    df.groupby(grouping2, group_keys=False).apply(is_unique_crash) &
    ~df.crash_key.isna())

The error complains that the index of the new column does not align with the dataframe.
I don't remember the exact details of these lines, but to further debug this, a simple approach is to separate the complex statement into smaller steps and print the intermediate result of each.
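For instance (a sketch; grouping2 and is_unique_crash come from data_utils.py, the prints are just debugging scaffolding):

# Split the one-liner so each intermediate result can be inspected.
firsts = df.groupby(grouping2, group_keys=False).apply(is_unique_crash)
print(firsts.index)   # does this index line up with df.index?
print(df.index)
has_crash = ~df.crash_key.isna()
print(firsts.shape, has_crash.shape)  # do the lengths agree?
df['firsts'] = firsts & has_crash     # the original failing assignment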

E.g., is this a column mismatch in the result you pasted above?

INFO:root:experiment_df:
...
[3 rows x 12 columns]
WARNING:root:Filtered out invalid benchmarks: set().
...
[3 rows x 12 columns]
Int64Index([0, 1, 2], dtype='int64')
...
[3 rows x 13 columns]

Also, I would fix the 1st error and re-run the exp to check if the 2nd one still exists, just in case it is caused by the 1st.

@DonggeLiu (Contributor) commented:

/gcbrun run_experiment.py -a --experiment-config /opt/fuzzbench/service/experiment-config.yaml --experiment-name 2023-12-23-mua-measurer --fuzzers aflplusplus aflplusplus_407 --mutation-analysis

This failed because there is no fuzzer named aflplusplus_407.
Maybe it was not included in this PR?

@phi-go (Author) commented Dec 23, 2023

So the presubmit error seems to be this:

"gsutil cp /tmp/tmplzfk1uuo gs://experiment-data/test-experiment/build-logs/benchmark-freetype2_ftfuzzer-mutation_analysis.txt" returned: 1. Extras: 
    output: ServiceException: 401 Anonymous caller does not have storage.objects.list access to the Google Cloud Storage bucket. Permission 'storage.objects.list' denied on resource (or it may not exist).

gs://experiment-data should be the same bucket as in main, so I'm not sure what caused this ...

Regarding openh264, I have merged the main branch, but I see now that the fix is not in there; let me add the fix from the pull request.

This failed because there is no fuzzer named aflplusplus_407.
May be it was not included in this PR?

I just copied the command assuming those fuzzers existed in this branch. It doesn't really matter which fuzzer the experiment is run with; we can also just do one.

Thank you for the comments on the errors. We won't have time this weekend but will look into this starting Monday.

@DonggeLiu (Contributor) commented:

/gcbrun run_experiment.py -a --experiment-config /opt/fuzzbench/service/experiment-config.yaml --experiment-name 2023-12-23-mua-measurer --fuzzers aflplusplus --mutation-analysis

@DonggeLiu (Contributor) commented:

So the presubmit error seems to be this:

"gsutil cp /tmp/tmplzfk1uuo gs://experiment-data/test-experiment/build-logs/benchmark-freetype2_ftfuzzer-mutation_analysis.txt" returned: 1. Extras: 
    output: ServiceException: 401 Anonymous caller does not have storage.objects.list access to the Google Cloud Storage bucket. Permission 'storage.objects.list' denied on resource (or it may not exist).

gs://experiment-data should be the same bucket as in main, so I'm not sure what caused this ...

This is because your statement requires gsutil actions, but CI is not authenticated and hence cannot perform that action.

Could that statement be replaced by mocks?
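For example, something along these lines might work (a sketch assuming the copy goes through common.filestore_utils.cp and that new_process.ProcessResult is the return type; adjust to wherever the mua setup actually shells out):

from unittest import mock

from common import new_process


@mock.patch('common.filestore_utils.cp')
def test_mua_build_log_upload(mocked_cp):
    """With the filestore copy mocked out, no GCS authentication is needed."""
    mocked_cp.return_value = new_process.ProcessResult(0, '', False)
    # ... run the part of the mua setup that uploads the build log ...
    mocked_cp.assert_called()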

@DonggeLiu (Contributor) commented Dec 23, 2023

Experiment 2023-12-23-mua-measurer data and results will be available later at:
The experiment data.
The experiment report.

@phi-go (Author) commented Dec 23, 2023

This is because your statement requires gsutil actions, but CI is not authenticated and hence cannot perform that action.

Could that statement be replaced by mocks?

How does this test work for the coverage measurer? As far as I can see, there is no fixture setting up the data for the coverage measurer. We just added the mua commands (which require gsutil) there so that this test prepares the environment for the mua measurer correctly, but it seems no similar setup is needed for the coverage measurer. We can of course also try to use mocking instead.

@DonggeLiu (Contributor) commented:

How does this test work for the coverage measurer?

Don't know, I would have to read the code to learn : )

@phi-go (Author) commented Jan 3, 2024

Hey, the 2023-01-02-mua-xml2-2f-bug experiment actually got through that part and did some mua evaluation. You can stop the run; it's missing some performance optimizations, so it might take quite long to actually complete.

@phi-go (Author) commented Jan 3, 2024

Also, could you give it another go? I did some more performance optimizations that should have improved things:

/gcbrun run_experiment.py -a --experiment-config /opt/fuzzbench/service/experiment-config.yaml --experiment-name 2023-01-03-mua-xml2-2f --fuzzers afl libfuzzer --benchmarks libxml2_xml --mutation-analysis

@DonggeLiu (Contributor) commented:

/gcbrun run_experiment.py -a --experiment-config /opt/fuzzbench/service/experiment-config.yaml --experiment-name 2023-01-04-mua-xml2-2f --fuzzers afl libfuzzer --benchmarks libxml2_xml --mutation-analysis

@DonggeLiu (Contributor) commented:

Experiment 2023-01-04-mua-xml2-2f data and results will be available later at:
The experiment data.
The experiment report.

@phi-go (Author) commented Jan 8, 2024

Seems like 2023-01-04-mua-xml2-2f ran through, and the mua evaluation at least completed for AFL. However, the result DBs are too large. We had a local run with the same issues and have implemented fixes. We have now had some successful larger local runs, and I think we can either have one more run here or start a first test run for the competition: #1941

You might want to delete the data for 2023-01-04-mua-xml2-2f; it is a few hundred GBs.

@DonggeLiu (Contributor) commented:

You might want to delete the data for 2023-01-04-mua-xml2-2f; it is a few hundred GBs.

Done, thanks for the reminder!
