ILF is an Imitation Learning based Fuzzer for smart contracts. The fuzzing policy, which is used to generate transactions, is represented by an ensemble of neural networks and is learned from thousands of high-quality sequences of transactions generated using symbolic execution. ILF can be used to fuzz any Ethereum smart contract and outputs the coverage and a vulnerability report.
ILF is developed at SRI Lab, Department of Computer Science, ETH Zurich as part of the Machine Learning for Programming and Blockchain Security projects. For more details, please refer to the ILF CCS'19 paper and slides.
We provide a Dockerfile, which we recommend as a starting point. To build and run:
$ docker build -t ilf .
$ docker run -it ilf
You can also follow the instructions in the Dockerfile to install ILF locally. If you experience build errors on Apple M chips, please refer to #21.
To fuzz the example provided in the repo with ILF (the imitation fuzzing policy) using our pre-trained model in the model directory:
$ python3 -m ilf --proj ./example/crowdsale/ --contract Crowdsale --fuzzer imitation --model ./model/ --limit 2000
The value of the --fuzzer argument can be replaced by one of:
- random: a uniformly random fuzzing policy.
- symbolic: a symbolic execution fuzzing policy based on depth-first search of block states. This is used for generating training sequences.
- sym_plus: an augmentation of symbolic which can revisit encountered block states.
- mix: a fuzzing policy that randomly chooses imitation or symbolic for generating each transaction.
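The mix policy can be pictured as a per-transaction random choice between two underlying policies. The following is a minimal sketch under stated assumptions, not ILF's actual implementation: the Policy type, make_mix_policy, and the p_imitation parameter are hypothetical names introduced for illustration, and the source does not say which selection probability ILF uses.

```python
import random
from typing import Callable, Optional

# Hypothetical stand-in: a policy is any callable that produces the next
# transaction (represented here as a plain string for illustration).
Policy = Callable[[], str]


def make_mix_policy(
    imitation: Policy,
    symbolic: Policy,
    p_imitation: float = 0.5,  # assumed value; not taken from ILF
    rng: Optional[random.Random] = None,
) -> Policy:
    """Return a policy that picks one of the two policies per transaction."""
    rng = rng or random.Random()

    def mix() -> str:
        # A fresh random choice is made for every generated transaction.
        chosen = imitation if rng.random() < p_imitation else symbolic
        return chosen()

    return mix
```

Passing a seeded random.Random instance as rng makes the sequence of choices reproducible, which can help when comparing fuzzing runs.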
For fuzzing new contracts, one needs to provide a Truffle project (formatted as the example in example/crowdsale). Then the script script/extract.py should be called to extract deployment transactions of the contracts. For the example contract, the script runs as follows:
$ rm example/crowdsale/transactions.json
$ python3 script/extract.py --proj example/crowdsale/ --port 8545
Note that you need to kill any existing ganache-cli processes listening on the same port before calling this script.
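Before running the extraction script, it may help to check whether something is still listening on the chosen port. The snippet below is a small sketch using only the Python standard library; port_in_use is a hypothetical helper, not part of ILF, and it only reports that some process accepts connections on the port, not that the process is ganache-cli.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0
```

For the example above, port_in_use(8545) returning True would suggest that a leftover process should be stopped before calling script/extract.py.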
For training, one needs to run symbolic on a set of training contracts to produce a dataset in a training directory. Usually tens of thousands of contracts are used for training. For demonstration purposes, we show how to produce a small training dataset from our example contract to the train_data directory:
$ mkdir train_data
$ python3 -m ilf --proj ./example/crowdsale/ --contract Crowdsale --limit 2000 --fuzzer symbolic --dataset_dump_path ./train_data/crowdsale.data
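When many contracts are involved, the same command can be generated per project. The sketch below builds the command line shown above for each (project, contract) pair; symbolic_command and generate_dataset are hypothetical helpers, and naming each dump file after its project directory is an assumption rather than an ILF convention.

```python
import subprocess
from pathlib import Path
from typing import Iterable, List, Tuple


def symbolic_command(
    proj: Path, contract: str, out_dir: Path, limit: int = 2000
) -> List[str]:
    """Build the symbolic-fuzzer command for one Truffle project."""
    # Assumption: one dump file per project, named after its directory.
    dump_path = out_dir / f"{proj.name}.data"
    return [
        "python3", "-m", "ilf",
        "--proj", str(proj),
        "--contract", contract,
        "--limit", str(limit),
        "--fuzzer", "symbolic",
        "--dataset_dump_path", str(dump_path),
    ]


def generate_dataset(
    jobs: Iterable[Tuple[Path, str]], out_dir: Path, dry_run: bool = True
) -> List[List[str]]:
    """Build (and, unless dry_run is set, execute) one command per job."""
    commands = [symbolic_command(proj, name, out_dir) for proj, name in jobs]
    if not dry_run:
        out_dir.mkdir(parents=True, exist_ok=True)
        for cmd in commands:
            # check=False so one failing contract does not abort the batch.
            subprocess.run(cmd, check=False)
    return commands
```

Calling generate_dataset([(Path("./example/crowdsale/"), "Crowdsale")], Path("./train_data")) in dry-run mode returns the commands without running them, which makes it easy to inspect the batch first.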
Run the scripts to select seed integer values and amount values from the training dataset, and put them into ilf/fuzzers/imitation/int_values.py and ilf/fuzzers/imitation/amounts.py, respectively:
$ python3 script/get_int_values.py --train_dir ./train_data
$ python3 script/get_amounts.py --train_dir ./train_data
Then the following command performs neural network training and outputs the trained networks in the new_model directory:
$ mkdir new_model
$ python3 -m ilf --fuzzer imitation --train_dir ./train_data --model ./new_model
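The seed-extraction and training commands can be chained in a small driver so that they always run in the order given above. This is a sketch only: training_commands and run_training are hypothetical helpers that wrap the exact commands listed in this section, and the script must be run from the repository root for the relative paths to resolve.

```python
import os
import subprocess
from typing import List


def training_commands(
    train_dir: str = "./train_data", model_dir: str = "./new_model"
) -> List[List[str]]:
    """Return the three commands from this section, in order."""
    return [
        ["python3", "script/get_int_values.py", "--train_dir", train_dir],
        ["python3", "script/get_amounts.py", "--train_dir", train_dir],
        ["python3", "-m", "ilf", "--fuzzer", "imitation",
         "--train_dir", train_dir, "--model", model_dir],
    ]


def run_training(
    train_dir: str = "./train_data", model_dir: str = "./new_model"
) -> None:
    """Create the model directory, then run each step, stopping on failure."""
    os.makedirs(model_dir, exist_ok=True)
    for cmd in training_commands(train_dir, model_dir):
        # check=True raises CalledProcessError if a step exits non-zero,
        # so training never starts after a failed extraction step.
        subprocess.run(cmd, check=True)
```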
For evaluation and training purposes, one might want to automatically construct Truffle projects from a large set of contracts. To achieve this, one can write a script to automatically produce the files required by Truffle projects, following the format in example/crowdsale. The compressed file truffle_scripts.tar.gz contains the scripts we used. Those scripts might not run directly but can give you a high-level idea of how things work.
@inproceedings{He:2019:LFS:3319535.3363230,
author = {He, Jingxuan and Balunovi\'{c}, Mislav and Ambroladze, Nodar and Tsankov, Petar and Vechev, Martin},
title = {Learning to Fuzz from Symbolic Execution with Application to Smart Contracts},
booktitle = {Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security},
series = {CCS '19},
year = {2019},
isbn = {978-1-4503-6747-9},
location = {London, United Kingdom},
pages = {531--548},
numpages = {18},
url = {http://doi.acm.org/10.1145/3319535.3363230},
doi = {10.1145/3319535.3363230},
acmid = {3363230},
publisher = {ACM},
address = {New York, NY, USA},
keywords = {fuzzing, imitation learning, smart contracts, symbolic execution},
}
- Jingxuan He
- Mislav Balunović
- Nodar Ambroladze
- Petar Tsankov
- Martin Vechev
- Anton Permenev
- Copyright (c) 2019 Secure, Reliable, and Intelligent Systems Lab (SRI), ETH Zurich
- Licensed under the Apache 2.0 License