Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat[test]: implement abi_decode spec test #4095

Merged
merged 66 commits into from
Jun 14, 2024

Conversation

charles-cooper
Copy link
Member

@charles-cooper charles-cooper commented Jun 5, 2024

What I did

add a spec-based differential fuzzer test for abi_decode

How I did it

How to verify it

Commit message

this commit implements a spec-based differential fuzzer for
`abi_decode`.

it introduces several components:

- a "spec" implementation of `abi_decode`, which is how vyper's
  abi_decode should behave on a given payload, implemented in python

- a hypothesis strategy to draw vyper types

- hypothesis strategy to create valid data for a given vyper type

- a hypothesis strategy to _mutate_ a given payload which is designed
  to introduce faults in the decoder. testing indicated splicing
  pointers into the payload - either valid pointers or "nearly" valid
  pointers - had the highest success rate for finding bugs in the
  decoder. the intuition here is that the most difficult part of the
  decoder is validating out-of-bound pointers in the payload, so
  pointers represent "semantically high-value" data to the fuzzer.

- some hypothesis tuning to ensure a good distribution of types

over several days of testing+tuning, this fuzzer independently found
the bugs fixed in 44bb281ccaa and 21f7172274e (which were originally
found by manual review).

Description for the changelog

Cute Animal Picture

Put a link to a cute animal picture inside the parenthesis-->

simplify imports with import hypothesis as hp
the large + nested sarrays just end up producing contracts that break
the contract size limit

# add, edit, delete, word, splice, flip
possible_actions = "adwww"
actions = draw(st.lists(st.sampled_from(possible_actions), max_size=MAX_MUTATIONS))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can also generate an empty list, no? Don't we want min_size=1?

Copy link
Member Author

@charles-cooper charles-cooper Jun 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want to test valid payloads too, we don't have to mutate the payload

Copy link
Collaborator

@cyberthirst cyberthirst Jun 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, i assumed we'd test valid payloads separately. we could enmurate all type combinations up to certain depth

st_byte = st_any_byte

# add, edit, delete, word, splice, flip
possible_actions = "adwww"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we deliberately don't use the rest of the actions?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, I was going to clean it up a bit. The edit, splice and flip actions turned out to not be very useful.



@st.composite
def _mutate(draw, payload, max_mutations=MAX_MUTATIONS):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

max_mutations unused

static_leaf_ctors = [t for t in leaf_ctors if t._is_prim_word]
dynamic_leaf_ctors = [BytesT, StringT]

MAX_MUTATIONS = 33
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the reasoning behind using 33?
why is it ok to use 33 for all payloads, irrespective of their lengths? ie why do we assume this works well both for short and long payloads?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's about editing, you can add or delete one word (plus change)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we bias towards the w action, which is word-level

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall, i'm not sure about the byte-level mutation. if it happens on the length word, it will likely create an invalid length. if it happens on a pointer, it will likely create an invalid pointer (eg points to a too large address)

given the fairly low MAX_MUTATIONS if a bad byte mutation happens, I think there's a low probability of offsetting it with a "good" mutation

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

like normal fuzzers do it on this level, but i think that the mutation depth there is much higher

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think it's mostly useful for editing the "data" portion of the payload, like it produces off-by-ones (expected length is 32 but data is only actually 31)


if t in (BytesT, StringT):
# arbitrary max_value
bound = draw(st.integers(min_value=1, max_value=1024))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't 1024 unnecessarily large so it just slows down the tests? what cases might not be covered by eg 512?

@cyberthirst
Copy link
Collaborator

i think that for the sake of completeness, we should also test with dirty memory

let's consider that a ptr points outside the buffer to
a) dirty memory
b) clean memory

we currently only test for b) and require that the spec matches the implementation

the spec should always raise when ptr points outside the buffer. but what if the implementation doesn't revert when pointing outside the buffer to dirty memory?

ret.pop(ix)
elif action == "w":
# splice word
st_uint256 = st.integers(min_value=0, max_value=2**256 - 1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm not sure how useful this range is. the interesting values will be on it's boundaries but we can sample those for "cheaper"

in most cases we'll just sample some gigantic random number and i'd argue it will just hit the same code path each time (and for ptrs will mostly likely just oog)

@cyberthirst
Copy link
Collaborator

i think it's important to include structs - one of the original errors was incorrect validation of structs within dynarrays

@charles-cooper
Copy link
Member Author

i think it's important to include structs - one of the original errors was incorrect validation of structs within dynarrays

I mean structs share the same code path as tuples, but with named fields in the front-end. I don't think we are missing much by not including them

@cyberthirst
Copy link
Collaborator

i think it's important to include structs - one of the original errors was incorrect validation of structs within dynarrays

I mean structs share the same code path as tuples, but with named fields in the front-end. I don't think we are missing much by not including them

we skip them for dynarrays, right?

@charles-cooper
Copy link
Member Author

i think it's important to include structs - one of the original errors was incorrect validation of structs within dynarrays

I mean structs share the same code path as tuples, but with named fields in the front-end. I don't think we are missing much by not including them

we skip them for dynarrays, right?

Ah right, those are disallowed by the language semantics. Ok let's add them

@charles-cooper charles-cooper changed the title wip - implement abi-decode spec feat[test]: implement abi_decode spec test Jun 14, 2024
Comment on lines +342 to +343
# for k, v in asdict(stats).items():
# event(k, v)

Check notice

Code scanning / CodeQL

Commented-out code Note test

This comment appears to contain commented-out code.
@charles-cooper charles-cooper marked this pull request as ready for review June 14, 2024 20:16
@charles-cooper charles-cooper merged commit 69e5c05 into vyperlang:master Jun 14, 2024
156 checks passed
@charles-cooper charles-cooper deleted the test-abi-decode-spec branch June 14, 2024 20:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants