Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking: some upstream test fixtures fail when updating to v7.0.0-beta.1 #1857

Closed
veox opened this issue Oct 4, 2019 · 10 comments
Closed

Comments

@veox
Copy link
Contributor

veox commented Oct 4, 2019

This issue is to track findings on why upstream fixtures fail in PR #1852, so I don't trash the PR discussion.

Working in update-fixtures-to-v7.0.0-beta.1 branch of mine, rolling off the one in PR #1852.


There are 6 failures for --fork Istanbul, all in fixtures/BlockchainTests/GeneralStateTests/stRevertTest:

RevertPrecompiledTouch.json:RevertPrecompiledTouch_d0g0v0_Istanbul
RevertPrecompiledTouch.json:RevertPrecompiledTouch_d3g0v0_Istanbul
RevertPrecompiledTouch_storage.json:RevertPrecompiledTouch_storage_d0g0v0_Istanbul
RevertPrecompiledTouch_storage.json:RevertPrecompiledTouch_storage_d3g0v0_Istanbul
RevertPrecompiledTouchExactOOG.json:RevertPrecompiledTouchExactOOG_d7g2v0_Istanbul
RevertPrecompiledTouchExactOOG.json:RevertPrecompiledTouchExactOOG_d31g2v0_Istanbul

These were added in ethereum/tests#580, then updated/refilled in ethereum/tests#609 and ethereum/tests#610.

For the first four, data indices 0 and 3 produce the same result upstream, so it's no surprise that 0 and 3 fail while 1 and 2 do not.

The last two are also grouped upstream.

@veox
Copy link
Contributor Author

veox commented Oct 4, 2019

RevertPrecompiledTouch.json has program of the following form:

{
 (CALL 50000 1 0 0 0 0 0)
 (CALL 50000 2 0 0 0 0 0)
 (CALL 50000 3 0 0 0 0 0)
 (CALL 50000 4 0 0 0 0 0)
 (CALL 50000 5 0 0 0 0 0)
 (CALL 50000 6 0 0 0 0 0)
 (CALL 50000 7 0 0 0 0 0)
 (CALL 50000 8 0 0 0 0 0)
 [[1]] (GAS)
 [[2]] (GAS)
 [[3]] (GAS)}

That is, sequentially call precompiles 1 through 8, then store remaining gas 3 times in this contract's storage.

All the precompiles have nonce set to 0.

The comment shows that it's expected to OOG-fail.

Variants 0 through 3 have CALL, DELEGATECALL, CALLCODE, STATICCALL as methods of calling the precompiles. So, it's CALL and STATICCALL that py-evm and geth have a disagreement on.

Variants 0 and 3 clear the nonce on precompile 3 (RIPEMD160).


Candidate 1: the difference between py-evm and geth (which filled the tests) is due to different gas use.

py-evm trace shows tx fee of 98780; haven't found number for geth yet.


Candidate 2: very telling is the fact that variants 0 and 3 of the test say that precompile 3 (RIPEMD160) should get its nonce cleared, and therefore should no longer exist after execution. RIPEMD160 is a special cookie...

@veox
Copy link
Contributor Author

veox commented Oct 4, 2019

Relating to candidate 2: ethereum/EIPs issue ethereum/EIPs#716 ("Clarification about when touchedness is reverted during state clearance").


This all sounds very familiar, only this time it's just the RIPEMD160 precompile.


From ethereum/aleth#5664 (good summary with lots of links):

The buggy behavior that is now a rule is basically: when precompiled contract with address 0x03 (RIPEMD160) is touched, then exception happens and state changes are reverted, this address should be still considered touched in the end, so that "remove empty touched accounts" rule (EIP-158) deletes it from the state at the end of transaction.

To clarify: when all the conditions in italics are true, then bold should still happen.


We do try to catch this, here and a bit above, too:

# collect account directly addressed
if computation.msg.to != constants.CREATE_CONTRACT_ADDRESS:
if computation.is_error and computation.is_origin_computation:
# Special case to account for geth+parity bug
# https://github.com/ethereum/EIPs/issues/716
if computation.msg.to == THREE:
yield computation.msg.to
else:
yield computation.msg.to

... but it seems that geth is not as stringent as py-evm on when this should happen. We only do it if computation.is_origin_computation. 😞

@veox
Copy link
Contributor Author

veox commented Oct 4, 2019

So, in short:

In collect_touched_accounts(), we now have to also check when a computation touches RIPEMD160, succeeds itself, but the parent (or someone along the call ladder) fails, and the failee is not necessarily the origin computation.

This is because we recurse into nested computations if the current one is successful:

# recurse into nested computations if this one was successful
if not computation.is_error:
for child in computation.children:
yield from collect_touched_accounts(child)

... but now we have to do it regardless, just to see if someone crumbled RIPEMD160.

I can't figure out how to do this without a double-pass through the nested computations, or ugly hacks - not today, at least, and got to run anyway.

@veox
Copy link
Contributor Author

veox commented Oct 5, 2019

This commit fixes the first four failing cases.

RevertPrecompiledTouchExactOOG still fail - investigating.

@veox
Copy link
Contributor Author

veox commented Oct 5, 2019

For test cases

RevertPrecompiledTouchExactOOG.json:RevertPrecompiledTouchExactOOG_d7g2v0_Istanbul
RevertPrecompiledTouchExactOOG.json:RevertPrecompiledTouchExactOOG_d31g2v0_Istanbul

insufficient (69K) gas is provided, with the expectation that they should OOG-fail.

The precompile called, 0x08 (ECPAIRING), should remain as-is, whereas if 120K gas is provided, its storage should be cleared (highlighted in same link).

This seems consistent to me.

The call that OOG-fails is nested, as shown in the trace from py-evm. On exit from that, remaining gas is enough to finish the parent and grand-parent computations.

py-evm then proceeds to clear 0x08; this is incorrect.


Solution: need to relax rules further, and not consider computation.is_origin_computation in one of the places. Will push commit shortly.


Implemented in last three commits of work branch.


Diff of changes to file, to cover both types of failures described.

@veox
Copy link
Contributor Author

veox commented Oct 7, 2019

Slowest tests: https://gist.github.com/veox/02336c518f049cb749608657622d127f

Also includes shoddy napkin-quality scripts to generate the SLOWEST_TESTS blob.


Seems that the initial sampling was too limited, printing only 200 of the slowest durations, which skewed the results heavily on Istanbul's side. Re-running...


Re-ran with 10000 as cut-off, same result. The list at test is almost entirely populated with Istanbul entries.

Still, tox jobs for Petersburg and below run in 1:38 or less; tox job for Istanbul runs in 4:37. Guess that's just how it's going to be.

@veox
Copy link
Contributor Author

veox commented Apr 26, 2020

Closing since PR #1858 was superseded by PR #1871 and addressed this; the issue was left hanging.


JIC, I was reminded of this because a discussion arose recently on gitter of geth (!) open-ethereum failing to produce an expected result for @adria0, for case RevertPrecompiledTouchExactOOG_d30g2v0_Istanbul in particular.

That said, I don't remember if py-evm had issues with the d30g2v0 variant... Re-reading the issue, seems no: it was d31g2v0 for py-evm.

The sherlocking is now split across several issues/PRs pertaining to different network upgrades, and tests filled by two different implementations.

So, just dropping this here for future archeologists.

Pinging @holiman, too, as participant in gitter discussion. (There's likely no actionables here; just a notification.)

@adria0
Copy link

adria0 commented Apr 26, 2020

@veox, finally I found the way to fix it openethereum/openethereum@74eabcc.

In short, this test has a preexistent 0000...0008 in the account trie with zero gas. The test makes a call/staticcall to this address, so since this empty account is touched, the account should be removed from the trie. But, the call does not have enough gas to call the precompile, so it reverts, and as EIP161 says that empty account deletions are reverted when the state is reverted.

OE was not "reverting the deletions", so there was a problem in the state trie root.

Take care, with the current gas parameters defined in the RevertPrecompiledTouchExactOOG.json, some of the tests do not OOG (needs to be fixed) and this cause some confusion when debugging the test cases.

@veox
Copy link
Contributor Author

veox commented Apr 26, 2020

But, the call does not have enough gas to call the precompile, so it reverts, and as EIP161 says that empty account deletions are reverted when the state is reverted.

... Except if it's pre-compile number 0x3, in which case the deletion should be left as-is, including in cases when the parent computation eventually fails anyway, whether it's the origin computation or not; unless the test-filling implementation changes, as this github issue has shown, then do as Simon says, and don't be the last to find a chair, because it's not there!..

Toodly-doo!.. 🎷

In all seriousness: always mention the exception.


This is all OT to the issue(s) you've found; still!

<rant>

A "proper" way to address the above would be to have an irregular state transition (IST) to "right" this previous "accidental" IST that we let slip through, oh because it was a secondary failure in an already debilitating net-wide consensus failure.

The common reasoning against such "right the wrong" ISTs is "the tech debt only increases with that"; to which I can only reply that the cognitive debt keeps piling on, with each new node implementation, with each new developer who is about to find out the hard way. 🤯

Doesn't matter if you start from a snapshot, doesn't matter if you're running a testnet. I swear, we'll take this RIPEMD160 ghost to Eth 2.0 with us, as an annoying weird uncle to Casper. 😄

Perhaps I should write an EIP for that, if only to be filed as a formal complaint. Maybe next March 32nd, instead of dealing with integer overflows. Remind me a few days in advance if you could, please!

Or "would have been", I gather, considering this is years-old.

</rant>

@adria0
Copy link

adria0 commented Apr 27, 2020

... Except if it's pre-compile number 0x3, in which case the deletion should be left as-is, including in cases when the parent computation eventually fails anyway, whether it's the origin computation or not; unless the test-filling implementation changes, as this github issue has shown, then do as Simon says, and don't be the last to find a chair, because it's not there!..

Haha, it has been an interesting issue to fix, to be the first one 😅

I agree with you that this TriePrunning/0x3 specific issue maybe is one of the cases that deserve an IST, because it happens that the embedded logic to fix this irregular stuff is also deployed to other blockchains, not only mainnet. I put on the agenda to ping you on next March 32 (with integer overflows 😜), hehe.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants