Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Code storage by hash v2 #6505

Merged
merged 7 commits into from
Feb 7, 2024
Merged

Conversation

jframe
Copy link
Contributor

@jframe jframe commented Jan 31, 2024

PR description

This is a redo of #5889 including a fix for detection of existing storage.


Storing code by hash instead of by address. This is needed so that Besu can serve snapsync as we need to be able to retrieve the code by hash.

After syncing with this change it's not possible to downgrade to an older version of Besu. If we think this is needed I could add a subcommand to do the conversion.

Testing

  • Checkpoint sync on Sepolia, Goerli and Mainnet
  • Confirmed the same code bytes were stored for both column families by running a comparison tool on the stored code bytes https://github.com/hyperledger/besu/pull/6019/files#diff-9c4d660b9f0b067bbaf230c40ff79f2a53fef78e88c0c5e368b43e3891d0f68bR227
  • Compatibility testing to ensure that existing databases continue to work
  • Compared storage of code size on mainnet. Before: 9 GiB After: 2GiB. Column family to store code counts i.e. CODE_HASH_COUNT is using 38 MiB.
  • Compared FCU and new payload times on mainnet with and without code changes showing no significant difference in times
Screenshot 2023-10-16 at 10 07 15 am

Add a storage mode for Bonsai that can store the code by hash instead of account hash

Signed-off-by: Jason Frame <jason.frame@consensys.net>
Signed-off-by: Jason Frame <jason.frame@consensys.net>
Copy link

github-actions bot commented Jan 31, 2024

  • I thought about documentation and added the doc-change-required label to this PR if updates are required.
  • I thought about the changelog and included a changelog update if required.
  • If my PR includes database changes (e.g. KeyValueSegmentIdentifier) I have thought about compatibility and performed forwards and backwards compatibility tests

@fab-10
Copy link
Contributor

fab-10 commented Jan 31, 2024

Hi @jframe, since this PR changes the format, I think we need to introduce a new version in the database metadata.
When I introduced the variables storage, I didn't introduce a new version, an that was an error, so I am now doing that, and with the occasion, refactoring the way we store database metadata to be more easy to read, so I suggest that we synchronize on this and we can introduce a new version and so leverage the existing protection mechanism.

@jframe
Copy link
Contributor Author

jframe commented Jan 31, 2024

Hi @jframe, since this PR changes the format, I think we need to introduce a new version in the database metadata. When I introduced the variables storage, I didn't introduce a new version, an that was an error, so I am now doing that, and with the occasion, refactoring the way we store database metadata to be more easy to read, so I suggest that we synchronize on this and we can introduce a new version and so leverage the existing protection mechanism.

I think it is a good idea. Will sync up with you on this. Don't think it needs to go in this PR though as this is experimental and doesn't seem right to bake in database versions as part of an experimental feature. But before it's production ready would like to have database versioning for code storage column family in Bonsai to prevent db compatibility issues.

@jframe
Copy link
Contributor Author

jframe commented Feb 2, 2024

Confirmed on test nodes that restart issue no longer occurs

@jframe jframe marked this pull request as ready for review February 2, 2024 00:05
@siladu siladu added the TeamGroot GH issues worked on by Groot Team label Feb 2, 2024
@jframe jframe requested review from matkt and garyschulte February 2, 2024 00:41

public static boolean isCodeHashValue(final byte[] key, final byte[] value) {
final Hash valueHash = Hash.hash(Bytes.wrap(value));
return Bytes.wrap(key).equals(valueHash);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is no risk to find a key value that match for codeByHash but it's codeByAddress in reality ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't think this can happen in reality. To do that would need to be able to predict the address and have that as the runtime code. I've gone through the scenarios for create and create2 and I can't see a case where this could happen. In the case of create it's not possible to predict the address since it hashes the nonce and sender address and applies the etherum address rule. In the case of the create2 it is more interesting since we can control the inputs, but I still don't think it's possible since you would need the initCode to create a contract with the runtime code of the address which is not possible to predict in advance although we can calculate it in the case of create2.

Also would prefer to replace this detection with the database metadata approach once that is available.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we know when it will be available ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure. Think it's still a work in progress https://github.com/fab-10/besu/tree/database-metadata-refactor. So might be a little while off.

Copy link
Contributor

@matkt matkt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it working fine with a call to resyncWorldstate API RPC ? should be fine just to be sure

@jframe
Copy link
Contributor Author

jframe commented Feb 6, 2024

is it working fine with a call to resyncWorldstate API RPC ? should be fine just to be sure

Tested with resyncWorldState API RPC and it's working fine

Copy link
Contributor

@matkt matkt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the tests and the code seems good for me

@jframe jframe enabled auto-merge (squash) February 7, 2024 01:33
@jframe jframe merged commit 2ae6c73 into hyperledger:main Feb 7, 2024
51 checks passed
@jframe jframe deleted the code_storage_by_hash_v2 branch February 7, 2024 02:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
TeamGroot GH issues worked on by Groot Team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants