Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EVM Speed Improvements for MSTORE #2917

Merged
merged 18 commits into from
Oct 20, 2021
Merged

Conversation

shemnon
Copy link
Contributor

@shemnon shemnon commented Oct 18, 2021

PR Description

More EVM Speed Improvements aimed at improving MSTORE times.

  • Move Memory from raw bytes to byte array
  • Add calls for 32 byte MSB aligned writes
  • Remove incrementProgramCounter and move data into OperationResult

Signed-off-by: Danno Ferrin danno.ferrin@gmail.com

Changelog

More EVM Speed Improvements aimed at improving MLOAD times.
* Move Memory from raw bytes to byte array
* Add calls for 32 byte MSB aligned copies
* Remove `incrementProgramCounter` and move data into OperationResult

Signed-off-by: Danno Ferrin <danno.ferrin@gmail.com>
@shemnon
Copy link
Contributor Author

shemnon commented Oct 18, 2021

Performance spreadsheet

48% improvement over all test cases, 266% improvement in mstore. Slowest operation is now SHA3/KECCAK

Signed-off-by: Danno Ferrin <danno.ferrin@gmail.com>
Signed-off-by: Danno Ferrin <danno.ferrin@gmail.com>
@shemnon shemnon changed the title EVM Speed Improvements for MLOAD EVM Speed Improvements for MSTORE Oct 18, 2021
Signed-off-by: Danno Ferrin <danno.ferrin@gmail.com>
@shemnon shemnon marked this pull request as draft October 18, 2021 06:03
Signed-off-by: Danno Ferrin <danno.ferrin@gmail.com>
@shemnon shemnon marked this pull request as ready for review October 18, 2021 16:29
Signed-off-by: Danno Ferrin <danno.ferrin@gmail.com>
Signed-off-by: Danno Ferrin <danno.ferrin@gmail.com>
Casting an int out of Bytes can be either ArithmeticException or
IllegalArgumentException, so treat all Runtime exceptions as invalid
destinations.

Signed-off-by: Danno Ferrin <danno.ferrin@gmail.com>
Trim leading zeros on jump. Since it is unsigned we don't need to worry
about sign-extended negative numbers.

Signed-off-by: Danno Ferrin <danno.ferrin@gmail.com>
@shemnon shemnon marked this pull request as draft October 18, 2021 19:40
Signed-off-by: Danno Ferrin <danno.ferrin@gmail.com>
@shemnon shemnon marked this pull request as ready for review October 18, 2021 20:36
@@ -184,7 +164,7 @@ public boolean isValidJumpDestination(final UInt256 destination, final Code code
}
}
}
long targetLong = validJumpDestinations[jumpDestination >> 6];
long targetLong = validJumpDestinations[jumpDestination >>> 6];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I stared at this for a bit before I saw the jumpDestination < 0 check on 154.

Copy link
Contributor

@garyschulte garyschulte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM, the only question I have is the potential performance tradeoff between byte[] for faster MSTORE and MutableBytes for read performance since @matkt made that optimization.

@@ -36,17 +37,17 @@
* overflow this. A byte array implementation limits us to 2 GiB. But that would cost over 51
* trillion gas. So this is likely a reasonable limitation, at least at first.
*/
private MutableBytes data;
private byte[] memBytes;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does the benefit of going back to byte[] for MSTORE outweigh the read optimization @matkt did in #2673 ? Are there additional considerations that make immutable byte[] a better choice here?

Copy link
Contributor Author

@shemnon shemnon Oct 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a mix of the two. This goes back to byte[] but does not use the array copying code that was in place before. It directly slices the array into Byte. The code segments that needed to copy bytes are still doing a copy on the resulting byte object. Those were introduced by MutableBytes and not removed, and are actually orthogonal to the backing data.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bigger issue is that doing right-aligned copying in MutableBytes efficiently requires APIs that don't exist on it.

final BigInteger value0 = new BigInteger(1, frame.popStackItem().toArrayUnsafe());
final BigInteger value1 = new BigInteger(1, frame.popStackItem().toArrayUnsafe());

final BigInteger result = value0.add(value1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we seeing better performance using BigInteger rather than UInt256?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. The use of UInt256 incurred a few more byte[]->int[]->byte[] loops. It is a more pronounced improvement in div, mul, and mod as there are less cycles spent moving bytes into new objects.

Copy link
Contributor

@jflo jflo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The javadoc on Memory.set32BytesAligned could stand some clarification. It is confusing as written. I've included suggestions inline.

@@ -202,7 +202,7 @@ public Bytes getBytes(final long location, final long numBytes) {
final int start = asByteIndex(location);

ensureCapacityForBytes(start, length);
return data.slice(start, length);
return Bytes.wrap(Arrays.copyOfRange(memBytes, start, start + length));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch. Tuweni .slice sounds mutable based on their docs, but i may be misreading, sounds ambiguous.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can be mutable if the underlying bytes are mutable. We got burned on that with #2908.

}

/**
* Copy the bytes from the provided number of bytes from the provided value to memory from the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be worded better. Suggest "Copy numBytes bytes from taintedValue to memory at location" or some variant with fewer froms.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just re-wrote it to my liking. Initially it was copy-paste from the left aligned variant.

* <p>Note that this method will extend memory to accommodate the location assigned and bytes
* copied and so never fails.
*
* @param location the location in memory at which to start copying the bytes of {@code value}.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adding "to" at the end of this description helps

* numBytes} (in other words, {@link #getActiveWords()} will return a value consistent with
* having set {@code numBytes} bytes, even if less than that have been concretely set due to
* {@code value} being smaller).
* @param taintedValue the bytes to copy to memory from {@code location}.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggest s/from/at

if (length > srcSize) {
final MutableBytes paddedAnswer = MutableBytes.create((int) length);
if ((long) 0 < srcSize) {
value.slice(0, (int) ((long) srcSize)).copyTo(paddedAnswer, (int) (length - srcSize));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is casting srcSize to long and then back to int doing when defining the length of the slice being copied?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how neither IntelliJ nor Errorprone tagged it. Initially it was because memory was going to be long indexed, until I was satisfied that 33rd bit memory access was unlikely to ever be viable even in perverse circumstances.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed. Sonar flagged it.

@@ -28,6 +30,8 @@

private final int length;

private final OperationResult pushResponse;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious, why pull this down from successResponse in parent? readability?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each push response has a different success, counting the PC to increment (2 for PUSH1, 33 for PUSH32). The parent successResponse increments the PC by 1.

@jflo
Copy link
Contributor

jflo commented Oct 20, 2021

Performance spreadsheet

48% improvement over all test cases, 266% improvement in mstore. Slowest operation is now SHA3/KECCAK

sick. how do we get that spreadsheet generated as part of CI? I'd love to be alerted to performance regressions.

shemnon and others added 6 commits October 19, 2021 23:05
Signed-off-by: Danno Ferrin <danno.ferrin@gmail.com>
Signed-off-by: Danno Ferrin <danno.ferrin@gmail.com>
Signed-off-by: Danno Ferrin <danno.ferrin@gmail.com>
Bytes.wrap() presumes an immutable array, and only does a shallow copy
on a perfect "slice."  MutableBytes assumes the bytes could change and
always does a deep copy when a copy is requested.

Also, updated API to reflect mutable bytes type usage.

Signed-off-by: Danno Ferrin <danno.ferrin@gmail.com>
@sonarqubecloud
Copy link

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

93.0% 93.0% Coverage
3.4% 3.4% Duplication

Copy link
Contributor

@garyschulte garyschulte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚢 👍

@shemnon shemnon merged commit ec5623f into hyperledger:main Oct 20, 2021
garyschulte pushed a commit to garyschulte/besu that referenced this pull request Oct 22, 2021
More EVM Speed Improvements aimed at improving MSTORE times.
* Move Memory from raw bytes to byte array
* Add calls for right packed partial value copy
* Remove `incrementProgramCounter` and move data into OperationResult

Signed-off-by: Danno Ferrin <danno.ferrin@gmail.com>
@shemnon shemnon deleted the evmSpeedupMLoad branch February 26, 2022 18:43
eum602 pushed a commit to lacchain/besu that referenced this pull request Nov 3, 2023
More EVM Speed Improvements aimed at improving MSTORE times.
* Move Memory from raw bytes to byte array
* Add calls for right packed partial value copy
* Remove `incrementProgramCounter` and move data into OperationResult

Signed-off-by: Danno Ferrin <danno.ferrin@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants