eof: Introduce EOF container format support in `Assembly::assemble` (EIP-3540) #15367

rodiazet · 2024-08-28T15:22:46Z

No description provided.

github-actions · 2024-08-28T15:23:00Z

Thank you for your contribution to the Solidity compiler! A team member will follow up shortly.

If you haven't read our contributing guidelines and our review checklist before, please do it now, this makes the reviewing process and accepting your contribution smoother.

If you have any questions or need our help, feel free to post them in the PR or talk to us directly on the #solidity-dev channel on Matrix.

cameel

It's not a full review yet - I still need to finish going through assemble() to make sure it preserves its old non-EOF behavior - but I'm done for today so here's what I have so far.

Overall nothing serious - in one place the non-EOF calculations seem to have changed, which I'm not sure if it's intentional or not, but does not seem dangerous either way. A bit wasteful at most.

Other than that, we could use more asserts in a few places. And some smaller tweaks that should be straightforward to apply.

libevmasm/Assembly.cpp

cameel · 2024-08-29T19:53:08Z

libevmasm/Assembly.cpp

+				}
+				case PushProgramSize:
+				{
+					assertThrow(!eof, AssemblyException, "Push program size in EOF code");


I think the style you used for these messages above was much clearer. Here it sounds like it's telling me to push program size into the code which makes no sense :)

Suggested change

assertThrow(!eof, AssemblyException, "Push program size in EOF code");

assertThrow(!eof, AssemblyException, "PushProgramSize in EOF code");

Also, why is it commented out for PushData above?

libevmasm/Assembly.h

libyul/YulStack.cpp

libevmasm/Assembly.cpp

cameel · 2024-08-29T22:15:16Z

Ah wait. Just realized that the initial commits here are also in #15356. So some of my comments really should have gone into that other PR.

Dependencies of this kind should be mentioned in the description. We also have a has dependencies label that we can put on such PRs to avoid mistakenly reviewing or merging them out of order.

cameel

I went through it again, this time there are no major issues and the behavior on legacy seems preserved. The main remaining problem is that this has a bit more duplication than it has to and I think it would be best to deal with it early instead of letting it grow (especially the opcode processing, which should be shared).

Overall there are still a few things that are worth adjusting before we merge but TBH none of them affects correctness of the implementation or non-EOF stuff so if you think some of them are better off dealt with separately or left as is I wouldn't be strongly opposed to approving the PR even without them.

libevmasm/Assembly.cpp

libevmasm/Assembly.h

cameel · 2024-09-17T13:58:25Z

libevmasm/Assembly.cpp

+
+	std::vector<size_t> codeSectionSizeOffsets;
+	auto setCodeSectionSize = [&](size_t _section, size_t _size) {
+		toBigEndian(_size, bytesRef(ret.bytecode.data() + codeSectionSizeOffsets.at(_section), 2));


This is missing the _size <= 0xFFFF assert.

Actually, maybe you should just add a setBigEndianUint16() helper next and replace this and setDataSectionSize()? This seems to be a common pattern.

But I see you don't have that assert in setBigEndianUint16() now :)

Because setBigEndian checks this.

Asserts in appendBigEndianUint16 are also not needed because appendBigEndian also calls setBigEndian which check the value to set already. Moreover toBigEndian also verifies that value is unsigned.

Ah, ok. So it would be best to remove them from appendBigEndianUint16(). It confused me because I naturally assumed, they should be in neither or in both so if that's not the case, it's probably wrong :)

cameel · 2024-09-17T14:38:17Z

libevmasm/Assembly.cpp

+			switch (i.type())
+			{
+				case Operation:
+					ret.bytecode.push_back(static_cast<uint8_t>(i.instruction()));
+					break;


I think this is a bit that's worth extracting into a function shared between EOF and legacy, with ifs for the bits that differ between the two. What's here now already seems identical, and I think that in the end there will be more common code here than not.

The bit that loops over m_namedTags also looks like it would work for both and could be a helper.

As a side-note, I know this was just carried over from our old code, so not requesting you to change it, but just wanted to say that names like byr, b or i really suck when you have to read the code.

Right. I wanted to make as few as possible changes in legacy code, so decided to postpone functions encapsulation in a separated PR. If you want me to do it in this PR I can do it. LMK.
Will update the naming.

If you want me to do it in this PR I can do it. LMK.

Sure, this can be done in a follow-up PR.

libevmasm/Assembly.cpp

libsolidity/interface/CompilerStack.cpp

libevmasm/Assembly.cpp

cameel · 2024-09-20T13:23:27Z

test/cmdlineTests/yul_eof_container/output

+======= yul_eof_container/input.yul (EVM) =======
+
+Binary representation:
+ef00010100040200010003030004001a001b003b001b040000000080ffff5f80fdef00010100040200010007040000000080ffff5f805260205ffdef00010100040200010008040000000080ffff60015f5260205ffdef00010100040200010008030001001b040000000080ffff60025f5260205ffdef00010100040200010008040000000080ffff60205f5260205ffdef00010100040200010008040000000080ffff60035f5260205ffd


I wish we had a better way to output this. This is quite tedious to verify so I'm just going to assume it's doing what it's supposed to :)

You could use --opcodes instead of --bin, but what that will print will only be correct for the code sections, not for the header itself. And we're planning to deprecate it anyway: #14033.

The plan is to use default outputs for --strict-assembly input but assembly output is not supported for EOF yet. When it will be I will remove --bin and we will have Text representation: output also.

For --opcodes now unfortunately I get this error Error: The following outputs are not supported in assembler mode: --opcodes.

Text representation sounds good, as long as this is a format we're going to commit to and is suitable to show to users.

If not, we can still use it internally though - we can create a new test case type based on YulOptimizerTest and use it there just for testing.

test/cmdlineTests/yul_eof_container/args

cameel

I finished going through the PR, so to summarize, #15367 (comment) is basically the only real issue remaining.

I added a few more comments for minor things but those are just cosmetics and would not block the approval.

cameel · 2024-09-20T13:31:48Z

Ah wait, I didn't mean to approve yet. But well, given that you're fixing this as we speak, it would be approved in a moment anyway I guess.

…EIP-3540) Co-authored-by: Daniel Kirchner <daniel@ekpyron.org>

github-actions bot added the external contribution ⭐ label Aug 28, 2024

rodiazet force-pushed the eof-prep-3 branch 5 times, most recently from 89784b7 to 5b61995 Compare August 29, 2024 11:19

cameel reviewed Aug 29, 2024

View reviewed changes

cameel added has dependencies The PR depends on other PRs that must be merged first EOF labels Aug 29, 2024

rodiazet force-pushed the eof-prep-3 branch 5 times, most recently from 859e524 to c82e78b Compare September 4, 2024 14:29

cameel reviewed Sep 17, 2024

View reviewed changes

rodiazet force-pushed the eof-prep-3 branch 8 times, most recently from 6341b7b to 57d4c82 Compare September 20, 2024 11:09

cameel removed the has dependencies The PR depends on other PRs that must be merged first label Sep 20, 2024