Introduce Yul builtin handles for its dialects #15520

Merged
@clonker merged 5 commits into develop from builtin_handles on Oct 21, 2024

Conversation

@clonker (Member) commented Oct 17, 2024

  • removes fixedFunctionNames from yul::Dialect
  • introduces the yul::BuiltinHandle struct, which acts as a reference to a builtin function
  • modifies yul::EVMDialect's handling of builtins and verbatims so that fetching a function from a handle is O(1); looking up a handle from a builtin name (string identifier) goes through an unordered_map
  • the dialect's builtin function resolves a handle to the corresponding builtin function, and a new function findBuiltin tries to determine the handle corresponding to a builtin's string identifier (a rough sketch of this scheme follows below)
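For illustration, a minimal sketch of such a handle-based lookup. The names and member layout below are invented for the example and are not the actual libyul types, which differ in detail:

#include <cstddef>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Illustrative only: a builtin description and a handle that is essentially
// an index into the dialect's builtin table.
struct BuiltinFunction
{
    std::string name;
    // ... arguments, return values, side effects, etc.
};

struct BuiltinHandle
{
    std::size_t id;
};

class ExampleDialect
{
public:
    // O(1): the handle is used directly as an index.
    BuiltinFunction const& builtin(BuiltinHandle const& _handle) const
    {
        return m_functions[_handle.id];
    }

    // The name-based lookup goes through an unordered_map and yields a handle, if any.
    std::optional<BuiltinHandle> findBuiltin(std::string const& _name) const
    {
        if (auto it = m_builtinsByName.find(_name); it != m_builtinsByName.end())
            return BuiltinHandle{it->second};
        return std::nullopt;
    }

private:
    std::vector<BuiltinFunction> m_functions;
    std::unordered_map<std::string, std::size_t> m_builtinsByName;
};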

@clonker changed the title from "Builtin handles" to "Introduce Yul builtin handles for its dialects" on Oct 17, 2024
@clonker requested a review from cameel on October 17, 2024, 12:31
@cameel (Member) left a comment

Thanks for splitting this. This was much easier to get through in one sitting.

I didn't find any significant problems. Just some readability and assert tweaks. Once those are addressed, we can merge.

But please also check the external.sh benchmark numbers between this and 0.8.28 to see if there's any impact on performance. Would also be interesting to see if removing the helper map would have any effect on those numbers.

@clonker force-pushed the builtin_handles branch 2 times, most recently from e13af0e to ae538de on October 18, 2024, 08:13
@clonker (Member, Author) commented Oct 18, 2024

But please also check the external.sh benchmark numbers between this and 0.8.28 to see if there's any impact on performance. Would also be interesting to see if removing the helper map would have any effect on those numbers.

Removing the helper map made things consistently worse so I am not even showing it here. Here's the comparison of the PR branch to develop:

(screenshot: benchmark runtime comparison, PR branch vs develop)

I let both benchmark scripts run twice; the percentages are with respect to the best runtime per benchmark. So the branch seems to improve things consistently. I am a bit surprised; I really expected to see a (hopefully non-drastic) worsening of timings. But I'll take it. I'd then actually expect another runtime improvement once the builtins have their own AST element. Based on the reliability of my gut feeling, that should make things worse again :P

I also made a yolo version of the builtin(handle) implementation by putting it into the header and writing it as follows:

BuiltinFunctionForEvm const& builtin(BuiltinHandle const& _handle) const override
{
    // no bounds asserts: index straight into the function tables
    return isVerbatimHandle(_handle) ? *m_verbatimFunctions[_handle.id] : *m_functions[_handle.id - verbatimIDOffset];
}

Turns out that living dangerously doesn't gain you much in this case; the results are comparable to the version with asserts inside the compilation unit.

Generally I find that the benchmarks should be run a couple of times, and I also think that the per-second output may not be precise enough. Especially for the faster stuff like compiling uniswap, a one-second difference (which may very well be due to rounding and in reality much smaller) amounts to a (perhaps spurious) 1.5% difference in runtime.

@cameel previously approved these changes Oct 18, 2024
@cameel (Member) commented Oct 18, 2024

Removing the helper map made things consistently worse so I am not even showing it here.

Interesting. So the difference is actually big enough to show up in benchmarks? How much was it?

I let both benchmark scripts run twice,

Just a note: to get results faster, I'd recommend commenting out the legacy runs, and also sablier and seaport, since those stop at an arbitrary point due to StackTooDeep and may be unreliable for comparisons.

All of those are enabled by default, but they're not all equally useful :)

Turns out that living dangerously doesn't gain you much in this case; the results are comparable to the version with asserts inside the compilation unit.

I mean, asserts may look heavy, but if they don't fail, they boil down to one extra if. Unless the condition itself was heavy, that's exactly what I'd have expected here :)
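For comparison, the checked counterpart of the snippet above could look roughly like this. It reuses the invented names from that snippet and is only a sketch of the idea, not the actual implementation in the compilation unit:

BuiltinFunctionForEvm const& builtin(BuiltinHandle const& _handle) const override
{
    // Each assertion is a single branch on the happy path; it only throws
    // if the handle is out of range.
    if (isVerbatimHandle(_handle))
    {
        yulAssert(_handle.id < m_verbatimFunctions.size(), "Invalid verbatim handle.");
        return *m_verbatimFunctions[_handle.id];
    }
    yulAssert(_handle.id >= verbatimIDOffset, "Invalid builtin handle.");
    return *m_functions[_handle.id - verbatimIDOffset];
}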

Generally I find that the benchmarks should be run a couple of times

Definitely. We didn't want to overengineer them; they're just simple scripts. But I wouldn't mind if they were extended to allow running multiple times, with results averaged over those runs. I sometimes do that by hand, but it's really tedious.

I also think that the per-second output may not be precise enough. Especially for the faster stuff like compiling uniswap, a one-second difference (which may very well be due to rounding and in reality much smaller) amounts to a (perhaps spurious) 1.5% difference in runtime.

I'm not sure these numbers are accurate enough for sub-second values to matter. Showing them may be misleading. On my system I sometimes see variance between runs that can be counted in seconds, even for OZ, which takes only ~30 s. For example, when I was rerunning the benchmarks to get numbers for the release, I was quite surprised that the first run did not reproduce the gains I saw in the PR. Only after multiple runs did it average out to something closer. I think that multiple runs with averaged values would be a better way to address this.

From my experience with these benchmarks, reliably measuring a difference on the order of 1-2% is hard because there are other factors impacting the results that are potentially greater than that. It depends, though: some days I'm getting pretty consistent results between runs, while other days they vary widely. If the difference is this small, it may well not even be there. It's still better than what we were getting with benchmarking external tests in CI, where the variance was in the tens of percent, so I consider what we're getting here pretty good regardless :)

@clonker (Member, Author) commented Oct 21, 2024

Interesting. So the difference is actually big enough to show up in benchmarks? How much was it?

For openzeppelin it consistently took 17s vs 16s over three runs (although binning in whole seconds doesn't make this particularly meaningful); for uniswap it was (66s, 65s) instead of (62s, 64s); for eigenlayer (294s, 295s) instead of (292s, 283s). To make it more quantitative I should of course run the benchmark more often. Perhaps this is still just chance :)

I mean, asserts may look heavy, but if they don't fail, they boil down to one extra if. Unless the condition itself was heavy, that's exactly what I'd have expected here :)

Nice, good then! :)

Definitely. We didn't want to overengineer them; they're just simple scripts. But I wouldn't mind if they were extended to allow running multiple times, with results averaged over those runs. I sometimes do that by hand, but it's really tedious.

I'm not sure these numbers are accurate enough for sub-second values to matter. Showing them may be misleading.

Yes, the variance is quite large. Still, especially for averaging runs, it would be good to have more precise measurements. Then you might even see faster convergence to a meaningful mean, particularly on faster hardware. I haven't done it yet, but it would be interesting to see the distribution and how many samples you'd need for (e.g.) OZ to get a good grasp. Then it should also be possible to compare versions more reliably.
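To illustrate, a hypothetical standalone helper (invented for this comment, not anything that exists in the repository) could run a given command several times and print the mean and the standard error of the wall-clock time, which would make comparing two versions a bit more principled:

#include <chrono>
#include <cmath>
#include <cstdlib>
#include <iostream>
#include <string>
#include <vector>

int main(int argc, char* argv[])
{
    if (argc < 3 || std::atoi(argv[1]) < 2)
    {
        std::cerr << "Usage: bench-stats <runs (>= 2)> <command>\n";
        return 1;
    }
    int const runs = std::atoi(argv[1]);
    std::string const command = argv[2];

    // Time each run of the benchmark command.
    std::vector<double> samples;
    for (int i = 0; i < runs; ++i)
    {
        auto start = std::chrono::steady_clock::now();
        std::system(command.c_str());
        std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
        samples.push_back(elapsed.count());
    }

    // Sample mean.
    double mean = 0.0;
    for (double s: samples)
        mean += s;
    mean /= samples.size();

    // Unbiased sample variance and the standard error of the mean.
    double variance = 0.0;
    for (double s: samples)
        variance += (s - mean) * (s - mean);
    variance /= samples.size() - 1;

    std::cout << "mean: " << mean << " s, standard error: "
              << std::sqrt(variance / samples.size()) << " s\n";
    return 0;
}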

From my experience with these benchmarks, reliably measuring a difference on the order of 1-2% is hard because there are other factors impacting the results that are potentially greater than that. It depends, though: some days I'm getting pretty consistent results between runs, while other days they vary widely.

Cosmic radiation :D Still, given enough samples, you should be able to see a difference in the mean at least :)

@clonker merged commit abc46f3 into develop on Oct 21, 2024 (73 checks passed)
@clonker deleted the builtin_handles branch on October 21, 2024, 14:04
@cameel (Member) commented Oct 21, 2024

Yes, the variance is quite large. Still, especially for averaging runs, it would be good to have more precise measurements.

I'd rather solve that by making the script do the work for us and average the runs internally. The current precision is already a result of my wanting to automate things more; I used to have to round the numbers by hand, since I do a lot of these benchmarks. And when I actually need to post the results, it's when the differences are measurable and extra decimal places are just noise.

We could also add an option for precision (perhaps even making higher precision the default). So far it just hasn't seemed necessary.
