Introduce Yul builtin handles for its dialects #15520
Conversation
Force-pushed from 2fdef10 to 7a05715.
Thanks for splitting this. This was much easier to get through in one sitting.
I didn't find any significant problems. Just some readability and assert tweaks. Once those are addressed, we can merge.
But please also check the `external.sh` benchmark numbers between this and 0.8.28 to see if there's any impact on performance. Would also be interesting to see if removing the helper map has any effect on those numbers.
Force-pushed from 7a05715 to 5f3a6f1.
Force-pushed from e13af0e to ae538de.
Force-pushed from ae538de to 80e2111.
Removing the helper map made things consistently worse, so I am not even showing it here. Here's the comparison of the PR branch to develop: I let both benchmark scripts run twice, and the percentages are relative to the best runtime per benchmark. So the branch seems to improve things consistently. I am a bit surprised; I really expected to see a (then hopefully non-drastic) worsening of timings. But I'll take that. Then I actually expect another improvement of runtime once the builtins have their own AST element. Based on the reliability of my gut feeling, that should make things worse again :P

I also made a yolo version of the builtin accessor:

```cpp
BuiltinFunctionForEvm const& builtin(BuiltinHandle const& _handle) const override
{
	return isVerbatimHandle(_handle) ? *m_verbatimFunctions[_handle.id] : *m_functions[_handle.id - verbatimIDOffset];
}
```

Turns out that living dangerously doesn't gain you much in this case and the results are comparable to the version with asserts inside of the compilation unit.

Generally, I think the benchmarks should probably be run a couple of times, and the per-second output may not be precise enough. Especially for the faster cases like compiling Uniswap, a one-second difference - which may very well be due to rounding errors and in reality be much smaller - already amounts to a (perhaps spurious) 1.5% difference in runtime.
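For reference, the assert-guarded variant would look roughly like this - just a sketch mirroring the member names and helpers from the snippet above; the exact checks are illustrative, not the code that was actually merged:

```cpp
// Hypothetical assert-guarded counterpart to the yolo version above.
// Member names and the isVerbatimHandle/verbatimIDOffset helpers mirror the
// snippet; the exact asserts are an assumption, not the merged code.
BuiltinFunctionForEvm const& builtin(BuiltinHandle const& _handle) const override
{
	if (isVerbatimHandle(_handle))
	{
		yulAssert(_handle.id < m_verbatimFunctions.size(), "Invalid verbatim handle.");
		yulAssert(m_verbatimFunctions[_handle.id], "Verbatim builtin not instantiated.");
		return *m_verbatimFunctions[_handle.id];
	}
	yulAssert(_handle.id >= verbatimIDOffset, "Invalid builtin handle.");
	yulAssert(_handle.id - verbatimIDOffset < m_functions.size(), "Builtin handle out of range.");
	return *m_functions[_handle.id - verbatimIDOffset];
}
```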
Interesting. So the difference is actually big enough to show up in benchmarks? How much was it?
Just a note that to get results faster, I'd recommend commenting out the legacy runs. Also Sablier and Seaport, since those stop at an arbitrary point due to StackTooDeep and may be unreliable for comparisons. All of those are enabled by default, but they're not all equally useful :)
I mean, asserts may look heavy, but if they don't fail, they boil down to one extra
Definitely. We didn't want to overengineer them; they're just simple scripts. But I would not mind if they were extended to allow running multiple times, with results averaged over those runs. I sometimes do that by hand, but it's really tedious.
I'm not sure these numbers are accurate enough for sub-second values to matter. Showing them may be misleading. On my system I sometimes see variance between runs that can be counted in seconds, even for OZ, which takes only ~30 s. For example, when I was rerunning benchmarks to get numbers for the release, I was quite surprised that the first run did not reproduce the gains I saw in the PR. Only after multiple runs did it average out to something closer. I think that multiple runs with averaged values would be a better way to address this.

From my experience with these benchmarks, reliably measuring a difference on the order of 1-2% is hard, because there are other factors impacting them that are potentially greater than that. It depends though - some days I'm getting pretty consistent results between runs, while on other days it varies widely. If the difference is this small, it may well not even be there. It's still better than what we were getting with benchmarking external tests in CI, though. There the variance was in the tens of percents, so I consider what we're getting here pretty good regardless :)
Force-pushed from 80e2111 to 6a2631e.
For openzeppelin, it took consistently
Nice, good then! :)
Yes, the variance is quite large. Still, especially for averaging runs, it would be good to have more precise measurements. Then you might even see faster convergence to a meaningful mean, especially on faster hardware. I haven't done it yet, but it would be interesting to see the distribution and how many samples you'd need for (e.g.) OZ to get a good grasp. Then it should also be possible to compare versions better.
Cosmic radiation :D Still, given enough samples, you should be able to see a difference in the mean at least :)
I'd rather solve that by making the script do the work for us and average the results internally. The current precision is already due to me wanting to automate things more; I used to have to round the numbers by hand, since I do a lot of these benchmarks. And when I actually need to post the results, it's when the differences are measurable and extra decimal places are just noise. We could also add an option for precision (perhaps even making higher precision the default). So far it just didn't seem overly necessary.
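Just to illustrate the kind of aggregation I mean (a toy sketch, not part of external.sh or any existing tooling), averaging the per-run timings and reporting the spread could be as simple as:

```cpp
// Toy sketch: aggregate several measured runtimes for one benchmark into a
// mean and sample standard deviation, which is roughly what "averaging over
// runs" would report. The values below are placeholders, not real measurements.
#include <cmath>
#include <iostream>
#include <vector>

int main()
{
	std::vector<double> runtimesInSeconds{30.2, 29.8, 31.1, 30.5}; // example values only

	double sum = 0.0;
	for (double t: runtimesInSeconds)
		sum += t;
	double mean = sum / runtimesInSeconds.size();

	double squaredDeviations = 0.0;
	for (double t: runtimesInSeconds)
		squaredDeviations += (t - mean) * (t - mean);
	double stddev = std::sqrt(squaredDeviations / (runtimesInSeconds.size() - 1));

	std::cout << "mean: " << mean << " s, stddev: " << stddev << " s\n";
}
```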
This PR:
- removes `fixedFunctionNames` from `yul::Dialect`
- introduces `yul::BuiltinHandle`, a struct that functions as a reference to a builtin function
- reworks the `yul::EVMDialect` handling of builtins and verbatims so that fetching the function from a handle is O(1); the lookup of a handle from a builtin name (string identifier) is managed through a lookup in an `unordered_map`
- `builtin` converts a handle to the corresponding builtin function; a new function `findBuiltin` tries to determine the handle corresponding to a builtin's string identifier
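A minimal, self-contained sketch of the lookup pattern described above - the types here are simplified stand-ins, not the actual `yul::EVMDialect` implementation, which additionally distinguishes verbatim handles:

```cpp
// Simplified stand-in for the dialect-side bookkeeping described above.
// It only demonstrates the O(1) handle -> function access combined with an
// unordered_map for the name -> handle lookup.
#include <cassert>
#include <cstddef>
#include <iostream>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

struct BuiltinHandle { std::size_t id; };

struct BuiltinFunction { std::string name; std::size_t numArguments; };

class Dialect
{
public:
	BuiltinHandle addBuiltin(BuiltinFunction _function)
	{
		BuiltinHandle handle{m_functions.size()};
		m_handles[_function.name] = handle;
		m_functions.push_back(std::move(_function));
		return handle;
	}

	// O(1): a handle is just an index into the function vector.
	BuiltinFunction const& builtin(BuiltinHandle _handle) const
	{
		assert(_handle.id < m_functions.size());
		return m_functions[_handle.id];
	}

	// Name-based lookup goes through the unordered_map and may fail.
	std::optional<BuiltinHandle> findBuiltin(std::string const& _name) const
	{
		if (auto it = m_handles.find(_name); it != m_handles.end())
			return it->second;
		return std::nullopt;
	}

private:
	std::vector<BuiltinFunction> m_functions;
	std::unordered_map<std::string, BuiltinHandle> m_handles;
};

int main()
{
	Dialect dialect;
	BuiltinHandle add = dialect.addBuiltin({"add", 2});
	std::cout << dialect.builtin(add).name << "\n";               // direct access via handle
	std::cout << dialect.findBuiltin("mul").has_value() << "\n";  // 0: unknown name
}
```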