-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
Optimize runtime for non single-module(01,02) compilation #14225
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There are multiple spec failures, so it looks like this has altered the semantics of some programs |
This is really good, awesome work 👍 I think we'll need to look at the individual actions independently, though. There is already an pending discussion about likely/unlikely intrinsices: #11910. We should continue this part there. I think there's a general consensus on supporting it in some way, but the concrete implementation is still up for debate. I don't think there has been any dedicated discussion about inlining methods on the Crystal side, except what you already mentioned in #13505. So this would be a new discussion to start. The idea sounds definitely interesting. But there is a wide range of possibilities how it can be implemented. Then I also see a number of other changes. Not sure what's the reason for adding parenthesis to |
parenthesis remains because I try likely here, but somehow it compile error, so I remove it. Optimizations which add huge impact:
|
Awesome work! @kostya although I don’t understand these codes, but look at the final result, it great! |
In fact, I hope the core members can raise the priority for this meaningful discussion. Don't expect like #13464,it take eight months from proposal to merge, too long, it destroying the enthusiasm of OS contributors, it also destroying the enthusiasm of people using Crystal, In fact, I feel that if it weren’t the hard work of @kostya and @funny-falcon 's follow-up, #13464 , It probably never been merged until now. Without the help of open source contributors, it would be difficult for Crystal to make great progress. just like the release of 1.11.0, many issue were discovered and fixed very quickly in 1.11.1 👍 |
Shouldn’t each_with_index specialized then for common collections? At least Array already have index when it iterates, no need for other var. I see problem in untyped |
instead of AlwaysInline, we can for example calculate somehow method complexity, and inline if it small, like |
Another idea, mark full class with |
Also there is already small inlining in generate calls https://github.com/crystal-lang/crystal/blob/master/src/compiler/crystal/codegen/call.cr#L449, but it inline only primitives, self and getters. It also should inline setters for example. |
I experiencing very slow compile std_spec in this branch in -O2, I think this is because all specs compiled in single module (because no classes), so parallel compilation have no benefit here. |
@kostya indeed, the |
|
for simplify testing I add this rep: https://github.com/kostya/crystal-metric --release: 25.6282s
In this branch runtime for |
My first branch was separated by commits. This is result for crystal-metric on each commit. For
|
Very nice! So, these 4 commits had the most impact: kostya@43514d7, 70.3383s Looking at them, it's mostly inlining methods, some (un)likely and a few removed overflow checks in Enumerable. I believe what @funny-falcon and I suggested above stands? |
I see one more thing to improve, which I don't know how to do, this is record Bla, x : Int
Bla.new 1
|
Plotting this changes for crystal-metric rep. |
All this changes questionable and I don't know even if it pass specs. Also optimization results was for my set of benchmarks, and I don't know if it would work for something else.
Changes:
AlwaysInline
, now AlwaysInline behave like it sounds, purely inline code on crystal level. Before it just marks it in llvm (so inline works only in current llvm module, no cross module inlining). Not sure if backtrace btw works. So now all methods marked by AlwaysInline, like headers in C++. Also mark by this AlwaysInline many small but critical methods, which have most slowdown in per module compilation. This marks was only done for methods, which I only found was to be slow, in my tests. But I am sure there also many methods which I missed.likely
intrinsic which should help in non single-module optimizations.Results:
In this PR
-O2
became is a decent level of compilation, which runtime is 0.9-2 times slower than release (in 1 test even faster), but compile time is 3-4 times faster, incremental compile time: 5-10 times faster.--release
just little faster after all this optimizations, because single-module and O3 doing similar thing with inlining.On the recompile compiler this optimizations have no visible effect.
Example of optimization from some benchmarks:
brainfuck
Master:
-O2: initial: 2.40s, incremental: 1.33s, run time: 19.71s
This branch:
-O2: initial: 2.25s, incremental: 1.32s, run time: 4.90s
brainfuck runtime 4 times faster in this branch. and almost equal release result: 4.26s (which compile time is 6.74s)