cmd/compile: suboptimal arm64 output #43145
Comments
Yeah, this is the general problem that the compiler currently doesn't reorder loads and stores. For example, in this case, one load of
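To illustrate why a second load can be required, here is a minimal sketch. The function `sumInto` is hypothetical and not taken from the edwards25519 code. When `dst` and `src` may point to the same array, the store to `dst[0]` can change `src[0]`, so without alias analysis the compiler has to read `src[0]` from memory again after the store instead of reusing the value it already loaded.

```go
package main

import "fmt"

// sumInto is a hypothetical example. Because dst and src may alias,
// a compiler without alias analysis must assume the store to dst[0]
// can modify src[0], so src[0] is reloaded for the second expression.
func sumInto(dst, src *[2]uint64) {
	dst[0] = src[0] + 1
	dst[1] = src[0] + src[1]
}

func main() {
	// Distinct arrays: the first store leaves src untouched.
	src := [2]uint64{10, 20}
	var dst [2]uint64
	sumInto(&dst, &src)
	fmt.Println(dst) // [11 30]

	// Aliased arrays: the first store changes element 0 to 11
	// before the second expression reads it.
	both := [2]uint64{10, 20}
	sumInto(&both, &both)
	fmt.Println(both) // [11 31]
}
```

The two calls produce different results, which is exactly why reusing the first load would be an incorrect optimization unless the compiler can prove the pointers do not overlap.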
Do we have plans to enable alias analysis? Thank you.
No current plans. Alias analysis tends to be expensive in compile time. We'd want something that is quick but accurate enough to be useful. I don't think anyone has an idea how to do that yet.
Perhaps a simple implementation with narrower coverage would not be very expensive (I haven't run any experiments; this is just a hunch). In the case above, for example, we could easily determine that there is no dependency between the
There are the obvious/trivial ones: SP offsets with non-overlapping extents do not alias, and SP does not alias SB. These matter less with a register-based ABI, but they are very cheap and could offer some wins.
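The two cheap rules above can be sketched as a conservative "may alias" check. This is a minimal model, not the Go compiler's SSA representation: the `Base`, `MemRef`, and `mayAlias` names are hypothetical, a memory access is assumed to be described by a base, a constant offset, and a width in bytes, and any case not covered by the two rules is treated as possibly aliasing.

```go
package main

import "fmt"

// Base is a hypothetical tag for the base of a memory access.
type Base int

const (
	SP    Base = iota // stack-pointer-relative access
	SB                // static-base-relative access (globals)
	Other             // any other pointer: assumed to alias anything
)

// MemRef is a hypothetical description of one access:
// the bytes [Off, Off+Size) relative to Base.
type MemRef struct {
	Base Base
	Off  int64
	Size int64
}

// mayAlias applies only the two trivial rules:
//   - an SP-relative access never aliases an SB-relative one;
//   - two SP-relative accesses alias only if their byte ranges overlap.
//
// Every other pair is conservatively reported as possibly aliasing.
func mayAlias(a, b MemRef) bool {
	if (a.Base == SP && b.Base == SB) || (a.Base == SB && b.Base == SP) {
		return false
	}
	if a.Base == SP && b.Base == SP {
		return a.Off < b.Off+b.Size && b.Off < a.Off+a.Size
	}
	return true
}

func main() {
	fmt.Println(mayAlias(MemRef{SP, 0, 8}, MemRef{SP, 8, 8}))    // false: disjoint stack slots
	fmt.Println(mayAlias(MemRef{SP, 0, 16}, MemRef{SP, 8, 8}))   // true: ranges overlap
	fmt.Println(mayAlias(MemRef{SP, 0, 8}, MemRef{SB, 0, 8}))    // false: stack vs. globals
	fmt.Println(mayAlias(MemRef{Other, 0, 8}, MemRef{SB, 0, 8})) // true: unknown pointer
}
```

Each check is a few integer comparisons, which is why rules like these are attractive when full alias analysis is too expensive in compile time.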
The GCC SRA (scalar replacement of aggregates) pass might also serve as inspiration for work in this area: https://gcc.gnu.org/wiki/summit2010?action=AttachFile&do=get&target=jambor.pdf
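Scalar replacement of aggregates splits a local aggregate whose address does not escape into independent scalar variables, which later passes can then keep in registers. Below is a hand-written before/after sketch of that transformation in Go. The `pair` type and both functions are hypothetical, and the second function is the rewrite done by hand, not output produced by GCC or by the Go compiler.

```go
package main

import "fmt"

// pair is a hypothetical small aggregate used only as a local temporary.
type pair struct {
	lo, hi uint64
}

// withAggregate keeps intermediate state in a local struct.
func withAggregate(x, y uint64) uint64 {
	var p pair
	p.lo = x & 0xffff
	p.hi = y >> 16
	p.lo += p.hi
	return p.lo ^ p.hi
}

// withScalars is the same computation after scalar replacement:
// each field of p has become an independent local variable.
func withScalars(x, y uint64) uint64 {
	lo := x & 0xffff
	hi := y >> 16
	lo += hi
	return lo ^ hi
}

func main() {
	fmt.Println(withAggregate(0x12345, 0x10000), withScalars(0x12345, 0x10000)) // 9031 9031
}
```

The point of the transformation is that once the fields are plain scalars, no memory accesses remain for an alias check to reason about.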
Even in trivial cases
Also see: #19715
What version of Go are you using (go version)?
What did you do?
Compiled this function.
Full codebase at FiloSottile/edwards25519#8
What did you expect to see?
What did you see instead?
The compiler figures out the same AND, ADD, and LSR+MADD sequence that my hand-written assembly uses, but note that it loads the inputs from memory twice and does not appear to use the STP and LDP (store pair and load pair) instructions.
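One source-level way to avoid the repeated loads, sketched here as an assumption rather than as what the edwards25519 code does, is to read every input element into a local variable before the first store. The values are then already held in registers, so a later store through `out` cannot force a reload, and the result is the same whether or not `out` aliases an input. The function `mixLimbs` and its arithmetic are hypothetical and only illustrate the pattern; whether the arm64 backend then pairs the loads into LDP is not something this sketch shows.

```go
package main

import "fmt"

// mixLimbs is a hypothetical function computing
//
//	out[0] = a[0] + b[0]
//	out[1] = a[1] + b[1] + a[0]
//	out[2] = a[2] + b[2] + a[0]
//
// All inputs are copied into locals before any store, so the stores
// to out cannot change the values used later, even if out aliases a or b.
func mixLimbs(out, a, b *[3]uint64) {
	a0, a1, a2 := a[0], a[1], a[2]
	b0, b1, b2 := b[0], b[1], b[2]
	out[0] = a0 + b0
	out[1] = a1 + b1 + a0 // reuses the local a0; no reload of a[0]
	out[2] = a2 + b2 + a0
}

func main() {
	a := [3]uint64{1, 2, 3}
	b := [3]uint64{10, 20, 30}

	var out [3]uint64
	mixLimbs(&out, &a, &b)
	fmt.Println(out) // [11 23 34]

	// out aliases a: the result is unchanged because a0 was loaded first.
	mixLimbs(&a, &a, &b)
	fmt.Println(a) // [11 23 34]
}
```

The trade-off is that the function now computes from the original input values by design, so callers that pass overlapping arguments get the same answer as with distinct ones.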
I'm not sure which part has the most effect, but my assembly gave a 10% speedup on some high-level functions (though not on microbenchmarks of thinner functions) compared with the compiler output with //go:noinline. (Interestingly, if I let the compiler inline, the high-level functions get even slower, while the thin ones get faster.)