@KristofferC commented Oct 20, 2025

Fixes #59906. In my testing, this reduces the runtime of the code in that issue from 4000 ns to 1000 ns on an M4 MacBook.

Developed together with Claude 🤖

My understanding of this code is rudimentary, but I am still putting up this PR in case it is helpful.

@KristofferC requested a review from @vtjnash on October 20, 2025 10:46
@KristofferC added the labels performance (Must go faster), compiler:codegen (Generation of LLVM IR and native code), and backport 1.12 (Change should be backported to release-1.12) on Oct 20, 2025

@vtjnash commented Oct 20, 2025

It is correctly implemented, but has quite a few known catastrophic performance failures, so we should not do this.

@KristofferC

> It is correctly implemented, but has quite a few known catastrophic performance failures, so we should not do this.

Do you have an example (would nanosoldier show it)? Is there any other way to get back the performance lost in 25cbe00?

@vtjnash commented Oct 20, 2025

Yeah, this looks like an LLVM bug (this particular one has been a very common one, and was supposed to be fixed by opaque pointers). But we can add back the hack that usually does well enough at working around it.

@vtjnash commented Oct 20, 2025

We just need to audit all calls to emit_static_alloca and make sure they use the old (pre-opaque-pointer) GEP type instead of actually benefiting from LLVM's enormous amount of opaque pointer work. I suspect the SROA pass is still at fault for the performance issues here.

@oscardssmith

Wait, we want to use the old version?

@gbaraldi

I did some snooping around and opened llvm/llvm-project#164308. I couldn't minimize it much further, but the difference is that LLVM treats an alloca of floats as meaningfully different from an alloca of ints. If anyone wants to take a stab at it, the issue is probably in https://github.com/llvm/llvm-project/blob/e6b4a21849f0588b1c4fb39802a3999d7ac51dad/llvm/lib/Transforms/Scalar/SROA.cpp#L4885-L4966

@KristofferC mentioned this pull request on Oct 21, 2025

Successfully merging this pull request may close this issue: @benchmark reports x5 slower with Float32 Tuple with julia 1.12