Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add !noalias and !alias.scope metadata #48256

Merged
merged 2 commits into from
Jan 13, 2023
Merged

Conversation

topolarity
Copy link
Member

@topolarity topolarity commented Jan 12, 2023

The motivation here is that the TBAA domain is ill-equipped for reasoning about regions. In particular, it suffers total precision loss when merging disparate types in a memcpy causing #47138.

As suggested by @vtjnash, !noalias should probably be used for region-based memory information and !tbaa should be used exclusively for layout. This change adds !noalias annotations without modifying the TBAA hierarchy.

All of the "interesting" changes are in 7313b7f (which resolves #47138 on its own). b08c644 just rolls out the same trick over (nearly) all other loads/stores.

We'll want to follow up soon to prune the TBAA hierarchy, which I think will be a strict improvement.

The main idea here is that the TBAA domain is ill-equipped for
reasoning about regions (and, in particular, suffers total
precision less when merging disparate types in a `memcpy`).
Instead, `!noalias` should be used for region-based memory
information and `!tbaa` should be used exclusively for layout.

We use (5) regions corresponding to the top level of the TBAA tree:
  - gcframe
  - stack
  - data
  - constant
  - type_metadata

For now, this leaves the TBAA hierarchy in tact and only adds
additional `!noalias` metadata. `!tbaa` annotations should be
the same as before.
This is an interim solution that derives the correct `!noalias`
region from the existing TBAA information.

Later we will want to:
    - Revise the TBAA hierarchy to remove region information
    - Delete `jl_aliasinfo_t::fromTBAA()`
    - Update `jl_cgval_t` to store a `jl_aliasinfo_t`
@gbaraldi
Copy link
Member

I gave it a once over and it LGTM, probably deserves nanosoldier/pkgeval runs though.

@vchuravy vchuravy added compiler:codegen Generation of LLVM IR and native code compiler:llvm For issues that relate to LLVM needs pkgeval Tests for all registered packages should be run with this change labels Jan 12, 2023
@gbaraldi
Copy link
Member

@nanosoldier runbenchmarks(ALL, vs=":master")

@gbaraldi
Copy link
Member

@nanosoldier runtests()

@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

@N5N3
Copy link
Member

N5N3 commented Jan 13, 2023

I can confirm that the example in #47138 get fixed by this PR.
But looks like the regression still holds for the following simple case

julia> f(a, b) = for i in eachindex(a, b)
       a[i] = b[i]
       end
f (generic function with 2 methods)

julia> code_llvm(f, Tuple{Vector{ComplexF64},Vector{ComplexF64}})

@gbaraldi
Copy link
Member

gbaraldi commented Jan 13, 2023

At least locally, I get 66ns on master, 33ns on this PR and 23ns on 1.8. So it seems we've gotten a bit better here, but there's still something left.

That something being lowering that memmove doesn't use SIMD.

@N5N3
Copy link
Member

N5N3 commented Jan 13, 2023

IIRC (The build has been deleted), the memmove happens in a loop block.
And 1.8 also generates memmove for non-simd cases, e.g.
code_llvm(f, Tuple{Vector{Tuple{Int,Int32}},Vector{Tuple{Int,Int32}}})

@gbaraldi
Copy link
Member

gbaraldi commented Jan 13, 2023

It generates a memmove because it can't prove the two arrays passed to the function don't alias each other.
The new framework as is assumes that all mutable julia objects may alias eachother if I understood it correctly.

@N5N3
Copy link
Member

N5N3 commented Jan 13, 2023

But I thought LLVM's vectorlizer is able to add a memcheck block and try to use simd if valid?

@gbaraldi
Copy link
Member

I'm not sure if llvm is taking the noaliasthing and using it as the full truth, or if the vectorizer isn't running here because you don't get separate loads and stores.

@N5N3
Copy link
Member

N5N3 commented Jan 13, 2023

I just check the Clang result. It vectorlized well. (Well, the input for LV pass does have seperate load and store...)
Perhaps we need some pass tuning?

@gbaraldi
Copy link
Member

gbaraldi commented Jan 13, 2023

So memcpyopt is transforming our memcpys into a memmove. Also your clang example isn't the same as your julia one.
If you make a similar struct in julia, with this PR. You get the vectorization as expected. If you make it so the struct is bigger you get memcpys in the loop.

@N5N3
Copy link
Member

N5N3 commented Jan 13, 2023

Also your clang example isn't the same as your julia one.

Oh my fault, didn't notice that. So at least ComplexF32 would be as fast as 1.8 after this PR.

@nanosoldier
Copy link
Collaborator

Your package evaluation job has completed - possible new issues were detected.
A full report can be found here.

@vtjnash
Copy link
Member

vtjnash commented Jan 13, 2023

@nanosoldier runbenchmarks("blas", vs=":master")

@vtjnash vtjnash added merge me PR is reviewed. Merge when all tests are passing and removed needs pkgeval Tests for all registered packages should be run with this change labels Jan 13, 2023
@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - no performance regressions were detected. A full report can be found here.

@vtjnash vtjnash merged commit faa37a6 into JuliaLang:master Jan 13, 2023
@topolarity topolarity deleted the noalias branch January 14, 2023 00:10
@DilumAluthge DilumAluthge removed the merge me PR is reviewed. Merge when all tests are passing label Jan 14, 2023
@vtjnash vtjnash added the backport 1.9 Change should be backported to release-1.9 label Jan 25, 2023
@KristofferC KristofferC mentioned this pull request Feb 1, 2023
35 tasks
@KristofferC KristofferC removed the backport 1.9 Change should be backported to release-1.9 label Feb 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler:codegen Generation of LLVM IR and native code compiler:llvm For issues that relate to LLVM
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Simd regression with AoS dest on master.
8 participants