Skip to content

Add tests for broadcast vectorization #49317

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Apr 12, 2023
Merged

Add tests for broadcast vectorization #49317

merged 3 commits into from
Apr 12, 2023

Conversation

pchintalapudi
Copy link
Member

This tests that basic broadcasting operations (outer/inner products) do end up vectorizing after the LLVM pipeline.

@gbaraldi This will probably block LLVM 15 due to https://discourse.llvm.org/t/newpm-loop-unswitch-regression-from-legacy-pm-to-new-pm/69823 and https://reviews.llvm.org/D124376

@pchintalapudi pchintalapudi added compiler:codegen Generation of LLVM IR and native code compiler:llvm For issues that relate to LLVM labels Apr 10, 2023
@pchintalapudi pchintalapudi requested a review from vchuravy April 10, 2023 20:14
@gbaraldi
Copy link
Member

What optimization is the new LoopUnswitch lacking? Given that it doesn't optimize on LLVM 16 as well, we might be stuck for a very long while. Can we help it from somewhere else?

@pchintalapudi
Copy link
Member Author

I'd suspect that we're generating select instructions for the broadcasting operations, and loop unswitch doesn't have support for unswitching select instructions because it won't reduce code size measurably. Most likely we'll need to patch in optional support for unswitch-on-select upstream.

@gbaraldi
Copy link
Member

And the old one did support doing that? Or is it caused by something else in the pipeline?

@pchintalapudi
Copy link
Member Author

I at least see select instruction handling in line 824 of old LoopUnswitch and a NumSelects statistic, whereas SimpleLoopUnswitch.cpp only has NumBranches and NumSwitches.

@vchuravy
Copy link
Member

So my plan for now is to import the LoopUnswitch pass either into our own source tree or by reverting the relevant commits.

The former is probably nicer to folks who want to build against custom LLVMs.

@pchintalapudi pchintalapudi added the merge me PR is reviewed. Merge when all tests are passing label Apr 11, 2023
@giordano
Copy link
Contributor

giordano commented Apr 11, 2023

Quite unexpectedly, this new test fails on A64FX:

FAIL: Julia :: pipeline-o2-broadcast.jl (31 of 35)
******************** TEST 'Julia :: pipeline-o2-broadcast.jl' FAILED ********************
Script:
--
: 'RUN: at line 1';   julia --startup-file=no -O2 --check-bounds=auto /snx11273/home/ri-mgiordano/repo/julia/test/llvmpasses/pipeline-o2-broadcast.jl /snx11273/home/ri-mgiordano/repo/julia/test/llvmpasses/Outpu
t/pipeline-o2-broadcast.jl.tmp -O && llvm-link -S /snx11273/home/ri-mgiordano/repo/julia/test/llvmpasses/Output/pipeline-o2-broadcast.jl.tmp/* | FileCheck /snx11273/home/ri-mgiordano/repo/julia/test/llvmpasses/
pipeline-o2-broadcast.jl
: 'RUN: at line 2';   julia --startup-file=no -O3 --check-bounds=auto /snx11273/home/ri-mgiordano/repo/julia/test/llvmpasses/pipeline-o2-broadcast.jl /snx11273/home/ri-mgiordano/repo/julia/test/llvmpasses/Outpu
t/pipeline-o2-broadcast.jl.tmp -O && llvm-link -S /snx11273/home/ri-mgiordano/repo/julia/test/llvmpasses/Output/pipeline-o2-broadcast.jl.tmp/* | FileCheck /snx11273/home/ri-mgiordano/repo/julia/test/llvmpasses/
pipeline-o2-broadcast.jl
--
Exit Code: 1

Command Output (stderr):
--
/snx11273/home/ri-mgiordano/repo/julia/test/llvmpasses/pipeline-o2-broadcast.jl:10:10: error: CHECK: expected string not found in input
# CHECK: load <[[VEC_FACTOR:[0-9]+]] x float>
         ^
<stdin>:6:50: note: scanning from here
define nonnull {} addrspace(10)* @japi1_prod_v_vT_69({} addrspace(10)* %0, {} addrspace(10)** noalias nocapture noundef readonly %1, i32 %2) #0 !dbg !15 {
                                                 ^
<stdin>:9:55: note: possible intended match here
 %gcframe169.sub = getelementptr inbounds [3 x {} addrspace(10)*], [3 x {} addrspace(10)*]* %gcframe169, i64 0, i64 0
                                                      ^

Input file: <stdin>
Check file: /snx11273/home/ri-mgiordano/repo/julia/test/llvmpasses/pipeline-o2-broadcast.jl

-dump-input=help explains the following input dump.

Input was:
<<<<<<
            1: ; ModuleID = 'llvm-link'
            2: source_filename = "llvm-link"
            3: target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
            4: target triple = "aarch64-unknown-linux-gnu"
            5:
            6: define nonnull {} addrspace(10)* @japi1_prod_v_vT_69({} addrspace(10)* %0, {} addrspace(10)** noalias nocapture noundef readonly %1, i32 %2) #0 !dbg !15 {
check:10'0                                                      X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found
            7: top:
check:10'0     ~~~~~
            8:  %gcframe169 = alloca [3 x {} addrspace(10)*], align 16
check:10'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            9:  %gcframe169.sub = getelementptr inbounds [3 x {} addrspace(10)*], [3 x {} addrspace(10)*]* %gcframe169, i64 0, i64 0
check:10'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
check:10'1                                                           ?                                                                possible intended match
           10:  %3 = bitcast [3 x {} addrspace(10)*]* %gcframe169 to i8*
check:10'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           11:  call void @llvm.memset.p0i8.i32(i8* noundef nonnull align 16 dereferenceable(24) %3, i8 0, i32 24, i1 false), !tbaa !19
check:10'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           12:  %4 = alloca {} addrspace(10)**, align 8
check:10'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           13:  store volatile {} addrspace(10)** %1, {} addrspace(10)*** %4, align 8
check:10'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           14:  %thread_ptr = call i8* asm "mrs $0, tpidr_el0", "=r"() #5
check:10'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            .
            .
            .
>>>>>>

--

Edit: for the record, also loopinfo test fails: #49325

@pchintalapudi
Copy link
Member Author

Does A64FX have scalable vectors? Iirc the syntax for that changes to n x vscale x type.

@pchintalapudi pchintalapudi removed the merge me PR is reviewed. Merge when all tests are passing label Apr 12, 2023
@pchintalapudi
Copy link
Member Author

@giordano could you try the latest on A64FX to see if it fixes the issue? If not, could you send the output of @code_llvm prod_v_vT(zeros(0,0), zeros(0), zeros(0))?

@giordano
Copy link
Contributor

Yes, that worked, thanks! Maybe there's something similar to be done for #49325?

@pchintalapudi pchintalapudi merged commit dcda267 into master Apr 12, 2023
@pchintalapudi pchintalapudi deleted the pc/test-broadcast branch April 12, 2023 16:14
Xnartharax pushed a commit to Xnartharax/julia that referenced this pull request Apr 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler:codegen Generation of LLVM IR and native code compiler:llvm For issues that relate to LLVM
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants