Skip to content

Merge latest upstream changes into master #16

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 26 commits into from

Conversation

rouson
Copy link

@rouson rouson commented Nov 28, 2017

No description provided.

Alessandro Fanfarillo and others added 26 commits August 2, 2017 13:26
update teams branch from upstream
Merge upstream gcc-mirror/gcc into sourceryinstitute/gcc
Merge sourceryinstitute/master into sourceryinstitute/teams
Merge master into download-opencoarrays-mpich
This reverts commit 71c41bf, reversing
changes made to 7204ca4.
Revert "Merge pull request #9 from sourceryinstitute/download-opencoarrays-mpich"

This reverts commit 458897c, reversing
changes made to 71c41bf.
@rouson rouson closed this Nov 28, 2017
habemus-papadum pushed a commit to habemus-papadum/gcc that referenced this pull request Nov 29, 2017
This patch improves alloca alignment.  Currently alloca reserves
too much space as it aligns twice, and generates unnecessary stack
alignment code.

When the requested alignment is lower than the stack alignment, no
extra alignment is needed.  If the requested alignment is higher,
we need to increase the size by the difference of the requested 
alignment and the stack alignment.  As a result, the alloca alignment
is exactly as expected:

alloca (16):
	sub	sp, sp, gcc-mirror#16
	mov	x1, sp

alloca (x):
	add	x0, x0, 15
	and	x0, x0, -16
	sub	sp, sp, x0
	mov	x0, sp

__builtin_alloca_with_align (x, 512):
	add	x0, x0, 63
	and	x0, x0, -16
	sub	sp, sp, x0
	add	x0, sp, 63
	and	x0, x0, -64

    gcc/
	* explow.c (get_dynamic_stack_size): Improve dynamic alignment.


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@251713 138bc75d-0d04-0410-961f-82ee72b054a4
hubot pushed a commit that referenced this pull request Feb 8, 2018
The shrinkwrap optimization added in GCC 7 allows each callee-save to
be delayed and done only across blocks which need a particular callee-save.
Although this reduces unnecessary memory traffic on code paths that need
few callee-saves, it typically uses LDR/STR rather than LDP/STP.  This
means more memory accesses and increased codesize, ~1.0% on average.

To improve this, if a particular callee-save must be saved/restored, also
add the adjacent callee-save to allow use of LDP/STP.  This significantly
reduces codesize (for example gcc_r, povray_r, parest_r, xalancbmk_r are
1% smaller).  This is a simple fix which can be backported.  A more advanced
approach would scan blocks for pairs of callee-saves, but that requires a
full rewrite of all the callee-save code which is too late at this stage.

An example epilog in a shrinkwrapped function before:

ldp    x21, x22, [sp,#16]
ldr    x23, [sp,#32]
ldr    x24, [sp,#40]
ldp    x25, x26, [sp,#48]
ldr    x27, [sp,#64]
ldr    x28, [sp,#72]
ldr    x30, [sp,#80]
ldr    d8, [sp,#88]
ldp    x19, x20, [sp],#96
ret

And after this patch:

ldr    d8, [sp,#88]
ldp    x21, x22, [sp,#16]
ldp    x23, x24, [sp,#32]
ldp    x25, x26, [sp,#48]
ldp    x27, x28, [sp,#64]
ldr    x30, [sp,#80]
ldp    x19, x20, [sp],#96
ret

    gcc/
	* config/aarch64/aarch64.c (aarch64_components_for_bb):
	Increase LDP/STP opportunities by adding adjacent callee-saves.


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@257482 138bc75d-0d04-0410-961f-82ee72b054a4
kraj pushed a commit to kraj/gcc that referenced this pull request Oct 12, 2020
Prevents the following UBSAN error:

./xgcc -B. /home/marxin/Programming/gcc/gcc/testsuite/g++.dg/torture/pr49770.C -O2 -c
/home/marxin/Programming/gcc2/gcc/ipa-modref-tree.h:482:22: runtime error: load of value 2, which is not a valid value for type 'bool'
    #0 0x1fdb4d1 in modref_tree<int>::merge(modref_tree<int>*, vec<modref_parm_map, va_heap, vl_ptr>*) /home/marxin/Programming/gcc2/gcc/ipa-modref-tree.h:482
    #1 0x1fcadaa in merge_call_side_effects(modref_summary*, gimple*, modref_summary*, bool) /home/marxin/Programming/gcc2/gcc/ipa-modref.c:511
    gcc-mirror#2 0x1fcbadd in analyze_call /home/marxin/Programming/gcc2/gcc/ipa-modref.c:642
    gcc-mirror#3 0x1fcc061 in analyze_stmt /home/marxin/Programming/gcc2/gcc/ipa-modref.c:732
    gcc-mirror#4 0x1fccf31 in analyze_function /home/marxin/Programming/gcc2/gcc/ipa-modref.c:823
    gcc-mirror#5 0x1fd17e5 in execute /home/marxin/Programming/gcc2/gcc/ipa-modref.c:1441
    gcc-mirror#6 0x25cca6e in execute_one_pass(opt_pass*) /home/marxin/Programming/gcc2/gcc/passes.c:2509
    gcc-mirror#7 0x25cd39b in execute_pass_list_1 /home/marxin/Programming/gcc2/gcc/passes.c:2597
    gcc-mirror#8 0x25cd450 in execute_pass_list_1 /home/marxin/Programming/gcc2/gcc/passes.c:2598
    gcc-mirror#9 0x25cd4ee in execute_pass_list(function*, opt_pass*) /home/marxin/Programming/gcc2/gcc/passes.c:2608
    gcc-mirror#10 0x25c7a5a in do_per_function_toporder(void (*)(function*, void*), void*) /home/marxin/Programming/gcc2/gcc/passes.c:1726
    gcc-mirror#11 0x25cfa3f in execute_ipa_pass_list(opt_pass*) /home/marxin/Programming/gcc2/gcc/passes.c:2941
    gcc-mirror#12 0x173572d in ipa_passes /home/marxin/Programming/gcc2/gcc/cgraphunit.c:2642
    gcc-mirror#13 0x17364ee in symbol_table::compile() /home/marxin/Programming/gcc2/gcc/cgraphunit.c:2777
    gcc-mirror#14 0x17372d9 in symbol_table::finalize_compilation_unit() /home/marxin/Programming/gcc2/gcc/cgraphunit.c:3022
    gcc-mirror#15 0x2a1f00a in compile_file /home/marxin/Programming/gcc2/gcc/toplev.c:485
    gcc-mirror#16 0x2a27dc8 in do_compile /home/marxin/Programming/gcc2/gcc/toplev.c:2321
    gcc-mirror#17 0x2a283cc in toplev::main(int, char**) /home/marxin/Programming/gcc2/gcc/toplev.c:2460
    gcc-mirror#18 0x54f21cd in main /home/marxin/Programming/gcc2/gcc/main.c:39
    gcc-mirror#19 0x7ffff6f0de09 in __libc_start_main ../csu/libc-start.c:314
    gcc-mirror#20 0x9eac09 in _start (/home/marxin/Programming/gcc2/objdir/gcc/cc1plus+0x9eac09)

gcc/ChangeLog:

	* ipa-modref.c (merge_call_side_effects): Clear modref_parm_map
	fields in the vector.
kraj pushed a commit to kraj/gcc that referenced this pull request Nov 2, 2020
Enable thumb1_gen_const_int to generate RTL or asm depending on the
context, so that we avoid duplicating code to handle constants in
Thumb-1 with -mpure-code.

Use a template so that the algorithm is effectively shared, and
rely on two classes to handle the actual emission as RTL or asm.

The generated sequence is improved to handle right-shiftable and small
values with less instructions. We now generate:

128:
        movs    r0, r0, #128
264:
        movs    r3, gcc-mirror#33
        lsls    r3, gcc-mirror#3
510:
        movs    r3, #255
        lsls    r3, #1
512:
        movs    r3, #1
        lsls    r3, gcc-mirror#9
764:
        movs    r3, #191
        lsls    r3, gcc-mirror#2
65536:
        movs    r3, #1
        lsls    r3, gcc-mirror#16
0x123456:
        movs    r3, gcc-mirror#18 ;0x12
        lsls    r3, gcc-mirror#8
        adds    r3, gcc-mirror#52 ;0x34
        lsls    r3, gcc-mirror#8
        adds    r3, gcc-mirror#86 ;0x56
0x1123456:
        movs    r3, #137 ;0x89
        lsls    r3, gcc-mirror#8
        adds    r3, gcc-mirror#26 ;0x1a
        lsls    r3, gcc-mirror#8
        adds    r3, gcc-mirror#43 ;0x2b
        lsls    r3, #1
0x1000010:
        movs    r3, gcc-mirror#16
        lsls    r3, gcc-mirror#16
        adds    r3, #1
        lsls    r3, gcc-mirror#4
0x1000011:
        movs    r3, #1
        lsls    r3, gcc-mirror#24
        adds    r3, gcc-mirror#17
-8192:
	movs	r3, #1
	lsls	r3, gcc-mirror#13
	rsbs	r3, #0

The patch adds a testcase which does not fully exercise
thumb1_gen_const_int, as other existing patterns already catch small
constants.  These parts of thumb1_gen_const_int are used by
arm_thumb1_mi_thunk.

2020-11-02  Christophe Lyon  <christophe.lyon@linaro.org>

	gcc/
	* config/arm/arm.c (thumb1_const_rtl, thumb1_const_print): New
	classes.
	(thumb1_gen_const_int): Rename to ...
	(thumb1_gen_const_int_1): ... New helper function. Add capability
	to emit either RTL or asm, improve generated code.
	(thumb1_gen_const_int_rtl): New function.
	* config/arm/arm-protos.h (thumb1_gen_const_int): Rename to
	thumb1_gen_const_int_rtl.
	* config/arm/thumb1.md: Call thumb1_gen_const_int_rtl instead
	of thumb1_gen_const_int.

	gcc/testsuite/
	* gcc.target/arm/pure-code/no-literal-pool-m0.c: New.
kraj pushed a commit to kraj/gcc that referenced this pull request Jan 25, 2021
This adds implementation for the optabs for complex operations.  With this the
following C code:

  void g (float complex a[restrict N], float complex b[restrict N],
	  float complex c[restrict N])
  {
    for (int i=0; i < N; i++)
      c[i] =  a[i] * b[i];
  }

generates

NEON:

g:
        vmov.f32        q11, #0.0  @ v4sf
        add     r3, r2, #1600
.L2:
        vmov    q8, q11  @ v4sf
        vld1.32 {q10}, [r1]!
        vld1.32 {q9}, [r0]!
        vcmla.f32       q8, q9, q10, #0
        vcmla.f32       q8, q9, q10, gcc-mirror#90
        vst1.32 {q8}, [r2]!
        cmp     r3, r2
        bne     .L2
        bx      lr

MVE:

g:
        push    {lr}
        mov     lr, gcc-mirror#100
        dls     lr, lr
.L2:
        vldrw.32        q1, [r1], gcc-mirror#16
        vldrw.32        q2, [r0], gcc-mirror#16
        vcmul.f32       q3, q2, q1, #0
        vcmla.f32       q3, q2, q1, gcc-mirror#90
        vstrw.32        q3, [r2], gcc-mirror#16
        le      lr, .L2
        ldr     pc, [sp], gcc-mirror#4

instead of

g:
        add     r3, r2, #1600
.L2:
        vld2.32 {d20-d23}, [r0]!
        vld2.32 {d16-d19}, [r1]!
        vmul.f32        q14, q11, q9
        vmul.f32        q15, q11, q8
        vneg.f32        q14, q14
        vfma.f32        q15, q10, q9
        vfma.f32        q14, q10, q8
        vmov    q13, q15  @ v4sf
        vmov    q12, q14  @ v4sf
        vst2.32 {d24-d27}, [r2]!
        cmp     r3, r2
        bne     .L2
        bx      lr

and

g:
        add     r3, r2, #1600
.L2:
        vld2.32 {d20-d23}, [r0]!
        vld2.32 {d16-d19}, [r1]!
        vmul.f32        q15, q10, q8
        vmul.f32        q14, q10, q9
        vmls.f32        q15, q11, q9
        vmla.f32        q14, q11, q8
        vmov    q12, q15  @ v4sf
        vmov    q13, q14  @ v4sf
        vst2.32 {d24-d27}, [r2]!
        cmp     r3, r2
        bne     .L2
        bx      lr

respectively.

gcc/ChangeLog:

	* config/arm/iterators.md (rotsplit1, rotsplit2, conj_op, fcmac1,
	VCMLA_OP, VCMUL_OP): New.
	* config/arm/mve.md (mve_vcmlaq<mve_rot><mode>): Support vec_dup 0.
	* config/arm/neon.md (cmul<conj_op><mode>3): New.
	* config/arm/unspecs.md (UNSPEC_VCMLA_CONJ, UNSPEC_VCMLA180_CONJ,
	UNSPEC_VCMUL_CONJ): New.
	* config/arm/vec-common.md (cmul<conj_op><mode>3, arm_vcmla<rot><mode>,
	cml<fcmac1><conj_op><mode>4): New.
fxcoudert pushed a commit to fxcoudert/gcc that referenced this pull request May 4, 2021
darwinpcs, packs some stack items, which means that one cannot
guarantee that they are aligned to DI.  Check for these cases and
reject PRFM instructions then.

Note, that this generally results in use of an extra temporary reg.

clang uses 'PRFUM' instructions in those cases, so we have a missed
optimisation opportunity (low priority).

fixes issue gcc-mirror#16.
nstester pushed a commit to nstester/gcc that referenced this pull request Jun 14, 2021
The fixed error is:

==21166==ERROR: AddressSanitizer: alloc-dealloc-mismatch (operator new [] vs operator delete) on 0x60300000d900
    #0 0x7367d7 in operator delete(void*, unsigned long) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/libsanitizer/asan/asan_new_delete.cpp:172
    #1 0x3b82e6e in pointer_equiv_analyzer::~pointer_equiv_analyzer() /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/gimple-ssa-evrp.c:161
    #2 0x3b83387 in hybrid_folder::~hybrid_folder() /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/gimple-ssa-evrp.c:517
    #3 0x3b83387 in execute_early_vrp /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/gimple-ssa-evrp.c:686
    #4 0x1790611 in execute_one_pass(opt_pass*) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/passes.c:2567
    gcc-mirror#5 0x1792003 in execute_pass_list_1 /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/passes.c:2656
    gcc-mirror#6 0x1792029 in execute_pass_list_1 /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/passes.c:2657
    gcc-mirror#7 0x179209f in execute_pass_list(function*, opt_pass*) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/passes.c:2667
    gcc-mirror#8 0x178a5f3 in do_per_function_toporder(void (*)(function*, void*), void*) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/passes.c:1773
    gcc-mirror#9 0x1792fac in do_per_function_toporder(void (*)(function*, void*), void*) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/plugin.h:191
    gcc-mirror#10 0x1792fac in execute_ipa_pass_list(opt_pass*) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/passes.c:3001
    gcc-mirror#11 0xc525fc in ipa_passes /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/cgraphunit.c:2154
    gcc-mirror#12 0xc525fc in symbol_table::compile() /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/cgraphunit.c:2289
    gcc-mirror#13 0xc5a096 in symbol_table::compile() /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/cgraphunit.c:2269
    gcc-mirror#14 0xc5a096 in symbol_table::finalize_compilation_unit() /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/cgraphunit.c:2537
    gcc-mirror#15 0x1a7a17c in compile_file /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/toplev.c:482
    gcc-mirror#16 0x69c758 in do_compile /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/toplev.c:2210
    gcc-mirror#17 0x69c758 in toplev::main(int, char**) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/toplev.c:2349
    gcc-mirror#18 0x6a932a in main /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/main.c:39
    gcc-mirror#19 0x7ffff7820b34 in __libc_start_main ../csu/libc-start.c:332
    gcc-mirror#20 0x6aa5fd in _start (/home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/objdir/gcc/cc1+0x6aa5fd)

0x60300000d900 is located 0 bytes inside of 32-byte region [0x60300000d900,0x60300000d920)
allocated by thread T0 here:
    #0 0x735ab7 in operator new[](unsigned long) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/libsanitizer/asan/asan_new_delete.cpp:102
    #1 0x3b82dac in pointer_equiv_analyzer::pointer_equiv_analyzer(gimple_ranger*) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/gimple-ssa-evrp.c:156

gcc/ChangeLog:

	* gimple-ssa-evrp.c (pointer_equiv_analyzer::~pointer_equiv_analyzer): Use delete[].
mablinov pushed a commit to mablinov/gcc that referenced this pull request Oct 29, 2021
darwinpcs, packs some stack items, which means that one cannot
guarantee that they are aligned to DI.  Check for these cases and
reject PRFM instructions then.

Note, that this generally results in use of an extra temporary reg.

clang uses 'PRFUM' instructions in those cases, so we have a missed
optimisation opportunity (low priority).

fixes issue gcc-mirror#16.
nstester pushed a commit to nstester/gcc that referenced this pull request Nov 1, 2021
This patch gets CSE to re-use constants already inside a vector rather than
re-materializing the constant again.

Basically consider the following case:

#include <stdint.h>
#include <arm_neon.h>

uint64_t
test (uint64_t a, uint64x2_t b, uint64x2_t* rt)
{
  uint64_t arr[2] = { 0x0942430810234076UL, 0x0942430810234076UL};
  uint64_t res = a | arr[0];
  uint64x2_t val = vld1q_u64 (arr);
  *rt = vaddq_u64 (val, b);
  return res;
}

The actual behavior is inconsequential however notice that the same constants
are used in the vector (arr and later val) and in the calculation of res.

The code we generate for this however is quite sub-optimal:

test:
        adrp    x2, .LC0
        sub     sp, sp, gcc-mirror#16
        ldr     q1, [x2, #:lo12:.LC0]
        mov     x2, 16502
        movk    x2, 0x1023, lsl 16
        movk    x2, 0x4308, lsl 32
        add     v1.2d, v1.2d, v0.2d
        movk    x2, 0x942, lsl 48
        orr     x0, x0, x2
        str     q1, [x1]
        add     sp, sp, 16
        ret
.LC0:
        .xword  667169396713799798
        .xword  667169396713799798

Essentially we materialize the same constant twice.  The reason for this is
because the front-end lowers the constant extracted from arr[0] quite early on.
If you look into the result of fre you'll find

  <bb 2> :
  arr[0] = 667169396713799798;
  arr[1] = 667169396713799798;
  res_7 = a_6(D) | 667169396713799798;
  _16 = __builtin_aarch64_ld1v2di (&arr);
  _17 = VIEW_CONVERT_EXPR<uint64x2_t>(_16);
  _11 = b_10(D) + _17;
  *rt_12(D) = _11;
  arr ={v} {CLOBBER};
  return res_7;

Which makes sense for further optimization.  However come expand time if the
constant isn't representable in the target arch it will be assigned to a
register again.

(insn 8 5 9 2 (set (reg:V2DI 99)
        (const_vector:V2DI [
                (const_int 667169396713799798 [0x942430810234076]) repeated x2
            ])) "cse.c":7:12 -1
     (nil))
...
(insn 14 13 15 2 (set (reg:DI 103)
        (const_int 667169396713799798 [0x942430810234076])) "cse.c":8:12 -1
     (nil))
(insn 15 14 16 2 (set (reg:DI 102 [ res ])
        (ior:DI (reg/v:DI 96 [ a ])
            (reg:DI 103))) "cse.c":8:12 -1
     (nil))

And since it's out of the immediate range of the scalar instruction used
combine won't be able to do anything here.

This will then trigger the re-materialization of the constant twice.

To fix this this patch extends CSE to be able to generate an extract for a
constant from another vector, or to make a vector for a constant by duplicating
another constant.

Whether this transformation is done or not depends entirely on the costing for
the target for the different constants and operations.

I Initially also investigated doing this in PRE, but PRE requires at least 2 BB
to work and does not currently have any way to remove redundancies within a
single BB and it did not look easy to support.

gcc/ChangeLog:

	* cse.c (add_to_set): New.
	(find_sets_in_insn): Register constants in sets.
	(canonicalize_insn): Use auto_vec instead.
	(cse_insn): Try materializing using vec_dup.
	* rtl.h (simplify_context::simplify_gen_vec_select,
	simplify_gen_vec_select): New.
	* simplify-rtx.c (simplify_context::simplify_gen_vec_select): New.
mablinov pushed a commit to mablinov/gcc that referenced this pull request Nov 10, 2021
darwinpcs, packs some stack items, which means that one cannot
guarantee that they are aligned to DI.  Check for these cases and
reject PRFM instructions then.

Note, that this generally results in use of an extra temporary reg.

clang uses 'PRFUM' instructions in those cases, so we have a missed
optimisation opportunity (low priority).

fixes issue gcc-mirror#16.
fxcoudert pushed a commit to fxcoudert/gcc that referenced this pull request Nov 23, 2021
darwinpcs, packs some stack items, which means that one cannot
guarantee that they are aligned to DI.  Check for these cases and
reject PRFM instructions then.

Note, that this generally results in use of an extra temporary reg.

clang uses 'PRFUM' instructions in those cases, so we have a missed
optimisation opportunity (low priority).

fixes issue gcc-mirror#16.
xionghul pushed a commit to xionghul/gcc that referenced this pull request Feb 22, 2022
The problem in this PR is that we call VPSEL with a mask of vector
type instead of HImode. This happens because operand 3 in vcond_mask
is the pre-computed vector comparison and has vector type.

This patch fixes it by implementing TARGET_VECTORIZE_GET_MASK_MODE,
returning the appropriate VxBI mode when targeting MVE.  In turn, this
implies implementing vec_cmp<mode><MVE_vpred>,
vec_cmpu<mode><MVE_vpred> and vcond_mask_<mode><MVE_vpred>, and we can
move vec_cmp<mode><v_cmp_result>, vec_cmpu<mode><mode> and
vcond_mask_<mode><v_cmp_result> back to neon.md since they are not
used by MVE anymore.  The new *<MVE_vpred> patterns listed above are
implemented in mve.md since they are only valid for MVE. However this
may make maintenance/comparison more painful than having all of them
in vec-common.md.

In the process, we can get rid of the recently added vcond_mve
parameter of arm_expand_vector_compare.

Compared to neon.md's vcond_mask_<mode><v_cmp_result> before my "arm:
Auto-vectorization for MVE: vcmp" patch (r12-834), it keeps the VDQWH
iterator added in r12-835 (to have V4HF/V8HF support), as well as the
(!<Is_float_mode> || flag_unsafe_math_optimizations) condition which
was not present before r12-834 although SF modes were enabled by VDQW
(I think this was a bug).

Using TARGET_VECTORIZE_GET_MASK_MODE has the advantage that we no
longer need to generate vpsel with vectors of 0 and 1: the masks are
now merged via scalar 'ands' instructions operating on 16-bit masks
after converting the boolean vectors.

In addition, this patch fixes a problem in arm_expand_vcond() where
the result would be a vector of 0 or 1 instead of operand 1 or 2.

Since we want to skip gcc.dg/signbit-2.c for MVE, we also add a new
arm_mve effective target.

Reducing the number of iterations in pr100757-3.c from 32 to 8, we
generate the code below:

float a[32];
float fn1(int d) {
  float c = 4.0f;
  for (int b = 0; b < 8; b++)
    if (a[b] != 2.0f)
      c = 5.0f;
  return c;
}

fn1:
	ldr     r3, .L3+48
	vldr.64 d4, .L3              // q2=(2.0,2.0,2.0,2.0)
	vldr.64 d5, .L3+8
	vldrw.32        q0, [r3]     // q0=a(0..3)
	adds    r3, r3, gcc-mirror#16
	vcmp.f32        eq, q0, q2   // cmp a(0..3) == (2.0,2.0,2.0,2.0)
	vldrw.32        q1, [r3]     // q1=a(4..7)
	vmrs     r3, P0
	vcmp.f32        eq, q1, q2   // cmp a(4..7) == (2.0,2.0,2.0,2.0)
	vmrs    r2, P0  @ movhi
	ands    r3, r3, r2           // r3=select(a(0..3]) & select(a(4..7))
	vldr.64 d4, .L3+16           // q2=(5.0,5.0,5.0,5.0)
	vldr.64 d5, .L3+24
	vmsr     P0, r3
	vldr.64 d6, .L3+32           // q3=(4.0,4.0,4.0,4.0)
	vldr.64 d7, .L3+40
	vpsel q3, q3, q2             // q3=vcond_mask(4.0,5.0)
	vmov.32 r2, q3[1]            // keep the scalar max
	vmov.32 r0, q3[3]
	vmov.32 r3, q3[2]
	vmov.f32        s11, s12
	vmov    s15, r2
	vmov    s14, r3
	vmaxnm.f32      s15, s11, s15
	vmaxnm.f32      s15, s15, s14
	vmov    s14, r0
	vmaxnm.f32      s15, s15, s14
	vmov    r0, s15
	bx      lr
	.L4:
	.align  3
	.L3:
	.word   1073741824	// 2.0f
	.word   1073741824
	.word   1073741824
	.word   1073741824
	.word   1084227584	// 5.0f
	.word   1084227584
	.word   1084227584
	.word   1084227584
	.word   1082130432	// 4.0f
	.word   1082130432
	.word   1082130432
	.word   1082130432

This patch adds tests that trigger an ICE without this fix.

The pr100757*.c testcases are derived from
gcc.c-torture/compile/20160205-1.c, forcing the use of MVE, and using
various types and return values different from 0 and 1 to avoid
commonalization with boolean masks.  In addition, since we should not
need these masks, the tests make sure they are not present.

Most of the work of this patch series was carried out while I was
working at STMicroelectronics as a Linaro assignee.

2022-02-22  Christophe Lyon  <christophe.lyon@arm.com>

	PR target/100757
	gcc/
	* config/arm/arm-protos.h (arm_get_mask_mode): New prototype.
	(arm_expand_vector_compare): Update prototype.
	* config/arm/arm.cc (TARGET_VECTORIZE_GET_MASK_MODE): New.
	(arm_vector_mode_supported_p): Add support for VxBI modes.
	(arm_expand_vector_compare): Remove useless generation of vpsel.
	(arm_expand_vcond): Fix select operands.
	(arm_get_mask_mode): New.
	* config/arm/mve.md (vec_cmp<mode><MVE_vpred>): New.
	(vec_cmpu<mode><MVE_vpred>): New.
	(vcond_mask_<mode><MVE_vpred>): New.
	* config/arm/vec-common.md (vec_cmp<mode><v_cmp_result>)
	(vec_cmpu<mode><mode, vcond_mask_<mode><v_cmp_result>): Move to ...
	* config/arm/neon.md (vec_cmp<mode><v_cmp_result>)
	(vec_cmpu<mode><mode, vcond_mask_<mode><v_cmp_result>): ... here
	and disable for MVE.
	* doc/sourcebuild.texi (arm_mve): Document new effective-target.

	gcc/testsuite/
	PR target/100757
	* gcc.target/arm/simd/pr100757-2.c: New.
	* gcc.target/arm/simd/pr100757-3.c: New.
	* gcc.target/arm/simd/pr100757-4.c: New.
	* gcc.target/arm/simd/pr100757.c: New.
	* gcc.dg/signbit-2.c: Skip when targeting ARM/MVE.
	* lib/target-supports.exp (check_effective_target_arm_mve): New.
catap pushed a commit to catap/gcc that referenced this pull request Feb 22, 2022
darwinpcs, packs some stack items, which means that one cannot
guarantee that they are aligned to DI.  Check for these cases and
reject PRFM instructions then.

Note, that this generally results in use of an extra temporary reg.

clang uses 'PRFUM' instructions in those cases, so we have a missed
optimisation opportunity (low priority).

fixes issue gcc-mirror#16.

(cherry picked from commit 534aad5033dc224ed96118b67a84d496bba500ca)
markmentovai pushed a commit to markmentovai/gcc that referenced this pull request Jun 13, 2022
darwinpcs, packs some stack items, which means that one cannot
guarantee that they are aligned to DI.  Check for these cases and
reject PRFM instructions then.

Note, that this generally results in use of an extra temporary reg.

clang uses 'PRFUM' instructions in those cases, so we have a missed
optimisation opportunity (low priority).

fixes issue gcc-mirror#16.
catap pushed a commit to catap/gcc that referenced this pull request May 3, 2023
darwinpcs, packs some stack items, which means that one cannot
guarantee that they are aligned to DI.  Check for these cases and
reject PRFM instructions then.

Note, that this generally results in use of an extra temporary reg.

clang uses 'PRFUM' instructions in those cases, so we have a missed
optimisation opportunity (low priority).

fixes issue gcc-mirror#16.

(cherry picked from commit 534aad5033dc224ed96118b67a84d496bba500ca)
catap pushed a commit to catap/gcc that referenced this pull request May 3, 2023
darwinpcs, packs some stack items, which means that one cannot
guarantee that they are aligned to DI.  Check for these cases and
reject PRFM instructions then.

Note, that this generally results in use of an extra temporary reg.

clang uses 'PRFUM' instructions in those cases, so we have a missed
optimisation opportunity (low priority).

fixes issue gcc-mirror#16.

(cherry picked from commit 534aad5033dc224ed96118b67a84d496bba500ca)
catap pushed a commit to catap/gcc that referenced this pull request May 3, 2023
darwinpcs, packs some stack items, which means that one cannot
guarantee that they are aligned to DI.  Check for these cases and
reject PRFM instructions then.

Note, that this generally results in use of an extra temporary reg.

clang uses 'PRFUM' instructions in those cases, so we have a missed
optimisation opportunity (low priority).

fixes issue gcc-mirror#16.

(cherry picked from commit 534aad5033dc224ed96118b67a84d496bba500ca)
catap pushed a commit to catap/gcc that referenced this pull request May 3, 2023
darwinpcs, packs some stack items, which means that one cannot
guarantee that they are aligned to DI.  Check for these cases and
reject PRFM instructions then.

Note, that this generally results in use of an extra temporary reg.

clang uses 'PRFUM' instructions in those cases, so we have a missed
optimisation opportunity (low priority).

fixes issue gcc-mirror#16.

(cherry picked from commit 534aad5033dc224ed96118b67a84d496bba500ca)
catap pushed a commit to catap/gcc that referenced this pull request May 3, 2023
darwinpcs, packs some stack items, which means that one cannot
guarantee that they are aligned to DI.  Check for these cases and
reject PRFM instructions then.

Note, that this generally results in use of an extra temporary reg.

clang uses 'PRFUM' instructions in those cases, so we have a missed
optimisation opportunity (low priority).

fixes issue gcc-mirror#16.

(cherry picked from commit 534aad5033dc224ed96118b67a84d496bba500ca)
catap pushed a commit to catap/gcc that referenced this pull request May 4, 2023
darwinpcs, packs some stack items, which means that one cannot
guarantee that they are aligned to DI.  Check for these cases and
reject PRFM instructions then.

Note, that this generally results in use of an extra temporary reg.

clang uses 'PRFUM' instructions in those cases, so we have a missed
optimisation opportunity (low priority).

fixes issue gcc-mirror#16.

(cherry picked from commit 534aad5033dc224ed96118b67a84d496bba500ca)
catap pushed a commit to catap/gcc that referenced this pull request May 4, 2023
darwinpcs, packs some stack items, which means that one cannot
guarantee that they are aligned to DI.  Check for these cases and
reject PRFM instructions then.

Note, that this generally results in use of an extra temporary reg.

clang uses 'PRFUM' instructions in those cases, so we have a missed
optimisation opportunity (low priority).

fixes issue gcc-mirror#16.

(cherry picked from commit 534aad5033dc224ed96118b67a84d496bba500ca)
Signed-off-by: Kirill A. Korinsky <kirill@korins.ky>
catap pushed a commit to catap/gcc that referenced this pull request Nov 12, 2023
darwinpcs, packs some stack items, which means that one cannot
guarantee that they are aligned to DI.  Check for these cases and
reject PRFM instructions then.

Note, that this generally results in use of an extra temporary reg.

clang uses 'PRFUM' instructions in those cases, so we have a missed
optimisation opportunity (low priority).

fixes issue gcc-mirror#16.

(cherry picked from commit 534aad5033dc224ed96118b67a84d496bba500ca)
Signed-off-by: Kirill A. Korinsky <kirill@korins.ky>
catap pushed a commit to catap/gcc that referenced this pull request Nov 14, 2023
darwinpcs, packs some stack items, which means that one cannot
guarantee that they are aligned to DI.  Check for these cases and
reject PRFM instructions then.

Note, that this generally results in use of an extra temporary reg.

clang uses 'PRFUM' instructions in those cases, so we have a missed
optimisation opportunity (low priority).

fixes issue gcc-mirror#16.

(cherry picked from commit 534aad5033dc224ed96118b67a84d496bba500ca)
Signed-off-by: Kirill A. Korinsky <kirill@korins.ky>
NinaRanns referenced this pull request in NinaRanns/gcc Jul 31, 2024
 infiniloop and contract_specifier_seq parsing
hubot pushed a commit that referenced this pull request Dec 6, 2024
This test would fail if GCC is configured with non-default options,
such as -mtune=cortex-a9.

This 'unexpected' scheduling makes the DLSTP optimization generate
	subs    lr, #16
	bhi	.L4
	lctp
	pop     {r4, r5, pc}
.L4:
	sub     ip, ip, #16
	b      <loop-begin>

instead of the expected
sub     ip, ip, #16
letp lr, <loop-begin>

Although GCC still optimizes all 144 loops, only 96 use letp, 48
others use lctp.

The patch simply forces -mtune=cortex-m55 to avoid this unexpected
issue.

gcc/testsuite/ChangeLog:

	* gcc.target/arm/mve/dlstp-compile-asm-1.c: Add -mtune=cortex-m55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants