mirrored from git://gcc.gnu.org/git/gcc.git
-
Notifications
You must be signed in to change notification settings - Fork 4.5k
Merge latest upstream changes into master #16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
Closed
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
update teams branch from upstream
also add checksums
Merge upstream gcc-mirror/gcc into sourceryinstitute/gcc
Merge sourceryinstitute/master into sourceryinstitute/teams
Merge master into download-opencoarrays-mpich
habemus-papadum
pushed a commit
to habemus-papadum/gcc
that referenced
this pull request
Nov 29, 2017
This patch improves alloca alignment. Currently alloca reserves too much space as it aligns twice, and generates unnecessary stack alignment code. When the requested alignment is lower than the stack alignment, no extra alignment is needed. If the requested alignment is higher, we need to increase the size by the difference of the requested alignment and the stack alignment. As a result, the alloca alignment is exactly as expected: alloca (16): sub sp, sp, gcc-mirror#16 mov x1, sp alloca (x): add x0, x0, 15 and x0, x0, -16 sub sp, sp, x0 mov x0, sp __builtin_alloca_with_align (x, 512): add x0, x0, 63 and x0, x0, -16 sub sp, sp, x0 add x0, sp, 63 and x0, x0, -64 gcc/ * explow.c (get_dynamic_stack_size): Improve dynamic alignment. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@251713 138bc75d-0d04-0410-961f-82ee72b054a4
hubot
pushed a commit
that referenced
this pull request
Feb 8, 2018
The shrinkwrap optimization added in GCC 7 allows each callee-save to be delayed and done only across blocks which need a particular callee-save. Although this reduces unnecessary memory traffic on code paths that need few callee-saves, it typically uses LDR/STR rather than LDP/STP. This means more memory accesses and increased codesize, ~1.0% on average. To improve this, if a particular callee-save must be saved/restored, also add the adjacent callee-save to allow use of LDP/STP. This significantly reduces codesize (for example gcc_r, povray_r, parest_r, xalancbmk_r are 1% smaller). This is a simple fix which can be backported. A more advanced approach would scan blocks for pairs of callee-saves, but that requires a full rewrite of all the callee-save code which is too late at this stage. An example epilog in a shrinkwrapped function before: ldp x21, x22, [sp,#16] ldr x23, [sp,#32] ldr x24, [sp,#40] ldp x25, x26, [sp,#48] ldr x27, [sp,#64] ldr x28, [sp,#72] ldr x30, [sp,#80] ldr d8, [sp,#88] ldp x19, x20, [sp],#96 ret And after this patch: ldr d8, [sp,#88] ldp x21, x22, [sp,#16] ldp x23, x24, [sp,#32] ldp x25, x26, [sp,#48] ldp x27, x28, [sp,#64] ldr x30, [sp,#80] ldp x19, x20, [sp],#96 ret gcc/ * config/aarch64/aarch64.c (aarch64_components_for_bb): Increase LDP/STP opportunities by adding adjacent callee-saves. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@257482 138bc75d-0d04-0410-961f-82ee72b054a4
kraj
pushed a commit
to kraj/gcc
that referenced
this pull request
Oct 12, 2020
Prevents the following UBSAN error: ./xgcc -B. /home/marxin/Programming/gcc/gcc/testsuite/g++.dg/torture/pr49770.C -O2 -c /home/marxin/Programming/gcc2/gcc/ipa-modref-tree.h:482:22: runtime error: load of value 2, which is not a valid value for type 'bool' #0 0x1fdb4d1 in modref_tree<int>::merge(modref_tree<int>*, vec<modref_parm_map, va_heap, vl_ptr>*) /home/marxin/Programming/gcc2/gcc/ipa-modref-tree.h:482 #1 0x1fcadaa in merge_call_side_effects(modref_summary*, gimple*, modref_summary*, bool) /home/marxin/Programming/gcc2/gcc/ipa-modref.c:511 gcc-mirror#2 0x1fcbadd in analyze_call /home/marxin/Programming/gcc2/gcc/ipa-modref.c:642 gcc-mirror#3 0x1fcc061 in analyze_stmt /home/marxin/Programming/gcc2/gcc/ipa-modref.c:732 gcc-mirror#4 0x1fccf31 in analyze_function /home/marxin/Programming/gcc2/gcc/ipa-modref.c:823 gcc-mirror#5 0x1fd17e5 in execute /home/marxin/Programming/gcc2/gcc/ipa-modref.c:1441 gcc-mirror#6 0x25cca6e in execute_one_pass(opt_pass*) /home/marxin/Programming/gcc2/gcc/passes.c:2509 gcc-mirror#7 0x25cd39b in execute_pass_list_1 /home/marxin/Programming/gcc2/gcc/passes.c:2597 gcc-mirror#8 0x25cd450 in execute_pass_list_1 /home/marxin/Programming/gcc2/gcc/passes.c:2598 gcc-mirror#9 0x25cd4ee in execute_pass_list(function*, opt_pass*) /home/marxin/Programming/gcc2/gcc/passes.c:2608 gcc-mirror#10 0x25c7a5a in do_per_function_toporder(void (*)(function*, void*), void*) /home/marxin/Programming/gcc2/gcc/passes.c:1726 gcc-mirror#11 0x25cfa3f in execute_ipa_pass_list(opt_pass*) /home/marxin/Programming/gcc2/gcc/passes.c:2941 gcc-mirror#12 0x173572d in ipa_passes /home/marxin/Programming/gcc2/gcc/cgraphunit.c:2642 gcc-mirror#13 0x17364ee in symbol_table::compile() /home/marxin/Programming/gcc2/gcc/cgraphunit.c:2777 gcc-mirror#14 0x17372d9 in symbol_table::finalize_compilation_unit() /home/marxin/Programming/gcc2/gcc/cgraphunit.c:3022 gcc-mirror#15 0x2a1f00a in compile_file /home/marxin/Programming/gcc2/gcc/toplev.c:485 gcc-mirror#16 0x2a27dc8 in do_compile /home/marxin/Programming/gcc2/gcc/toplev.c:2321 gcc-mirror#17 0x2a283cc in toplev::main(int, char**) /home/marxin/Programming/gcc2/gcc/toplev.c:2460 gcc-mirror#18 0x54f21cd in main /home/marxin/Programming/gcc2/gcc/main.c:39 gcc-mirror#19 0x7ffff6f0de09 in __libc_start_main ../csu/libc-start.c:314 gcc-mirror#20 0x9eac09 in _start (/home/marxin/Programming/gcc2/objdir/gcc/cc1plus+0x9eac09) gcc/ChangeLog: * ipa-modref.c (merge_call_side_effects): Clear modref_parm_map fields in the vector.
kraj
pushed a commit
to kraj/gcc
that referenced
this pull request
Nov 2, 2020
Enable thumb1_gen_const_int to generate RTL or asm depending on the context, so that we avoid duplicating code to handle constants in Thumb-1 with -mpure-code. Use a template so that the algorithm is effectively shared, and rely on two classes to handle the actual emission as RTL or asm. The generated sequence is improved to handle right-shiftable and small values with less instructions. We now generate: 128: movs r0, r0, #128 264: movs r3, gcc-mirror#33 lsls r3, gcc-mirror#3 510: movs r3, #255 lsls r3, #1 512: movs r3, #1 lsls r3, gcc-mirror#9 764: movs r3, #191 lsls r3, gcc-mirror#2 65536: movs r3, #1 lsls r3, gcc-mirror#16 0x123456: movs r3, gcc-mirror#18 ;0x12 lsls r3, gcc-mirror#8 adds r3, gcc-mirror#52 ;0x34 lsls r3, gcc-mirror#8 adds r3, gcc-mirror#86 ;0x56 0x1123456: movs r3, #137 ;0x89 lsls r3, gcc-mirror#8 adds r3, gcc-mirror#26 ;0x1a lsls r3, gcc-mirror#8 adds r3, gcc-mirror#43 ;0x2b lsls r3, #1 0x1000010: movs r3, gcc-mirror#16 lsls r3, gcc-mirror#16 adds r3, #1 lsls r3, gcc-mirror#4 0x1000011: movs r3, #1 lsls r3, gcc-mirror#24 adds r3, gcc-mirror#17 -8192: movs r3, #1 lsls r3, gcc-mirror#13 rsbs r3, #0 The patch adds a testcase which does not fully exercise thumb1_gen_const_int, as other existing patterns already catch small constants. These parts of thumb1_gen_const_int are used by arm_thumb1_mi_thunk. 2020-11-02 Christophe Lyon <christophe.lyon@linaro.org> gcc/ * config/arm/arm.c (thumb1_const_rtl, thumb1_const_print): New classes. (thumb1_gen_const_int): Rename to ... (thumb1_gen_const_int_1): ... New helper function. Add capability to emit either RTL or asm, improve generated code. (thumb1_gen_const_int_rtl): New function. * config/arm/arm-protos.h (thumb1_gen_const_int): Rename to thumb1_gen_const_int_rtl. * config/arm/thumb1.md: Call thumb1_gen_const_int_rtl instead of thumb1_gen_const_int. gcc/testsuite/ * gcc.target/arm/pure-code/no-literal-pool-m0.c: New.
kraj
pushed a commit
to kraj/gcc
that referenced
this pull request
Jan 25, 2021
This adds implementation for the optabs for complex operations. With this the following C code: void g (float complex a[restrict N], float complex b[restrict N], float complex c[restrict N]) { for (int i=0; i < N; i++) c[i] = a[i] * b[i]; } generates NEON: g: vmov.f32 q11, #0.0 @ v4sf add r3, r2, #1600 .L2: vmov q8, q11 @ v4sf vld1.32 {q10}, [r1]! vld1.32 {q9}, [r0]! vcmla.f32 q8, q9, q10, #0 vcmla.f32 q8, q9, q10, gcc-mirror#90 vst1.32 {q8}, [r2]! cmp r3, r2 bne .L2 bx lr MVE: g: push {lr} mov lr, gcc-mirror#100 dls lr, lr .L2: vldrw.32 q1, [r1], gcc-mirror#16 vldrw.32 q2, [r0], gcc-mirror#16 vcmul.f32 q3, q2, q1, #0 vcmla.f32 q3, q2, q1, gcc-mirror#90 vstrw.32 q3, [r2], gcc-mirror#16 le lr, .L2 ldr pc, [sp], gcc-mirror#4 instead of g: add r3, r2, #1600 .L2: vld2.32 {d20-d23}, [r0]! vld2.32 {d16-d19}, [r1]! vmul.f32 q14, q11, q9 vmul.f32 q15, q11, q8 vneg.f32 q14, q14 vfma.f32 q15, q10, q9 vfma.f32 q14, q10, q8 vmov q13, q15 @ v4sf vmov q12, q14 @ v4sf vst2.32 {d24-d27}, [r2]! cmp r3, r2 bne .L2 bx lr and g: add r3, r2, #1600 .L2: vld2.32 {d20-d23}, [r0]! vld2.32 {d16-d19}, [r1]! vmul.f32 q15, q10, q8 vmul.f32 q14, q10, q9 vmls.f32 q15, q11, q9 vmla.f32 q14, q11, q8 vmov q12, q15 @ v4sf vmov q13, q14 @ v4sf vst2.32 {d24-d27}, [r2]! cmp r3, r2 bne .L2 bx lr respectively. gcc/ChangeLog: * config/arm/iterators.md (rotsplit1, rotsplit2, conj_op, fcmac1, VCMLA_OP, VCMUL_OP): New. * config/arm/mve.md (mve_vcmlaq<mve_rot><mode>): Support vec_dup 0. * config/arm/neon.md (cmul<conj_op><mode>3): New. * config/arm/unspecs.md (UNSPEC_VCMLA_CONJ, UNSPEC_VCMLA180_CONJ, UNSPEC_VCMUL_CONJ): New. * config/arm/vec-common.md (cmul<conj_op><mode>3, arm_vcmla<rot><mode>, cml<fcmac1><conj_op><mode>4): New.
fxcoudert
pushed a commit
to fxcoudert/gcc
that referenced
this pull request
May 4, 2021
darwinpcs, packs some stack items, which means that one cannot guarantee that they are aligned to DI. Check for these cases and reject PRFM instructions then. Note, that this generally results in use of an extra temporary reg. clang uses 'PRFUM' instructions in those cases, so we have a missed optimisation opportunity (low priority). fixes issue gcc-mirror#16.
nstester
pushed a commit
to nstester/gcc
that referenced
this pull request
Jun 14, 2021
The fixed error is: ==21166==ERROR: AddressSanitizer: alloc-dealloc-mismatch (operator new [] vs operator delete) on 0x60300000d900 #0 0x7367d7 in operator delete(void*, unsigned long) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/libsanitizer/asan/asan_new_delete.cpp:172 #1 0x3b82e6e in pointer_equiv_analyzer::~pointer_equiv_analyzer() /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/gimple-ssa-evrp.c:161 #2 0x3b83387 in hybrid_folder::~hybrid_folder() /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/gimple-ssa-evrp.c:517 #3 0x3b83387 in execute_early_vrp /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/gimple-ssa-evrp.c:686 #4 0x1790611 in execute_one_pass(opt_pass*) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/passes.c:2567 gcc-mirror#5 0x1792003 in execute_pass_list_1 /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/passes.c:2656 gcc-mirror#6 0x1792029 in execute_pass_list_1 /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/passes.c:2657 gcc-mirror#7 0x179209f in execute_pass_list(function*, opt_pass*) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/passes.c:2667 gcc-mirror#8 0x178a5f3 in do_per_function_toporder(void (*)(function*, void*), void*) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/passes.c:1773 gcc-mirror#9 0x1792fac in do_per_function_toporder(void (*)(function*, void*), void*) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/plugin.h:191 gcc-mirror#10 0x1792fac in execute_ipa_pass_list(opt_pass*) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/passes.c:3001 gcc-mirror#11 0xc525fc in ipa_passes /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/cgraphunit.c:2154 gcc-mirror#12 0xc525fc in symbol_table::compile() /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/cgraphunit.c:2289 gcc-mirror#13 0xc5a096 in symbol_table::compile() /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/cgraphunit.c:2269 gcc-mirror#14 0xc5a096 in symbol_table::finalize_compilation_unit() /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/cgraphunit.c:2537 gcc-mirror#15 0x1a7a17c in compile_file /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/toplev.c:482 gcc-mirror#16 0x69c758 in do_compile /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/toplev.c:2210 gcc-mirror#17 0x69c758 in toplev::main(int, char**) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/toplev.c:2349 gcc-mirror#18 0x6a932a in main /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/main.c:39 gcc-mirror#19 0x7ffff7820b34 in __libc_start_main ../csu/libc-start.c:332 gcc-mirror#20 0x6aa5fd in _start (/home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/objdir/gcc/cc1+0x6aa5fd) 0x60300000d900 is located 0 bytes inside of 32-byte region [0x60300000d900,0x60300000d920) allocated by thread T0 here: #0 0x735ab7 in operator new[](unsigned long) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/libsanitizer/asan/asan_new_delete.cpp:102 #1 0x3b82dac in pointer_equiv_analyzer::pointer_equiv_analyzer(gimple_ranger*) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/gimple-ssa-evrp.c:156 gcc/ChangeLog: * gimple-ssa-evrp.c (pointer_equiv_analyzer::~pointer_equiv_analyzer): Use delete[].
mablinov
pushed a commit
to mablinov/gcc
that referenced
this pull request
Oct 29, 2021
darwinpcs, packs some stack items, which means that one cannot guarantee that they are aligned to DI. Check for these cases and reject PRFM instructions then. Note, that this generally results in use of an extra temporary reg. clang uses 'PRFUM' instructions in those cases, so we have a missed optimisation opportunity (low priority). fixes issue gcc-mirror#16.
nstester
pushed a commit
to nstester/gcc
that referenced
this pull request
Nov 1, 2021
This patch gets CSE to re-use constants already inside a vector rather than re-materializing the constant again. Basically consider the following case: #include <stdint.h> #include <arm_neon.h> uint64_t test (uint64_t a, uint64x2_t b, uint64x2_t* rt) { uint64_t arr[2] = { 0x0942430810234076UL, 0x0942430810234076UL}; uint64_t res = a | arr[0]; uint64x2_t val = vld1q_u64 (arr); *rt = vaddq_u64 (val, b); return res; } The actual behavior is inconsequential however notice that the same constants are used in the vector (arr and later val) and in the calculation of res. The code we generate for this however is quite sub-optimal: test: adrp x2, .LC0 sub sp, sp, gcc-mirror#16 ldr q1, [x2, #:lo12:.LC0] mov x2, 16502 movk x2, 0x1023, lsl 16 movk x2, 0x4308, lsl 32 add v1.2d, v1.2d, v0.2d movk x2, 0x942, lsl 48 orr x0, x0, x2 str q1, [x1] add sp, sp, 16 ret .LC0: .xword 667169396713799798 .xword 667169396713799798 Essentially we materialize the same constant twice. The reason for this is because the front-end lowers the constant extracted from arr[0] quite early on. If you look into the result of fre you'll find <bb 2> : arr[0] = 667169396713799798; arr[1] = 667169396713799798; res_7 = a_6(D) | 667169396713799798; _16 = __builtin_aarch64_ld1v2di (&arr); _17 = VIEW_CONVERT_EXPR<uint64x2_t>(_16); _11 = b_10(D) + _17; *rt_12(D) = _11; arr ={v} {CLOBBER}; return res_7; Which makes sense for further optimization. However come expand time if the constant isn't representable in the target arch it will be assigned to a register again. (insn 8 5 9 2 (set (reg:V2DI 99) (const_vector:V2DI [ (const_int 667169396713799798 [0x942430810234076]) repeated x2 ])) "cse.c":7:12 -1 (nil)) ... (insn 14 13 15 2 (set (reg:DI 103) (const_int 667169396713799798 [0x942430810234076])) "cse.c":8:12 -1 (nil)) (insn 15 14 16 2 (set (reg:DI 102 [ res ]) (ior:DI (reg/v:DI 96 [ a ]) (reg:DI 103))) "cse.c":8:12 -1 (nil)) And since it's out of the immediate range of the scalar instruction used combine won't be able to do anything here. This will then trigger the re-materialization of the constant twice. To fix this this patch extends CSE to be able to generate an extract for a constant from another vector, or to make a vector for a constant by duplicating another constant. Whether this transformation is done or not depends entirely on the costing for the target for the different constants and operations. I Initially also investigated doing this in PRE, but PRE requires at least 2 BB to work and does not currently have any way to remove redundancies within a single BB and it did not look easy to support. gcc/ChangeLog: * cse.c (add_to_set): New. (find_sets_in_insn): Register constants in sets. (canonicalize_insn): Use auto_vec instead. (cse_insn): Try materializing using vec_dup. * rtl.h (simplify_context::simplify_gen_vec_select, simplify_gen_vec_select): New. * simplify-rtx.c (simplify_context::simplify_gen_vec_select): New.
mablinov
pushed a commit
to mablinov/gcc
that referenced
this pull request
Nov 10, 2021
darwinpcs, packs some stack items, which means that one cannot guarantee that they are aligned to DI. Check for these cases and reject PRFM instructions then. Note, that this generally results in use of an extra temporary reg. clang uses 'PRFUM' instructions in those cases, so we have a missed optimisation opportunity (low priority). fixes issue gcc-mirror#16.
fxcoudert
pushed a commit
to fxcoudert/gcc
that referenced
this pull request
Nov 23, 2021
darwinpcs, packs some stack items, which means that one cannot guarantee that they are aligned to DI. Check for these cases and reject PRFM instructions then. Note, that this generally results in use of an extra temporary reg. clang uses 'PRFUM' instructions in those cases, so we have a missed optimisation opportunity (low priority). fixes issue gcc-mirror#16.
xionghul
pushed a commit
to xionghul/gcc
that referenced
this pull request
Feb 22, 2022
The problem in this PR is that we call VPSEL with a mask of vector type instead of HImode. This happens because operand 3 in vcond_mask is the pre-computed vector comparison and has vector type. This patch fixes it by implementing TARGET_VECTORIZE_GET_MASK_MODE, returning the appropriate VxBI mode when targeting MVE. In turn, this implies implementing vec_cmp<mode><MVE_vpred>, vec_cmpu<mode><MVE_vpred> and vcond_mask_<mode><MVE_vpred>, and we can move vec_cmp<mode><v_cmp_result>, vec_cmpu<mode><mode> and vcond_mask_<mode><v_cmp_result> back to neon.md since they are not used by MVE anymore. The new *<MVE_vpred> patterns listed above are implemented in mve.md since they are only valid for MVE. However this may make maintenance/comparison more painful than having all of them in vec-common.md. In the process, we can get rid of the recently added vcond_mve parameter of arm_expand_vector_compare. Compared to neon.md's vcond_mask_<mode><v_cmp_result> before my "arm: Auto-vectorization for MVE: vcmp" patch (r12-834), it keeps the VDQWH iterator added in r12-835 (to have V4HF/V8HF support), as well as the (!<Is_float_mode> || flag_unsafe_math_optimizations) condition which was not present before r12-834 although SF modes were enabled by VDQW (I think this was a bug). Using TARGET_VECTORIZE_GET_MASK_MODE has the advantage that we no longer need to generate vpsel with vectors of 0 and 1: the masks are now merged via scalar 'ands' instructions operating on 16-bit masks after converting the boolean vectors. In addition, this patch fixes a problem in arm_expand_vcond() where the result would be a vector of 0 or 1 instead of operand 1 or 2. Since we want to skip gcc.dg/signbit-2.c for MVE, we also add a new arm_mve effective target. Reducing the number of iterations in pr100757-3.c from 32 to 8, we generate the code below: float a[32]; float fn1(int d) { float c = 4.0f; for (int b = 0; b < 8; b++) if (a[b] != 2.0f) c = 5.0f; return c; } fn1: ldr r3, .L3+48 vldr.64 d4, .L3 // q2=(2.0,2.0,2.0,2.0) vldr.64 d5, .L3+8 vldrw.32 q0, [r3] // q0=a(0..3) adds r3, r3, gcc-mirror#16 vcmp.f32 eq, q0, q2 // cmp a(0..3) == (2.0,2.0,2.0,2.0) vldrw.32 q1, [r3] // q1=a(4..7) vmrs r3, P0 vcmp.f32 eq, q1, q2 // cmp a(4..7) == (2.0,2.0,2.0,2.0) vmrs r2, P0 @ movhi ands r3, r3, r2 // r3=select(a(0..3]) & select(a(4..7)) vldr.64 d4, .L3+16 // q2=(5.0,5.0,5.0,5.0) vldr.64 d5, .L3+24 vmsr P0, r3 vldr.64 d6, .L3+32 // q3=(4.0,4.0,4.0,4.0) vldr.64 d7, .L3+40 vpsel q3, q3, q2 // q3=vcond_mask(4.0,5.0) vmov.32 r2, q3[1] // keep the scalar max vmov.32 r0, q3[3] vmov.32 r3, q3[2] vmov.f32 s11, s12 vmov s15, r2 vmov s14, r3 vmaxnm.f32 s15, s11, s15 vmaxnm.f32 s15, s15, s14 vmov s14, r0 vmaxnm.f32 s15, s15, s14 vmov r0, s15 bx lr .L4: .align 3 .L3: .word 1073741824 // 2.0f .word 1073741824 .word 1073741824 .word 1073741824 .word 1084227584 // 5.0f .word 1084227584 .word 1084227584 .word 1084227584 .word 1082130432 // 4.0f .word 1082130432 .word 1082130432 .word 1082130432 This patch adds tests that trigger an ICE without this fix. The pr100757*.c testcases are derived from gcc.c-torture/compile/20160205-1.c, forcing the use of MVE, and using various types and return values different from 0 and 1 to avoid commonalization with boolean masks. In addition, since we should not need these masks, the tests make sure they are not present. Most of the work of this patch series was carried out while I was working at STMicroelectronics as a Linaro assignee. 2022-02-22 Christophe Lyon <christophe.lyon@arm.com> PR target/100757 gcc/ * config/arm/arm-protos.h (arm_get_mask_mode): New prototype. (arm_expand_vector_compare): Update prototype. * config/arm/arm.cc (TARGET_VECTORIZE_GET_MASK_MODE): New. (arm_vector_mode_supported_p): Add support for VxBI modes. (arm_expand_vector_compare): Remove useless generation of vpsel. (arm_expand_vcond): Fix select operands. (arm_get_mask_mode): New. * config/arm/mve.md (vec_cmp<mode><MVE_vpred>): New. (vec_cmpu<mode><MVE_vpred>): New. (vcond_mask_<mode><MVE_vpred>): New. * config/arm/vec-common.md (vec_cmp<mode><v_cmp_result>) (vec_cmpu<mode><mode, vcond_mask_<mode><v_cmp_result>): Move to ... * config/arm/neon.md (vec_cmp<mode><v_cmp_result>) (vec_cmpu<mode><mode, vcond_mask_<mode><v_cmp_result>): ... here and disable for MVE. * doc/sourcebuild.texi (arm_mve): Document new effective-target. gcc/testsuite/ PR target/100757 * gcc.target/arm/simd/pr100757-2.c: New. * gcc.target/arm/simd/pr100757-3.c: New. * gcc.target/arm/simd/pr100757-4.c: New. * gcc.target/arm/simd/pr100757.c: New. * gcc.dg/signbit-2.c: Skip when targeting ARM/MVE. * lib/target-supports.exp (check_effective_target_arm_mve): New.
catap
pushed a commit
to catap/gcc
that referenced
this pull request
Feb 22, 2022
darwinpcs, packs some stack items, which means that one cannot guarantee that they are aligned to DI. Check for these cases and reject PRFM instructions then. Note, that this generally results in use of an extra temporary reg. clang uses 'PRFUM' instructions in those cases, so we have a missed optimisation opportunity (low priority). fixes issue gcc-mirror#16. (cherry picked from commit 534aad5033dc224ed96118b67a84d496bba500ca)
markmentovai
pushed a commit
to markmentovai/gcc
that referenced
this pull request
Jun 13, 2022
darwinpcs, packs some stack items, which means that one cannot guarantee that they are aligned to DI. Check for these cases and reject PRFM instructions then. Note, that this generally results in use of an extra temporary reg. clang uses 'PRFUM' instructions in those cases, so we have a missed optimisation opportunity (low priority). fixes issue gcc-mirror#16.
catap
pushed a commit
to catap/gcc
that referenced
this pull request
May 3, 2023
darwinpcs, packs some stack items, which means that one cannot guarantee that they are aligned to DI. Check for these cases and reject PRFM instructions then. Note, that this generally results in use of an extra temporary reg. clang uses 'PRFUM' instructions in those cases, so we have a missed optimisation opportunity (low priority). fixes issue gcc-mirror#16. (cherry picked from commit 534aad5033dc224ed96118b67a84d496bba500ca)
catap
pushed a commit
to catap/gcc
that referenced
this pull request
May 3, 2023
darwinpcs, packs some stack items, which means that one cannot guarantee that they are aligned to DI. Check for these cases and reject PRFM instructions then. Note, that this generally results in use of an extra temporary reg. clang uses 'PRFUM' instructions in those cases, so we have a missed optimisation opportunity (low priority). fixes issue gcc-mirror#16. (cherry picked from commit 534aad5033dc224ed96118b67a84d496bba500ca)
catap
pushed a commit
to catap/gcc
that referenced
this pull request
May 3, 2023
darwinpcs, packs some stack items, which means that one cannot guarantee that they are aligned to DI. Check for these cases and reject PRFM instructions then. Note, that this generally results in use of an extra temporary reg. clang uses 'PRFUM' instructions in those cases, so we have a missed optimisation opportunity (low priority). fixes issue gcc-mirror#16. (cherry picked from commit 534aad5033dc224ed96118b67a84d496bba500ca)
catap
pushed a commit
to catap/gcc
that referenced
this pull request
May 3, 2023
darwinpcs, packs some stack items, which means that one cannot guarantee that they are aligned to DI. Check for these cases and reject PRFM instructions then. Note, that this generally results in use of an extra temporary reg. clang uses 'PRFUM' instructions in those cases, so we have a missed optimisation opportunity (low priority). fixes issue gcc-mirror#16. (cherry picked from commit 534aad5033dc224ed96118b67a84d496bba500ca)
catap
pushed a commit
to catap/gcc
that referenced
this pull request
May 3, 2023
darwinpcs, packs some stack items, which means that one cannot guarantee that they are aligned to DI. Check for these cases and reject PRFM instructions then. Note, that this generally results in use of an extra temporary reg. clang uses 'PRFUM' instructions in those cases, so we have a missed optimisation opportunity (low priority). fixes issue gcc-mirror#16. (cherry picked from commit 534aad5033dc224ed96118b67a84d496bba500ca)
catap
pushed a commit
to catap/gcc
that referenced
this pull request
May 4, 2023
darwinpcs, packs some stack items, which means that one cannot guarantee that they are aligned to DI. Check for these cases and reject PRFM instructions then. Note, that this generally results in use of an extra temporary reg. clang uses 'PRFUM' instructions in those cases, so we have a missed optimisation opportunity (low priority). fixes issue gcc-mirror#16. (cherry picked from commit 534aad5033dc224ed96118b67a84d496bba500ca)
catap
pushed a commit
to catap/gcc
that referenced
this pull request
May 4, 2023
darwinpcs, packs some stack items, which means that one cannot guarantee that they are aligned to DI. Check for these cases and reject PRFM instructions then. Note, that this generally results in use of an extra temporary reg. clang uses 'PRFUM' instructions in those cases, so we have a missed optimisation opportunity (low priority). fixes issue gcc-mirror#16. (cherry picked from commit 534aad5033dc224ed96118b67a84d496bba500ca) Signed-off-by: Kirill A. Korinsky <kirill@korins.ky>
catap
pushed a commit
to catap/gcc
that referenced
this pull request
Nov 12, 2023
darwinpcs, packs some stack items, which means that one cannot guarantee that they are aligned to DI. Check for these cases and reject PRFM instructions then. Note, that this generally results in use of an extra temporary reg. clang uses 'PRFUM' instructions in those cases, so we have a missed optimisation opportunity (low priority). fixes issue gcc-mirror#16. (cherry picked from commit 534aad5033dc224ed96118b67a84d496bba500ca) Signed-off-by: Kirill A. Korinsky <kirill@korins.ky>
catap
pushed a commit
to catap/gcc
that referenced
this pull request
Nov 14, 2023
darwinpcs, packs some stack items, which means that one cannot guarantee that they are aligned to DI. Check for these cases and reject PRFM instructions then. Note, that this generally results in use of an extra temporary reg. clang uses 'PRFUM' instructions in those cases, so we have a missed optimisation opportunity (low priority). fixes issue gcc-mirror#16. (cherry picked from commit 534aad5033dc224ed96118b67a84d496bba500ca) Signed-off-by: Kirill A. Korinsky <kirill@korins.ky>
NinaRanns
referenced
this pull request
in NinaRanns/gcc
Jul 31, 2024
infiniloop and contract_specifier_seq parsing
hubot
pushed a commit
that referenced
this pull request
Dec 6, 2024
This test would fail if GCC is configured with non-default options, such as -mtune=cortex-a9. This 'unexpected' scheduling makes the DLSTP optimization generate subs lr, #16 bhi .L4 lctp pop {r4, r5, pc} .L4: sub ip, ip, #16 b <loop-begin> instead of the expected sub ip, ip, #16 letp lr, <loop-begin> Although GCC still optimizes all 144 loops, only 96 use letp, 48 others use lctp. The patch simply forces -mtune=cortex-m55 to avoid this unexpected issue. gcc/testsuite/ChangeLog: * gcc.target/arm/mve/dlstp-compile-asm-1.c: Add -mtune=cortex-m55
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
No description provided.