forked from gcc-mirror/gcc
[GitHub Actions] nightly build script for gccefi2linux #1
Merged: nstester merged 1 commit into nstester:master from hanzhan1:private/hanzhan1/gccefi2linux-build-script on Jan 28, 2021
Conversation
tfzhu approved these changes on Jan 28, 2021
nstester pushed a commit that referenced this pull request on Feb 23, 2021

/home/marxin/Programming/gcc2/libsanitizer/ubsan/ubsan_value.cpp:77:25: runtime error: left shift of 0x0000000000000000fffffffffffffffb by 96 places cannot be represented in type '__int128'
    #0 0x7ffff754edfe in __ubsan::Value::getSIntValue() const /home/marxin/Programming/gcc2/libsanitizer/ubsan/ubsan_value.cpp:77
    #1 0x7ffff7548719 in __ubsan::Value::isNegative() const /home/marxin/Programming/gcc2/libsanitizer/ubsan/ubsan_value.h:190
    #2 0x7ffff7542a34 in handleShiftOutOfBoundsImpl /home/marxin/Programming/gcc2/libsanitizer/ubsan/ubsan_handlers.cpp:338
    #3 0x7ffff75431b7 in __ubsan_handle_shift_out_of_bounds /home/marxin/Programming/gcc2/libsanitizer/ubsan/ubsan_handlers.cpp:370
    #4 0x40067f in main (/home/marxin/Programming/testcases/a.out+0x40067f)
    #5 0x7ffff72c8b24 in __libc_start_main (/lib64/libc.so.6+0x27b24)
    #6 0x4005bd in _start (/home/marxin/Programming/testcases/a.out+0x4005bd)

Differential Revision: https://reviews.llvm.org/D97263

Cherry-pick from 16ede09.
nstester pushed a commit that referenced this pull request on Mar 23, 2021

This test relies on -mfloat-abi=hard to pass (otherwise test_mov_imm_[12]
directly build the 1.0 fp16 representation via movw r0, #15360 rather
than using vmov.f16 s0, #1.0e+0 as expected by scan-assembler-times).

Adding the arm_hard_ok check makes the test unsupported e.g. on
arm-linux-gnueabi instead of reporting a failure.

2021-03-20  Christophe Lyon  <christophe.lyon@linaro.org>

gcc/testsuite/
    * gcc.target/arm/armv8_2-fp16-scalar-2.c: Add arm_hard_ok.
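In DejaGnu terms, such a check is a dg-require-effective-target directive; a minimal sketch of the shape of the fixed test header (the body and the test's other directives are elided):

```
/* { dg-do compile } */
/* { dg-require-effective-target arm_hard_ok } */
/* ... rest of gcc.target/arm/armv8_2-fp16-scalar-2.c unchanged ... */
```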
nstester pushed a commit that referenced this pull request on Apr 6, 2021

This patch fixes PR99748 which shows us trying to pass the argument to
__aeabi_f2iz in the VFP register s0 when the library function is
expecting to use the GPR r0. It also fixes the __aeabi_f2uiz case which
was broken in the same way.

For the testcase in the PR, here is the code we generate before the
patch (with -mfloat-abi=hard -march=armv8.1-m.main+mve -O0):

    main:
        push {r7, lr}
        sub sp, sp, #8
        add r7, sp, #0
        mov r3, #1065353216
        str r3, [r7, #4] @ float
        vldr.32 s0, [r7, #4]
        bl __aeabi_f2iz
        mov r3, r0
        cmp r3, #1
        [...]

This becomes:

    main:
        push {r7, lr}
        sub sp, sp, #8
        add r7, sp, #0
        mov r3, #1065353216
        str r3, [r7, #4] @ float
        ldr r0, [r7, #4] @ float
        bl __aeabi_f2iz
        mov r3, r0
        cmp r3, #1
        [...]

after the patch. We see a similar change for the same testcase with a
cast to unsigned instead of int.

gcc/ChangeLog:

    PR target/99748
    * config/arm/arm.c (arm_libcall_uses_aapcs_base): Also use base
    PCS for [su]fix_optab.
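Judging from the mov r3, #1065353216 (the bit pattern of 1.0f) and the cmp r3, #1 above, the testcase presumably has roughly this shape (a hypothetical reconstruction, not the committed test):

```
int main ()
{
  float f = 1.0f;
  if ((int) f != 1)   /* at -O0 the float-to-int conversion calls __aeabi_f2iz */
    __builtin_abort ();
  return 0;
}
```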
nstester pushed a commit that referenced this pull request on Apr 12, 2021

We can't resolve the call to foo<42> before instantiation of G, because
the template parameter of #1 has dependent type. But we were missing
that in our dependency check, because the tree walk of
DECL_TEMPLATE_PARMS doesn't look into the types of template parameters.
So look at them directly.

gcc/cp/ChangeLog:

    PR c++/93085
    * pt.c (uses_outer_template_parms): Handle non-type and template
    template parameters specifically.

gcc/testsuite/ChangeLog:

    PR c++/93085
    * g++.dg/template/dependent-tmpl1.C: New test.
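An illustrative sketch of the situation described (hypothetical code, not the committed dependent-tmpl1.C): the non-type template parameter of #1 has type T, which is dependent, so foo<42> cannot be resolved until G is instantiated:

```
template <class T> struct G {
  template <T N> static int foo ();  // #1: the parameter's type T is dependent
  int x = foo<42> ();                // must wait for the instantiation of G
};
```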
nstester pushed a commit that referenced this pull request on May 7, 2021

gcc/ada/
    * errout.ads (Set_Identifier_Casing): Add pragma Convention C.
    * eval_fat.ads (Rounding_Mode): Likewise.
    (Machine): Add WARNING comment line.
    * exp_code.ads (Clobber_Get_Next): Add pragma Convention C.
    * fe.h (Compiler_Abort): Fix return type.
    (Set_Identifier_Casing): Change type of parameters.
    (Clobber_Get_Next): Change return type.
    * gcc-interface/trans.c (gnat_to_gnu) <N_Code_Statement>: Add cast.
nstester pushed a commit that referenced this pull request on Jun 14, 2021

The fixed error is:

==21166==ERROR: AddressSanitizer: alloc-dealloc-mismatch (operator new [] vs operator delete) on 0x60300000d900
    #0 0x7367d7 in operator delete(void*, unsigned long) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/libsanitizer/asan/asan_new_delete.cpp:172
    #1 0x3b82e6e in pointer_equiv_analyzer::~pointer_equiv_analyzer() /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/gimple-ssa-evrp.c:161
    #2 0x3b83387 in hybrid_folder::~hybrid_folder() /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/gimple-ssa-evrp.c:517
    #3 0x3b83387 in execute_early_vrp /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/gimple-ssa-evrp.c:686
    #4 0x1790611 in execute_one_pass(opt_pass*) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/passes.c:2567
    #5 0x1792003 in execute_pass_list_1 /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/passes.c:2656
    #6 0x1792029 in execute_pass_list_1 /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/passes.c:2657
    #7 0x179209f in execute_pass_list(function*, opt_pass*) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/passes.c:2667
    #8 0x178a5f3 in do_per_function_toporder(void (*)(function*, void*), void*) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/passes.c:1773
    #9 0x1792fac in do_per_function_toporder(void (*)(function*, void*), void*) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/plugin.h:191
    #10 0x1792fac in execute_ipa_pass_list(opt_pass*) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/passes.c:3001
    #11 0xc525fc in ipa_passes /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/cgraphunit.c:2154
    #12 0xc525fc in symbol_table::compile() /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/cgraphunit.c:2289
    #13 0xc5a096 in symbol_table::compile() /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/cgraphunit.c:2269
    #14 0xc5a096 in symbol_table::finalize_compilation_unit() /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/cgraphunit.c:2537
    #15 0x1a7a17c in compile_file /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/toplev.c:482
    #16 0x69c758 in do_compile /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/toplev.c:2210
    #17 0x69c758 in toplev::main(int, char**) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/toplev.c:2349
    #18 0x6a932a in main /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/main.c:39
    #19 0x7ffff7820b34 in __libc_start_main ../csu/libc-start.c:332
    #20 0x6aa5fd in _start (/home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/objdir/gcc/cc1+0x6aa5fd)

0x60300000d900 is located 0 bytes inside of 32-byte region [0x60300000d900,0x60300000d920)
allocated by thread T0 here:
    #0 0x735ab7 in operator new[](unsigned long) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/libsanitizer/asan/asan_new_delete.cpp:102
    #1 0x3b82dac in pointer_equiv_analyzer::pointer_equiv_analyzer(gimple_ranger*) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/gimple-ssa-evrp.c:156

gcc/ChangeLog:

    * gimple-ssa-evrp.c (pointer_equiv_analyzer::~pointer_equiv_analyzer):
    Use delete[].
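The underlying bug is the classic new[]/delete mismatch; a minimal sketch of the pattern and the fix (class and member names are stand-ins, not the real gimple-ssa-evrp.c code):

```
struct analyzer_like {                         // stand-in for pointer_equiv_analyzer
  int *m_tab;
  analyzer_like () : m_tab (new int[32]) {}    // allocated with operator new[]
  ~analyzer_like () { delete[] m_tab; }        // fix: delete[] rather than delete
};
```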
nstester pushed a commit that referenced this pull request on Aug 19, 2021

As suggested in the PR, the following patch adds two new clrsb expansion
possibilities if the target doesn't have clrsb_optab for the requested
or wider modes, but does have clz_optab for the requested mode.

One expansion is

    clrsb (op0)  ->  clz (op0 ^ (((stype) op0) >> (prec-1))) - 1

which is usable if CLZ_DEFINED_VALUE_AT_ZERO is 2 with value of prec,
because the clz argument can be 0 and clrsb should give prec-1 in that
case. The other expansion is

    clz (((op0 << 1) ^ (((stype) op0) >> (prec-1))) | 1)

where the clz argument is never 0, but it is one operation longer.

E.g. on x86_64-linux with -O2 -mno-lzcnt, this results for

    int foo (int x) { return __builtin_clrsb (x); }

in

-    subq $8, %rsp
-    movslq %edi, %rdi
-    call __clrsbdi2
-    addq $8, %rsp
-    subl $32, %eax
+    leal (%rdi,%rdi), %eax
+    sarl $31, %edi
+    xorl %edi, %eax
+    orl $1, %eax
+    bsrl %eax, %eax
+    xorl $31, %eax

and with -O2 -mlzcnt:

+    movl %edi, %eax
+    sarl $31, %eax
+    xorl %edi, %eax
+    lzcntl %eax, %eax
+    subl $1, %eax

On armv7hl-linux-gnueabi with -O2:

-    push {r4, lr}
-    bl __clrsbsi2
-    pop {r4, pc}
+    @ link register save eliminated.
+    eor r0, r0, r0, asr #31
+    clz r0, r0
+    sub r0, r0, #1
+    bx lr

As it (at least usually) will make code larger, it is disabled for -Os
or cold instructions.

2021-08-19  Jakub Jelinek  <jakub@redhat.com>

    PR middle-end/101950
    * optabs.c (expand_clrsb_using_clz): New function.
    (expand_unop): Use it as another clrsb expansion fallback.

    * gcc.target/i386/pr101950-1.c: New test.
    * gcc.target/i386/pr101950-2.c: New test.
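Both identities are easy to sanity-check in plain C; a quick sketch, assuming 32-bit int and a clz helper defined at zero to return the precision (the CLZ_DEFINED_VALUE_AT_ZERO == 2 case the first expansion requires):

```
#include <assert.h>

static int clz32 (unsigned x) { return x ? __builtin_clz (x) : 32; }

/* Expansion 1: needs clz(0) == prec.  */
static int clrsb_v1 (int x) { return clz32 (x ^ (x >> 31)) - 1; }

/* Expansion 2: the clz argument is never 0.  */
static int clrsb_v2 (int x)
{ return clz32 ((((unsigned) x << 1) ^ (unsigned) (x >> 31)) | 1); }

int main (void)
{
  for (int i = -100000; i <= 100000; i++)
    {
      assert (clrsb_v1 (i) == __builtin_clrsb (i));
      assert (clrsb_v2 (i) == __builtin_clrsb (i));
    }
  return 0;
}
```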
nstester pushed a commit that referenced this pull request on Nov 26, 2021

Fixes:

==129444==ERROR: AddressSanitizer: global-buffer-overflow on address 0x00000666ca5c at pc 0x000000ef094b bp 0x7fffffff8180 sp 0x7fffffff8178
READ of size 4 at 0x00000666ca5c thread T0
    #0 0xef094a in parse_optimize_options ../../gcc/d/d-attribs.cc:855
    #1 0xef0d36 in d_handle_optimize_attribute ../../gcc/d/d-attribs.cc:916
    #2 0xef107e in d_handle_optimize_attribute ../../gcc/d/d-attribs.cc:887
    #3 0xff85b1 in decl_attributes(tree_node**, tree_node*, int, tree_node*) ../../gcc/attribs.c:829
    #4 0xef2a91 in apply_user_attributes(Dsymbol*, tree_node*) ../../gcc/d/d-attribs.cc:427
    #5 0xf7b7f3 in get_symbol_decl(Declaration*) ../../gcc/d/decl.cc:1346
    #6 0xf87bc7 in get_symbol_decl(Declaration*) ../../gcc/d/decl.cc:967
    #7 0xf87bc7 in DeclVisitor::visit(FuncDeclaration*) ../../gcc/d/decl.cc:808
    #8 0xf83db5 in DeclVisitor::build_dsymbol(Dsymbol*) ../../gcc/d/decl.cc:146

for the following test-case: gcc/testsuite/gdc.dg/attr_optimize1.d.

gcc/d/ChangeLog:

    * d-attribs.cc (parse_optimize_options): Check index before
    accessing cl_options.
nstester pushed a commit that referenced this pull request on Nov 29, 2021

As described in detail in the PR, in C++20 implicitly capturing 'this'
via a '=' capture default is deprecated, and in C++17 adding an explicit
'this' capture alongside a '=' capture default is diagnosed as redundant
(and is strictly speaking ill-formed). This means it's impossible to
write, in a forward-compatible way, a C++17 lambda that has a '='
capture default and that also captures 'this' (implicitly or
explicitly):

    [=] { this; }   // #1 deprecated in C++20, OK in C++17
                    // GCC issues a -Wdeprecated warning in C++20 mode

    [=, this] { }   // #2 ill-formed in C++17, OK in C++20
                    // GCC issues an unconditional warning in C++17 mode

This patch resolves this dilemma by downgrading the warning for #2 into
a -pedantic one. In passing, move it into the -Wc++20-extensions class
of warnings and adjust its wording accordingly.

    PR c++/100493

gcc/cp/ChangeLog:

    * parser.c (cp_parser_lambda_introducer): In C++17, don't
    diagnose a redundant 'this' capture alongside a by-copy capture
    default unless -pedantic. Move the diagnostic into
    -Wc++20-extensions and adjust wording accordingly.

gcc/testsuite/ChangeLog:

    * g++.dg/cpp1z/lambda-this1.C: Adjust expected diagnostics.
    * g++.dg/cpp1z/lambda-this8.C: New test.
    * g++.dg/cpp2a/lambda-this3.C: Compile with -pedantic in C++17
    to continue to diagnose redundant 'this' captures.
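With the warning downgraded, the explicit form can now be written once for both standards (in C++17 it is silent unless -pedantic is given); a small sketch:

```
struct X {
  int m;
  void g ()
  {
    /* C++17: accepted, warns only with -pedantic (-Wc++20-extensions);
       C++20: OK, and 'this' is captured explicitly, so no -Wdeprecated.  */
    auto l = [=, this] { return m; };
    (void) l;
  }
};
```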
nstester pushed a commit that referenced this pull request on Dec 30, 2021

…imize or target pragmas [PR103012]

The following testcases ICE when an optimize or target pragma is
followed by a long line (4096+ chars). This is because on such long
lines we can't use columns anymore, but the cpp_define calls performed
by c_cpp_builtins_optimize_pragma or from the backend hooks for target
pragma are done on temporary buffers and expect to get columns from
whatever line they appear on (which happens to be the long line after
optimize/target pragma), and we run into:

    #0 fancy_abort (file=0x3abec67 "../../libcpp/line-map.c", line=502, function=0x3abecfc "linemap_add") at ../../gcc/diagnostic.c:1986
    #1 0x0000000002e7c335 in linemap_add (set=0x7ffff7fca000, reason=LC_RENAME, sysp=0, to_file=0x41287a0 "pr103012.i", to_line=3) at ../../libcpp/line-map.c:502
    #2 0x0000000002e7cc24 in linemap_line_start (set=0x7ffff7fca000, to_line=3, max_column_hint=128) at ../../libcpp/line-map.c:827
    #3 0x0000000002e7ce2b in linemap_position_for_column (set=0x7ffff7fca000, to_column=1) at ../../libcpp/line-map.c:898
    #4 0x0000000002e771f9 in _cpp_lex_direct (pfile=0x40c3b60) at ../../libcpp/lex.c:3592
    #5 0x0000000002e76c3e in _cpp_lex_token (pfile=0x40c3b60) at ../../libcpp/lex.c:3394
    #6 0x0000000002e610ef in lex_macro_node (pfile=0x40c3b60, is_def_or_undef=true) at ../../libcpp/directives.c:601
    #7 0x0000000002e61226 in do_define (pfile=0x40c3b60) at ../../libcpp/directives.c:639
    #8 0x0000000002e610b2 in run_directive (pfile=0x40c3b60, dir_no=0, buf=0x7fffffffd430 "__OPTIMIZE__ 1\n", count=14) at ../../libcpp/directives.c:589
    #9 0x0000000002e650c1 in cpp_define (pfile=0x40c3b60, str=0x2f784d1 "__OPTIMIZE__") at ../../libcpp/directives.c:2513
    #10 0x0000000002e65100 in cpp_define_unused (pfile=0x40c3b60, str=0x2f784d1 "__OPTIMIZE__") at ../../libcpp/directives.c:2522
    #11 0x0000000000f50685 in c_cpp_builtins_optimize_pragma (pfile=0x40c3b60, prev_tree=<optimization_node 0x7fffea042000>, cur_tree=<optimization_node 0x7fffea042020>) at ../../gcc/c-family/c-cppbuiltin.c:600

the assertion that LC_RENAME doesn't happen first.

I think the right fix is to emit those predefined macros upon
optimize/target pragmas with BUILTINS_LOCATION, like we already do for
those macros at the start of the TU; they don't appear in columns of the
next line after it. Another possibility would be to force them at the
location of the pragma.

2021-12-30  Jakub Jelinek  <jakub@redhat.com>

    PR c++/103012

gcc/
    * config/i386/i386-c.c (ix86_pragma_target_parse): Perform
    cpp_define/cpp_undef calls with forced token locations
    BUILTINS_LOCATION.
    * config/arm/arm-c.c (arm_pragma_target_parse): Likewise.
    * config/aarch64/aarch64-c.c (aarch64_pragma_target_parse): Likewise.
    * config/s390/s390-c.c (s390_pragma_target_parse): Likewise.

gcc/c-family/
    * c-cppbuiltin.c (c_cpp_builtins_optimize_pragma): Perform
    cpp_define_unused/cpp_undef calls with forced token locations
    BUILTINS_LOCATION.

gcc/testsuite/
    PR c++/103012
    * g++.dg/cpp/pr103012.C: New test.
    * g++.target/i386/pr103012.C: New test.
nstester pushed a commit that referenced this pull request on Jan 21, 2022

This is a "canonical types differ for identical types" ICE, which
started with r11-4682. It's a bit tricky to explain. Consider:

    template <typename T> struct S {
      S<T> bar() noexcept(T::value);  // #1
      S<T> foo() noexcept(T::value);  // #2
    };

    template <typename T>
    S<T> S<T>::foo() noexcept(T::value) {}  // #3

We ICE because #3 and #2 have the same type, but their canonical types
differ: TYPE_CANONICAL (#3) == #2 but TYPE_CANONICAL (#2) == #1.

The member functions #1 and #2 have the same type. However, since their
noexcept-specifier is deferred, when parsing them, we create a variant
for both of them, because DEFERRED_PARSE cannot be compared. In other
words, build_cp_fntype_variant's

    tree v = TYPE_MAIN_VARIANT (type);
    for (; v; v = TYPE_NEXT_VARIANT (v))
      if (cp_check_qualified_type (v, type, type_quals, rqual, raises, late))
        return v;

will *not* find an existing variant when creating a method_type for #2,
so we have to create a new one.

But then we perform delayed parsing and call
fixup_deferred_exception_variants for #1 and #2. f_d_e_v will replace
TYPE_RAISES_EXCEPTIONS with the newly parsed noexcept-specifier. It also
sets TYPE_CANONICAL (#2) to #1. Both noexcepts turned out to be the
same, so now we have two equivalent variants in the list! I.e.,

    +-----------------+      +-----------------+      +-----------------+
    |      main       |      |       #2        |      |       #1        |
    | S S::<T379>(S*) |----->| S S::<T37c>(S*) |----->| S S::<T37a>(S*) |----->NULL
    |        -        |      | noex(T::value)  |      | noex(T::value)  |
    +-----------------+      +-----------------+      +-----------------+

Then we get to #3. As for #1 and #2, grokdeclarator calls
build_memfn_type, which ends up calling build_cp_fntype_variant, which
will use the loop above to look for an existing variant. The first one
that matches cp_check_qualified_type will be used, so we use #2 rather
than #1, and the TYPE_CANONICAL mismatch follows. Hopefully that makes
sense.

As for the fix, I didn't think I could rewrite the method_type #2 with
#1 because the type may have escaped via decltype. So my approach is to
elide #2 from the list, so when looking for a matching variant, we
always find #1 (#2 remains live though, which admittedly sounds sort of
dodgy).

    PR c++/101715

gcc/cp/ChangeLog:

    * tree.cc (fixup_deferred_exception_variants): Remove duplicate
    variants after parsing the exception specifications.

gcc/testsuite/ChangeLog:

    * g++.dg/cpp0x/noexcept72.C: New test.
    * g++.dg/cpp0x/noexcept73.C: New test.
nstester pushed a commit that referenced this pull request on Feb 22, 2022

…04617]

On

    #define A(n) int foo1##n(void) { return 1##n; }
    #define B(n) A(n##0) A(n##1) A(n##2) A(n##3) A(n##4) A(n##5) A(n##6) A(n##7) A(n##8) A(n##9)
    #define C(n) B(n##0) B(n##1) B(n##2) B(n##3) B(n##4) B(n##5) B(n##6) B(n##7) B(n##8) B(n##9)
    #define D(n) C(n##0) C(n##1) C(n##2) C(n##3) C(n##4) C(n##5) C(n##6) C(n##7) C(n##8) C(n##9)
    #define E(n) D(n##0) D(n##1) D(n##2) D(n##3) D(n##4) D(n##5) D(n##6) D(n##7) D(n##8) D(n##9)
    E(0) E(1) E(2) D(30) D(31) C(320) C(321) C(322) C(323) C(324) C(325)
    B(3260) B(3261) B(3262) B(3263) A(32640) A(32641) A(32642)

testcase with

    ./xgcc -B ./ -c -g -fpic -ffat-lto-objects -flto -O0 -o foo1.o foo1.c -ffunction-sections
    ./xgcc -B ./ -shared -g -fpic -flto -O0 -o foo1.so foo1.o
    /tmp/ccTW8mBm.debug.temp.o: file not recognized: file format not recognized

(testcase too slow to be included into testsuite). The problem is
clearly reported by readelf:

    readelf: foo1.o.debug.temp.o: Warning: Section 2 has an out of range sh_link value of 65321
    readelf: foo1.o.debug.temp.o: Warning: Section 5 has an out of range sh_link value of 65321
    readelf: foo1.o.debug.temp.o: Warning: Section 10 has an out of range sh_link value of 65323
    readelf: foo1.o.debug.temp.o: Warning: [ 2]: Link field (65321) should index a symtab section.
    readelf: foo1.o.debug.temp.o: Warning: [ 5]: Link field (65321) should index a symtab section.
    readelf: foo1.o.debug.temp.o: Warning: [10]: Link field (65323) should index a string section.

because simple_object_elf_copy_lto_debug_sections doesn't adjust sh_info
and sh_link fields in ElfNN_Shdr if they are in between
SHN_{LO,HI}RESERVE inclusive. Not adjusting those is incorrect though;
the SHN_{LO,HI}RESERVE range is only relevant to the 16-bit fields,
mainly st_shndx in ElfNN_Sym, where if one needs >= SHN_LORESERVE
section number, SHN_XINDEX should be used instead and the .symtab_shndx
section should contain the real section index, and in ElfNN_Ehdr e_shnum
and e_shstrndx fields, where if a >= SHN_LORESERVE value is needed it
should put those into Shdr[0].sh_{size,link}. But sh_{link,info} are
32-bit fields which can contain any section index.

Note, as simple-object-elf.c mentions, binutils from 2.12 to 2.18 (so
before 2011) used to mishandle the > 63.75K sections case and assumed
there is a hole in between the sections, but what
simple_object_elf_copy_lto_debug_sections does wouldn't help in that
case for the debug temp object creation; we'd need to detect the case
also in that routine and take it into account in the remapping etc. I
think it is not worth it given that it is over 10 years; if somebody
needs 63.75K or more sections, better use more recent binutils.

2022-02-22  Jakub Jelinek  <jakub@redhat.com>

    PR lto/104617
    * simple-object-elf.c (simple_object_elf_match): Fix up URL
    in comment.
    (simple_object_elf_copy_lto_debug_sections): Remap sh_info and
    sh_link even if they are in the SHN_LORESERVE .. SHN_HIRESERVE
    range (inclusive).
nstester pushed a commit that referenced this pull request on Mar 15, 2022

Testing cc1 on pr93032-mztools-unsigned-char.c

Benchmark #1: (without patch)
  Time (mean ± σ):    338.8 ms ± 13.6 ms  [User: 323.2 ms, System: 14.2 ms]
  Range (min … max):  326.7 ms … 363.1 ms  10 runs

Benchmark #2: (with patch)
  Time (mean ± σ):    332.3 ms ± 12.8 ms  [User: 316.6 ms, System: 14.3 ms]
  Range (min … max):  322.5 ms … 357.4 ms  10 runs

Summary: ./cc1.new ran 1.02 ± 0.06 times faster than ./cc1.old

gcc/analyzer/ChangeLog:

    * store.cc (store::store): Presize m_cluster_map.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>
nstester pushed a commit that referenced this pull request on Apr 13, 2022

DR 2352 changed the definitions of reference-related (so that it uses
"similar type" instead of "same type") and of reference-compatible (use
a standard conversion sequence). That means that reference-related is
now more broad, which means that we will be binding more things
directly.

The original patch for DR 2352 caused some problems, which were fixed in
r276251 by creating a "fake" ck_qual in direct_reference_binding, so
that in

    void f(int *);                // #1
    void f(const int * const &);  // #2
    int *x;
    int main() {
      f(x);  // call #1
    }

we call #1. The extra ck_qual in #2 causes compare_ics to select #1,
which is a better match for "int *" because then we don't have to do a
qualification conversion.

Let's turn to the problem in this PR. We have

    void f(const int * const &);  // #1
    void f(const int *);          // #2
    int *x;
    int main() {
      f(x);
    }

We arrive in compare_ics to decide which one is better. The ICS for #1
looks like

    ck_ref_bind         <- ck_qual          <- ck_identity
    const int *const &     const int *const    int *

and the ICS for #2 is

    ck_qual      <- ck_rvalue <- ck_identity
    const int *     int *        int *

We strip the reference and then comp_cv_qual_signature when comparing
two ck_quals sees that "const int *" is a proper subset of "const int
*const" and we return -1. But that's wrong; presumably the top-level
"const" should be ignored and the call should be ambiguous. This patch
adjusts the type of the "fake" ck_qual so that this problem doesn't
arise.

    PR c++/97296

gcc/cp/ChangeLog:

    * call.cc (direct_reference_binding): strip_top_quals when creating
    a ck_qual.

gcc/testsuite/ChangeLog:

    * g++.dg/cpp0x/ref-bind4.C: Add dg-error.
    * g++.dg/cpp0x/ref-bind8.C: New test.
nstester pushed a commit that referenced this pull request on Apr 28, 2022

Here ever since r12-6022-gbb2a7f80a98de3 we stopped deeming the partial
specialization #2 to be more specialized than #1, ultimately because
dependent operator expressions now have a DEPENDENT_OPERATOR_TYPE type
instead of an empty type, and this made unify stop deducing T(2) == 1
for K during partial ordering for #1 and #2.

This minimal patch fixes this by making the relevant logic in unify
treat DEPENDENT_OPERATOR_TYPE like an empty type.

    PR c++/105425

gcc/cp/ChangeLog:

    * pt.cc (unify) <case TEMPLATE_PARM_INDEX>: Treat
    DEPENDENT_OPERATOR_TYPE like an empty type.

gcc/testsuite/ChangeLog:

    * g++.dg/template/partial-specialization13.C: New test.
nstester pushed a commit that referenced this pull request on May 5, 2022

This patch fixes the second half of 64679. Here we issue a wrong
"redefinition of 'int x'" for the following:

    struct Bar { Bar(int, int, int); };
    int x = 1;
    Bar bar(int(x), int(x), int{x});  // #1

cp_parser_parameter_declaration_list does pushdecl every time it sees a
named parameter, so the second "int(x)" causes the error. That's
premature, since this turns out to be a constructor call after the third
argument! If the first parameter is parenthesized, we can't push until
we've established we're looking at a function declaration. Therefore
this could be fixed by some kind of lookahead. I thought about
introducing a lightweight variant of
cp_parser_parameter_declaration_list that would not have any side
effects and would return as soon as it figures out whether it's looking
at a declaration or expression. Since that would require fairly
nontrivial changes, I wanted something simpler.

Something like delaying the pushdecl until we've reached the ')'
following the parameter-declaration-clause. But we must push the
parameters before processing a default argument, as in:

    Bar bar(int(a), int(b), int c = sizeof(a));  // valid

Moreover, this code should still be accepted:

    Bar f(int(i), decltype(i) j = 42);

so this patch stashes parameters into a vector when parsing tentatively
only when pushdecl-ing a parameter would result in a clash and an error
about redefinition/redeclaration. The stashed parameters are pushed at
the end of a parameter-declaration-clause if it's followed by a ')', so
that we still diagnose redefining a parameter.

    PR c++/64679

gcc/cp/ChangeLog:

    * parser.cc (cp_parser_parameter_declaration_clause): Maintain
    a vector of parameters that haven't been pushed yet. Push them at the
    end of a valid parameter-declaration-clause.
    (cp_parser_parameter_declaration_list): Take a new auto_vec
    parameter. Do not pushdecl while parsing tentatively when
    pushdecl-ing a parameter would result in a hard error.
    (cp_parser_cache_defarg): Adjust the call to
    cp_parser_parameter_declaration_list.

gcc/testsuite/ChangeLog:

    * g++.dg/parse/ambig11.C: New test.
    * g++.dg/parse/ambig12.C: New test.
    * g++.dg/parse/ambig13.C: New test.
    * g++.dg/parse/ambig14.C: New test.
nstester pushed a commit that referenced this pull request on May 25, 2022

Consider

    struct A {
      int x;
      int y = x;
    };

    struct B {
      int x = 0;
      int y = A{x}.y;  // #1
    };

where for #1 we end up with

    {.x=(&<PLACEHOLDER_EXPR struct B>)->x, .y=(&<PLACEHOLDER_EXPR struct A>)->x}

that is, two PLACEHOLDER_EXPRs for different types on the same level in
a {}. This crashes because our CONSTRUCTOR_PLACEHOLDER_BOUNDARY
mechanism to avoid replacing unrelated PLACEHOLDER_EXPRs cannot deal
with it.

Here's why we wound up with those PLACEHOLDER_EXPRs: When we're
performing cp_parser_late_parsing_nsdmi for "int y = A{x}.y;" we use
finish_compound_literal on type=A, compound_literal={((struct B *)
this)->x}. When digesting this initializer, we call get_nsdmi, which
creates a PLACEHOLDER_EXPR for A -- we don't have any object to refer to
yet. After digesting, we have

    {.x=((struct B *) this)->x, .y=(&<PLACEHOLDER_EXPR struct A>)->x}

and since we've created a PLACEHOLDER_EXPR inside it, we marked the
whole ctor CONSTRUCTOR_PLACEHOLDER_BOUNDARY. f_c_l creates a TARGET_EXPR
and returns

    TARGET_EXPR <D.2384, {.x=((struct B *) this)->x, .y=(&<PLACEHOLDER_EXPR struct A>)->x}>

Then we get to

    B b = {};

and call store_init_value, which digests the {}, which produces

    {.x=NON_LVALUE_EXPR <0>, .y=(TARGET_EXPR <D.2395, {.x=(&<PLACEHOLDER_EXPR struct B>)->x, .y=(&<PLACEHOLDER_EXPR struct A>)->x}>).y}

lookup_placeholder in constexpr won't find an object to replace the
PLACEHOLDER_EXPR for B, because ctx->object will be D.2395 of type A,
and we cannot search outward from D.2395 to find 'b'.

The call to replace_placeholders in store_init_value will not do
anything: we've marked the inner { } CONSTRUCTOR_PLACEHOLDER_BOUNDARY,
and it's only a sub-expression, so replace_placeholders does nothing, so
the <P_E struct B> stays even though now is the perfect time to replace
it because we have an object for it: 'b'.

Later, in cp_gimplify_init_expr the *expr_p is

    D.2395 = {.x=(&<PLACEHOLDER_EXPR struct B>)->x, .y=(&<PLACEHOLDER_EXPR struct A>)->x}

where D.2395 is of type A, but we crash because we hit <P_E struct B>,
which has a different type.

My idea was to replace <P_E struct A> with D.2384 after creating the
TARGET_EXPR because that means we have an object we can refer to. Then
clear CONSTRUCTOR_PLACEHOLDER_BOUNDARY because we no longer have a
PLACEHOLDER_EXPR in the {}. Then store_init_value will be able to
replace <P_E struct B> with 'b', and we should be good to go. We must be
careful not to break guaranteed copy elision, so this replacement
happens in digest_nsdmi_init where we can see the whole initializer, and
avoid replacing any placeholders in TARGET_EXPRs used in the context of
initialization/copy elision. This is achieved via the new function
called potential_prvalue_result_of.

While fixing this problem, I found PR105550, thus the FIXMEs in the
tests.

    PR c++/100252

gcc/cp/ChangeLog:

    * typeck2.cc (potential_prvalue_result_of): New.
    (replace_placeholders_for_class_temp_r): New.
    (digest_nsdmi_init): Call it.

gcc/testsuite/ChangeLog:

    * g++.dg/cpp1y/nsdmi-aggr14.C: New test.
    * g++.dg/cpp1y/nsdmi-aggr15.C: New test.
    * g++.dg/cpp1y/nsdmi-aggr16.C: New test.
    * g++.dg/cpp1y/nsdmi-aggr17.C: New test.
    * g++.dg/cpp1y/nsdmi-aggr18.C: New test.
    * g++.dg/cpp1y/nsdmi-aggr19.C: New test.
nstester pushed a commit that referenced this pull request on Jun 3, 2022

As explained in r11-4959-gde6f64f9556ae3, the atom cache assumes two
equivalent expressions (according to cp_tree_equal) must use the same
template parameters (according to find_template_parameters). This
assumption turned out to not hold for TARGET_EXPR, which was addressed
by that commit.

But this assumption apparently doesn't hold for PARM_DECL either:
find_template_parameters walks its DECL_CONTEXT but cp_tree_equal by
default doesn't consider DECL_CONTEXT unless comparing_specializations
is set. Thus in the first testcase below, the atomic constraints of #1
and #2 are equivalent according to cp_tree_equal, but according to
find_template_parameters the former uses T and the latter uses both T
and U (surprisingly).

We could fix this assumption violation by setting
comparing_specializations in the atom_hasher, which would make
cp_tree_equal return false for the two atoms, but that seems overly
pessimistic here. Ideally the atoms should continue being considered
equivalent and we instead fix find_template_parameters to return just T
for #2's atom.

To that end this patch makes for_each_template_parm_r stop walking the
DECL_CONTEXT of a PARM_DECL. This should be safe to do because
tsubst_copy / tsubst_decl only substitutes the TREE_TYPE of a PARM_DECL
and doesn't bother substituting the DECL_CONTEXT, thus the only relevant
template parameters are those used in its type. any_template_parm_r is
currently responsible for walking its TREE_TYPE, but I suppose it now
makes sense for for_each_template_parm_r to do so instead.

In passing this patch also makes for_each_template_parm_r stop walking
the DECL_CONTEXT of a VAR_/FUNCTION_DECL since doing so after walking
DECL_TI_ARGS is redundant, I think.

I experimented with not walking DECL_CONTEXT for CONST_DECL, but the
second testcase below demonstrates it's necessary to walk it.

    PR c++/105797

gcc/cp/ChangeLog:

    * pt.cc (for_each_template_parm_r) <case FUNCTION_DECL, VAR_DECL>:
    Don't walk DECL_CONTEXT.
    <case PARM_DECL>: Likewise. Walk TREE_TYPE.
    <case CONST_DECL>: Simplify.
    (any_template_parm_r) <case PARM_DECL>: Don't walk TREE_TYPE.

gcc/testsuite/ChangeLog:

    * g++.dg/cpp2a/concepts-decltype4.C: New test.
    * g++.dg/cpp2a/concepts-memfun3.C: New test.
nstester pushed a commit that referenced this pull request on Jun 15, 2022

(In reply to Uroš Bizjak from comment #1)
> Instruction does not accept memory operand for operand 3:
>
> (define_insn_and_split "*<sse4_1>_blendv<ssefltmodesuffix><avxsizesuffix>_ltint"
>   [(set (match_operand:<ssebytemode> 0 "register_operand" "=Yr,*x,x")
>     (unspec:<ssebytemode>
>       [(match_operand:<ssebytemode> 1 "register_operand" "0,0,x")
>        (match_operand:<ssebytemode> 2 "vector_operand" "YrBm,*xBm,xm")
>        (subreg:<ssebytemode>
>          (lt:VI48_AVX
>            (match_operand:VI48_AVX 3 "register_operand" "Yz,Yz,x")
>            (match_operand:VI48_AVX 4 "const0_operand")) 0)]
>       UNSPEC_BLENDV))]
>
> The problematic insn is:
>
> (define_insn_and_split "*avx_cmp<mode>3_ltint_not"
>   [(set (match_operand:VI48_AVX 0 "register_operand")
>     (vec_merge:VI48_AVX
>       (match_operand:VI48_AVX 1 "vector_operand")
>       (match_operand:VI48_AVX 2 "vector_operand")
>       (unspec:<avx512fmaskmode>
>         [(subreg:VI48_AVX
>            (not:<ssebytemode>
>              (match_operand:<ssebytemode> 3 "vector_operand")) 0)
>          (match_operand:VI48_AVX 4 "const0_operand")
>          (match_operand:SI 5 "const_0_to_7_operand")]
>         UNSPEC_PCMP)))]
>
> which gets split to the above pattern.
>
> In the preparation statements we have:
>
> if (!MEM_P (operands[3]))
>   operands[3] = force_reg (<ssebytemode>mode, operands[3]);
> operands[3] = lowpart_subreg (<MODE>mode, operands[3], <ssebytemode>mode);
>
> Which won't fly when operand 3 is memory operand...

gcc/ChangeLog:

    PR target/105953
    * config/i386/sse.md (*avx_cmp<mode>3_ltint_not): Force_reg
    operands[3].

gcc/testsuite/ChangeLog:

    * g++.target/i386/pr105953.C: New test.
nstester pushed a commit that referenced this pull request on Jul 15, 2022

This patch implements C++23 P2255R2, which adds two new type traits to
detect reference binding to a temporary. They can be used to detect code
like

    std::tuple<const std::string&> t("meow");

which is incorrect because it always creates a dangling reference,
because the std::string temporary is created inside the selected
constructor of std::tuple, and not outside it.

There are two new compiler builtins,
__reference_constructs_from_temporary and
__reference_converts_from_temporary. The former is used to simulate
direct- and the latter copy-initialization context. But I had a hard
time finding a test where there's actually a difference. Under DR 2267,
both of these are invalid:

    struct A { } a;
    struct B { explicit B(const A&); };
    const B &b1{a};
    const B &b2(a);

so I had to peruse [over.match.ref], and eventually realized that the
difference can be seen here:

    struct G {
      operator int();            // #1
      explicit operator int&&(); // #2
    };

    int&& r1(G{});   // use #2 (no temporary)
    int&& r2 = G{};  // use #1 (a temporary is created to be bound to int&&)

The implementation itself was rather straightforward because we already
have the conv_binds_ref_to_prvalue function. The main function here is
ref_xes_from_temporary. I've changed the return type of
ref_conv_binds_directly to tristate, because previously the function
didn't distinguish between an invalid conversion and one that binds to a
prvalue. Since it no longer returns a bool, I removed the _p suffix.

The patch also adds the relevant class and variable templates to
<type_traits>.

    PR c++/104477

gcc/c-family/ChangeLog:

    * c-common.cc (c_common_reswords): Add
    __reference_constructs_from_temporary and
    __reference_converts_from_temporary.
    * c-common.h (enum rid): Add RID_REF_CONSTRUCTS_FROM_TEMPORARY
    and RID_REF_CONVERTS_FROM_TEMPORARY.

gcc/cp/ChangeLog:

    * call.cc (ref_conv_binds_directly_p): Rename to ...
    (ref_conv_binds_directly): ... this. Add a new bool parameter.
    Change the return type to tristate.
    * constraint.cc (diagnose_trait_expr): Handle
    CPTK_REF_CONSTRUCTS_FROM_TEMPORARY and
    CPTK_REF_CONVERTS_FROM_TEMPORARY.
    * cp-tree.h: Include "tristate.h".
    (enum cp_trait_kind): Add CPTK_REF_CONSTRUCTS_FROM_TEMPORARY
    and CPTK_REF_CONVERTS_FROM_TEMPORARY.
    (ref_conv_binds_directly_p): Rename to ...
    (ref_conv_binds_directly): ... this.
    (ref_xes_from_temporary): Declare.
    * cxx-pretty-print.cc (pp_cxx_trait_expression): Handle
    CPTK_REF_CONSTRUCTS_FROM_TEMPORARY and
    CPTK_REF_CONVERTS_FROM_TEMPORARY.
    * method.cc (ref_xes_from_temporary): New.
    * parser.cc (cp_parser_primary_expression): Handle
    RID_REF_CONSTRUCTS_FROM_TEMPORARY and
    RID_REF_CONVERTS_FROM_TEMPORARY.
    (cp_parser_trait_expr): Likewise.
    (warn_for_range_copy): Adjust to call ref_conv_binds_directly.
    * semantics.cc (trait_expr_value): Handle
    CPTK_REF_CONSTRUCTS_FROM_TEMPORARY and
    CPTK_REF_CONVERTS_FROM_TEMPORARY.
    (finish_trait_expr): Likewise.

libstdc++-v3/ChangeLog:

    * include/std/type_traits (reference_constructs_from_temporary,
    reference_converts_from_temporary): New class templates.
    (reference_constructs_from_temporary_v,
    reference_converts_from_temporary_v): New variable templates.
    (__cpp_lib_reference_from_temporary): Define for C++23.
    * include/std/version (__cpp_lib_reference_from_temporary):
    Define for C++23.
    * testsuite/20_util/variable_templates_for_traits.cc: Test
    reference_constructs_from_temporary_v and
    reference_converts_from_temporary_v.
    * testsuite/20_util/reference_from_temporary/value.cc: New test.
    * testsuite/20_util/reference_from_temporary/value2.cc: New test.
    * testsuite/20_util/reference_from_temporary/version.cc: New test.

gcc/testsuite/ChangeLog:

    * g++.dg/ext/reference_constructs_from_temporary1.C: New test.
    * g++.dg/ext/reference_converts_from_temporary1.C: New test.
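A usage sketch of the resulting library traits (C++23), returning to the motivating std::tuple scenario:

```
#include <string>
#include <type_traits>

/* Binding const std::string& to a const char* creates a temporary
   std::string, so this is the dangling case the trait detects.  */
static_assert(std::reference_constructs_from_temporary_v<
                  const std::string&, const char*>);

/* Binding to an lvalue std::string is direct; no temporary involved.  */
static_assert(!std::reference_constructs_from_temporary_v<
                  const std::string&, std::string&>);
```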
nstester pushed a commit that referenced this pull request on Aug 17, 2022

In my previous patches I've been extending our std::move warnings, but
this tweak actually dials it down a little bit. As reported in bug
89780, it's questionable to warn about expressions in templates that
were type-dependent, but aren't anymore because we're instantiating the
template. As in,

    template <typename T>
    Dest withMove() {
      T x;
      return std::move(x);
    }

    template Dest withMove<Dest>();    // #1
    template Dest withMove<Source>();  // #2

Saying that the std::move is pessimizing for #1 is not incorrect, but
it's not useful, because removing the std::move would then pessimize #2.
So the user can't really win. At the same time, disabling the warning
just because we're in a template would be going too far, I still want to
warn for

    template <typename>
    Dest withMove() {
      Dest x;
      return std::move(x);
    }

because the std::move therein will be pessimizing for any instantiation.
So I'm using the suppress_warning machinery to that effect.

Problem: I had to add a new group to nowarn_spec_t, otherwise
suppressing the -Wpessimizing-move warning would disable a whole bunch
of other warnings, which we really don't want.

    PR c++/89780

gcc/cp/ChangeLog:

    * pt.cc (tsubst_copy_and_build) <case CALL_EXPR>: Maybe suppress
    -Wpessimizing-move.
    * typeck.cc (maybe_warn_pessimizing_move): Don't issue warnings
    if they are suppressed.
    (check_return_expr): Disable -Wpessimizing-move when returning
    a dependent expression.

gcc/ChangeLog:

    * diagnostic-spec.cc (nowarn_spec_t::nowarn_spec_t): Handle
    OPT_Wpessimizing_move and OPT_Wredundant_move.
    * diagnostic-spec.h (nowarn_spec_t): Add NW_REDUNDANT enumerator.

gcc/testsuite/ChangeLog:

    * g++.dg/cpp0x/Wpessimizing-move3.C: Remove dg-warning.
    * g++.dg/cpp0x/Wredundant-move2.C: Likewise.
nstester pushed a commit that referenced this pull request on Sep 23, 2022

To improve compile times, the C++ library could use compiler built-ins
rather than implementing std::is_convertible (and _nothrow) as class
templates. This patch adds the built-ins. We already have
__is_constructible and __is_assignable, and the nothrow forms of those.

Microsoft (and clang, for compatibility) also provide an alias called
__is_convertible_to. I did not add it, but it would be trivial to do so.

I noticed that our __is_assignable doesn't implement the "Access checks
are performed as if from a context unrelated to either type"
requirement, therefore std::is_assignable / __is_assignable give two
different results here:

    class S {
      operator int();
      friend void g();  // #1
    };

    void g ()
    {
      // #1 doesn't matter
      static_assert(std::is_assignable<int&, S>::value, "");
      static_assert(__is_assignable(int&, S), "");
    }

This is not a problem if __is_assignable is not meant to be used by the
users.

This patch doesn't make libstdc++ use the new built-ins, but I had to
rename a class otherwise its name would clash with the new built-in.

    PR c++/106784

gcc/c-family/ChangeLog:

    * c-common.cc (c_common_reswords): Add __is_convertible and
    __is_nothrow_convertible.
    * c-common.h (enum rid): Add RID_IS_CONVERTIBLE and
    RID_IS_NOTHROW_CONVERTIBLE.

gcc/cp/ChangeLog:

    * constraint.cc (diagnose_trait_expr): Handle CPTK_IS_CONVERTIBLE
    and CPTK_IS_NOTHROW_CONVERTIBLE.
    * cp-objcp-common.cc (names_builtin_p): Handle RID_IS_CONVERTIBLE
    RID_IS_NOTHROW_CONVERTIBLE.
    * cp-tree.h (enum cp_trait_kind): Add CPTK_IS_CONVERTIBLE and
    CPTK_IS_NOTHROW_CONVERTIBLE.
    (is_convertible): Declare.
    (is_nothrow_convertible): Likewise.
    * cxx-pretty-print.cc (pp_cxx_trait_expression): Handle
    CPTK_IS_CONVERTIBLE and CPTK_IS_NOTHROW_CONVERTIBLE.
    * method.cc (is_convertible): New.
    (is_nothrow_convertible): Likewise.
    * parser.cc (cp_parser_primary_expression): Handle
    RID_IS_CONVERTIBLE and RID_IS_NOTHROW_CONVERTIBLE.
    (cp_parser_trait_expr): Likewise.
    * semantics.cc (trait_expr_value): Handle CPTK_IS_CONVERTIBLE
    and CPTK_IS_NOTHROW_CONVERTIBLE.
    (finish_trait_expr): Likewise.

libstdc++-v3/ChangeLog:

    * include/std/type_traits: Rename __is_nothrow_convertible to
    __is_nothrow_convertible_lib.
    * testsuite/20_util/is_nothrow_convertible/value_ext.cc: Likewise.

gcc/testsuite/ChangeLog:

    * g++.dg/ext/has-builtin-1.C: Enhance to test __is_convertible
    and __is_nothrow_convertible.
    * g++.dg/ext/is_convertible1.C: New test.
    * g++.dg/ext/is_convertible2.C: New test.
    * g++.dg/ext/is_nothrow_convertible1.C: New test.
    * g++.dg/ext/is_nothrow_convertible2.C: New test.
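A quick sketch exercising the new built-ins directly (these are the forms the library traits can wrap):

```
static_assert(__is_convertible(int, double), "");
static_assert(!__is_convertible(void*, int), "");
static_assert(__is_nothrow_convertible(int, long), "");
static_assert(!__is_nothrow_convertible(int, void*), "");
```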
nstester pushed a commit that referenced this pull request on Nov 14, 2022

In plenty of image and video processing code it's common to modify pixel
values by a widening operation and then scale them back into range by
dividing by 255. This patch adds a named function to allow us to emit an
optimized sequence when doing an unsigned division that is equivalent
to:

    x = y / (2 ^ (bitsize (y) / 2) - 1)

For SVE2 this means we generate for:

    void draw_bitmap1(uint8_t* restrict pixel, uint8_t level, int n)
    {
      for (int i = 0; i < (n & -16); i+=1)
        pixel[i] = (pixel[i] * level) / 0xff;
    }

the following:

        mov z3.b, #1
    .L3:
        ld1b z0.h, p0/z, [x0, x3]
        mul z0.h, p1/m, z0.h, z2.h
        addhnb z1.b, z0.h, z3.h
        addhnb z0.b, z0.h, z1.h
        st1b z0.h, p0, [x0, x3]
        inch x3
        whilelo p0.h, w3, w2
        b.any .L3

instead of:

    .L3:
        ld1b z0.h, p1/z, [x0, x3]
        mul z0.h, p0/m, z0.h, z1.h
        umulh z0.h, p0/m, z0.h, z2.h
        lsr z0.h, z0.h, #7
        st1b z0.h, p1, [x0, x3]
        inch x3
        whilelo p1.h, w3, w2
        b.any .L3

Which results in significantly faster code.

gcc/ChangeLog:

    * config/aarch64/aarch64-sve2.md (@aarch64_bitmask_udiv<mode>3):
    New.

gcc/testsuite/ChangeLog:

    * gcc.target/aarch64/sve2/div-by-bitmask_1.c: New test.
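Each ADDHNB above is an addition whose narrowed high half is kept, i.e. an add followed by a shift right by 8; with z3.h holding 0x0101 (= 257), the pair implements an exact division by 255. A scalar sketch verifying this for the relevant range (x is the widened product of two bytes, so at most 255*255):

```
#include <assert.h>
#include <stdint.h>

/* addhnb z1.b, z0.h, z3.h  ->  t = (x + 257) >> 8
   addhnb z0.b, z0.h, z1.h  ->      (x + t) >> 8   */
static uint32_t div255 (uint32_t x)
{
  uint32_t t = (x + 257) >> 8;
  return (x + t) >> 8;
}

int main (void)
{
  for (uint32_t x = 0; x <= 255u * 255u; x++)
    assert (div255 (x) == x / 255);
  return 0;
}
```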
nstester pushed a commit that referenced this pull request on Nov 14, 2022

A friend declaration can only have constraints if it is defined. If
multiple instantiations of a class template define the same friend
function signature, it's an error, but that shouldn't happen if it's
constrained to only be declared in one instantiation.

Currently we don't mangle requirements, so the foos all mangle the same
and actually instantiating #1 will break, but for now we can test that
they're considered distinct.

gcc/cp/ChangeLog:

    * pt.cc (tsubst_friend_function): Check satisfaction.

gcc/testsuite/ChangeLog:

    * g++.dg/cpp2a/concepts-friend11.C: New test.
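An illustrative sketch of the scenario (hypothetical code, not the committed concepts-friend11.C): without the satisfaction check, both instantiations would inject a friend with the same signature; with it, only the satisfying instantiation declares the friend:

```
template <class T> concept int_like = __is_same(T, int);  // hypothetical

template <class T> struct A {
  friend void foo () requires int_like<T> {}  // only declared when satisfied
};

A<int>  a;  // instantiates and defines foo()
A<char> b;  // constraint unsatisfied: no conflicting second definition
```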
nstester pushed a commit that referenced this pull request on Jan 12, 2023

While looking at PR 105549, which is about fixing the ABI break
introduced in GCC 9.1 in parameter alignment with bit-fields, we noticed
that the GCC 9.1 warning is not emitted in all the cases where it should
be. This patch fixes that and the next patch in the series fixes the GCC
9.1 break.

We split this into two patches since patch #2 introduces a new ABI break
starting with GCC 13.1. This way, patch #1 can be back-ported to release
branches if needed to fix the GCC 9.1 warning issue.

The main idea is to add a new global boolean that indicates whether
we're expanding the start of a function, so that aarch64_layout_arg can
emit warnings for callees as well as callers. This removes the need for
aarch64_function_arg_boundary to warn (with its incomplete information).
However, in the first patch there are still cases where we emit warnings
where we should not; this is fixed in patch #2 where we can distinguish
between GCC 9.1 and GCC 13.1 ABI breaks properly.

The fix in aarch64_function_arg_boundary (replacing & with &&) looks
like an oversight of a previous commit in this area which changed
'abi_break' from a boolean to an integer.

We also take the opportunity to fix the comment above
aarch64_function_arg_alignment since the value of the abi_break
parameter was changed in a previous commit, no longer matching the
description.

2022-11-28  Christophe Lyon  <christophe.lyon@arm.com>
            Richard Sandiford  <richard.sandiford@arm.com>

gcc/ChangeLog:

    * config/aarch64/aarch64.cc (aarch64_function_arg_alignment): Fix
    comment.
    (aarch64_layout_arg): Factorize warning conditions.
    (aarch64_function_arg_boundary): Fix typo.
    * function.cc (currently_expanding_function_start): New variable.
    (expand_function_start): Handle currently_expanding_function_start.
    * function.h (currently_expanding_function_start): Declare.

gcc/testsuite/ChangeLog:

    * gcc.target/aarch64/bitfield-abi-warning-align16-O2.c: New test.
    * gcc.target/aarch64/bitfield-abi-warning-align16-O2-extra.c: New test.
    * gcc.target/aarch64/bitfield-abi-warning-align32-O2.c: New test.
    * gcc.target/aarch64/bitfield-abi-warning-align32-O2-extra.c: New test.
    * gcc.target/aarch64/bitfield-abi-warning-align8-O2.c: New test.
    * gcc.target/aarch64/bitfield-abi-warning.h: New test.
    * g++.target/aarch64/bitfield-abi-warning-align16-O2.C: New test.
    * g++.target/aarch64/bitfield-abi-warning-align16-O2-extra.C: New test.
    * g++.target/aarch64/bitfield-abi-warning-align32-O2.C: New test.
    * g++.target/aarch64/bitfield-abi-warning-align32-O2-extra.C: New test.
    * g++.target/aarch64/bitfield-abi-warning-align8-O2.C: New test.
    * g++.target/aarch64/bitfield-abi-warning.h: New test.
nstester pushed a commit that referenced this pull request on Jan 16, 2023

This recognises the patterns of the form:

    while (n & 1) { n >>= 1 }

Unfortunately there are currently two issues relating to this patch.

Firstly, simplify_using_initial_conditions does not recognise that
(n != 0) and ((n & 1) == 0) implies that ((n >> 1) != 0). These
preconditions arise following the loop copy-header pass, and the
assumptions returned by number_of_iterations_exit_assumptions then
prevent final value replacement from using the niter result. I'm not
sure what is the best way to fix this - one approach could be to modify
simplify_using_initial_conditions to handle this sort of case, but it
seems that it basically wants the information that ranger could give
anyway, so would something like that be a better option?

The second issue arises in the vectoriser, which is able to determine
that the niter->assumptions are always true. When building with
-march=armv8.4-a+sve -S -O3, we get this codegen:

    foo (unsigned int b)
    {
      int c = 0;

      if (b == 0)
        return PREC;

      while (!(b & (1 << (PREC - 1))))
        {
          b <<= 1;
          c++;
        }

      return c;
    }

    foo:
    .LFB0:
        .cfi_startproc
        cmp w0, 0
        cbz w0, .L6
        blt .L7
        lsl w1, w0, 1
        clz w2, w1
        cmp w2, 14
        bls .L8
        mov x0, 0
        cntw x3
        add w1, w2, 1
        index z1.s, #0, #1
        whilelo p0.s, wzr, w1
    .L4:
        add x0, x0, x3
        mov p1.b, p0.b
        mov z0.d, z1.d
        whilelo p0.s, w0, w1
        incw z1.s
        b.any .L4
        add z0.s, z0.s, #1
        lastb w0, p1, z0.s
        ret
        .p2align 2,,3
    .L8:
        mov w0, 0
        b .L3
        .p2align 2,,3
    .L13:
        lsl w1, w1, 1
    .L3:
        add w0, w0, 1
        tbz w1, #31, .L13
        ret
        .p2align 2,,3
    .L6:
        mov w0, 32
        ret
        .p2align 2,,3
    .L7:
        mov w0, 0
        ret
        .cfi_endproc

In essence, the vectoriser uses the niter information to determine
exactly how many iterations of the loop it needs to run. It then uses
SVE whilelo instructions to run this number of iterations. The original
loop counter is also vectorised, despite only being used in the final
iteration, and then the final value of this counter is used as the
return value (which is the same as the number of iterations it computed
in the first place).

This vectorisation is obviously bad, and I think it exposes a latent bug
in the vectoriser, rather than being an issue caused by this specific
patch.

gcc/ChangeLog:

    * tree-ssa-loop-niter.cc (number_of_iterations_cltz): New.
    (number_of_iterations_bitcount): Add call to the above.
    (number_of_iterations_exit_assumptions): Add EQ_EXPR case for
    c[lt]z idiom recognition.

gcc/testsuite/ChangeLog:

    * gcc.dg/tree-ssa/cltz-max.c: New test.
    * gcc.dg/tree-ssa/clz-char.c: New test.
    * gcc.dg/tree-ssa/clz-int.c: New test.
    * gcc.dg/tree-ssa/clz-long-long.c: New test.
    * gcc.dg/tree-ssa/clz-long.c: New test.
    * gcc.dg/tree-ssa/ctz-char.c: New test.
    * gcc.dg/tree-ssa/ctz-int.c: New test.
    * gcc.dg/tree-ssa/ctz-long-long.c: New test.
    * gcc.dg/tree-ssa/ctz-long.c: New test.
nstester pushed a commit that referenced this pull request on Feb 3, 2023

Here the ahead-of-time overload set pruning in finish_call_expr is
unintentionally returning a CALL_EXPR whose (pruned) callee is wrapped
in an ADDR_EXPR, despite the original callee not being wrapped in an
ADDR_EXPR. This ends up causing a bogus declaration mismatch error in
the below testcase because the call to min in #1 gets expressed as a
CALL_EXPR of ADDR_EXPR of FUNCTION_DECL, whereas the level-lowered call
to min in #2 gets expressed instead as a CALL_EXPR of FUNCTION_DECL.

This patch fixes this by stripping the spurious ADDR_EXPR
appropriately. Thus the first call to min now also gets expressed as a
CALL_EXPR of FUNCTION_DECL, matching the behavior before
r12-6075-g2decd2cabe5a4f.

    PR c++/107461

gcc/cp/ChangeLog:

    * semantics.cc (finish_call_expr): Strip ADDR_EXPR from
    the selected callee during overload set pruning.

gcc/testsuite/ChangeLog:

    * g++.dg/template/call9.C: New test.
nstester pushed a commit that referenced this pull request on Feb 6, 2023

After r13-5684-g59e0376f607805 the (pruned) callee of a non-dependent
CALL_EXPR is a bare FUNCTION_DECL rather than ADDR_EXPR of
FUNCTION_DECL. This innocent change revealed that cp_tree_equal doesn't
first check dependence of a CALL_EXPR before treating a FUNCTION_DECL
callee as a dependent name, which leads to us incorrectly accepting the
first two testcases below and rejecting the third:

* In the first testcase, cp_tree_equal incorrectly returns true for the
  two non-dependent CALL_EXPRs f(0) and f(0) (whose CALL_EXPR_FN are
  different FUNCTION_DECLs) which causes us to treat #2 as a
  redeclaration of #1.

* Same issue in the second testcase, for f<int*>() and f<char>().

* In the third testcase, cp_tree_equal incorrectly returns true for
  f<int>() and f<void(*)(int)>() which causes us to conflate the two
  dependent specializations A<decltype(f<int>()(U()))> and
  A<decltype(f<void(*)(int)>()(U()))>.

This patch fixes this by making called_fns_equal treat two callees as
dependent names only if the overall CALL_EXPRs are dependent, via a new
convenience function call_expr_dependent_name that is like
dependent_name but also checks dependence of the overall CALL_EXPR.

    PR c++/107461

gcc/cp/ChangeLog:

    * cp-tree.h (call_expr_dependent_name): Declare.
    * pt.cc (iterative_hash_template_arg) <case CALL_EXPR>: Use
    call_expr_dependent_name instead of dependent_name.
    * tree.cc (call_expr_dependent_name): Define.
    (called_fns_equal): Adjust to take two CALL_EXPRs instead of
    CALL_EXPR_FNs thereof. Use call_expr_dependent_name instead of
    dependent_name.
    (cp_tree_equal) <case CALL_EXPR>: Adjust call to called_fns_equal.

gcc/testsuite/ChangeLog:

    * g++.dg/cpp0x/overload5.C: New test.
    * g++.dg/cpp0x/overload5a.C: New test.
    * g++.dg/cpp0x/overload6.C: New test.
nstester pushed a commit that referenced this pull request on Mar 2, 2023

-Wmismatched-tags warns about the (harmless) struct/class mismatch.
For, e.g.,

    template<typename T> struct A { };
    class A<int> a;

it works by adding A<T> to the class2loc hash table while parsing the
class-head and then, while parsing the elaborated type-specifier, we add
A<int>. At the end of c_parse_file we go through the table and warn
about the class-key mismatches. In this PR we crash though; we have

    template<typename T> struct A {
      template<typename U> struct W { };
    };
    struct A<int>::W<int> w;  // #1

where while parsing A and #1 we've stashed

    A<T>
    A<T>::W<U>
    A<int>::W<int>

into class2loc. Then in class_decl_loc_t::diag_mismatched_tags TYPE is
A<int>::W<int>, and specialization_of gets us A<int>::W<U>, which is not
in class2loc, so we crash on gcc_assert (cdlguide). But it's OK not to
have found A<int>::W<U>, we should just look one "level" up, that is,
A<T>::W<U>.

It's important to handle class specializations, so e.g.

    template<>
    struct A<char> {
      template<typename U> class W { };
    };

where W's class-key is different than in the primary template above, so
we should warn depending on whether we're looking into A<char> or into a
different instantiation.

    PR c++/106259

gcc/cp/ChangeLog:

    * parser.cc (class_decl_loc_t::diag_mismatched_tags): If the
    first lookup of SPEC didn't find anything, try to look for
    most_general_template.

gcc/testsuite/ChangeLog:

    * g++.dg/warn/Wmismatched-tags-11.C: New test.
nstester pushed a commit that referenced this pull request on Apr 23, 2023

Currently on xstormy16 SImode shifts by a single bit require two
instructions, and shifts by other non-zero integer immediate constants
require five instructions. This patch implements the obvious
optimization that shifts by two bits can be done in four instructions,
by using two single-bit sequences.

Hence, ashift_2 was previously generated as:

    mov r7,r2 | shl r2,#2 | shl r3,#2 | shr r7,#14 | or r3,r7
    ret

and with this patch we now generate:

    shl r2,#1 | rlc r3,#1 | shl r2,#1 | rlc r3,#1
    ret

2023-04-23  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
    * config/stormy16/stormy16.cc (xstormy16_output_shift): Implement
    SImode shifts by two by performing a single bit SImode shift twice.

gcc/testsuite/ChangeLog
    * gcc.target/xstormy16/shiftsi.c: New test case.
nstester pushed a commit that referenced this pull request on Apr 29, 2023

This patch contains some minor tweaks to xstormy16's machine
description, most significantly providing a pattern for HImode rotate
left by a single bit that requires only two instructions.

    unsigned short foo(unsigned short x)
    {
      return (x << 1) | (x >> 15);
    }

currently with -O2 generates:

    foo:
        mov r7,r2
        shr r7,#15
        shl r2,#1
        or r2,r7
        ret

with this patch, GCC now generates:

    foo:
        shl r2,#1 | adc r2,#0
        ret

Additionally neghi2 is converted to a define_insn (so that the RTL
optimizers see the negation semantics), and HImode rotations by 8 bits
can now be recognized and implemented using swpb.

2023-04-29  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
    * config/stormy16/stormy16.md (neghi2): Convert from a
    define_expand to a define_insn.
    (*rotatehi_1): New define_insn for efficient 2 insn sequence.
    (*rotatehi_8, *rotaterthi_8): New define_insn to emit a swpb.

gcc/testsuite/ChangeLog
    * gcc.target/xstormy16/neghi2.c: New test case.
    * gcc.target/xstormy16/rotatehi-1.c: Likewise.
nstester pushed a commit that referenced this pull request on May 2, 2023

I noticed that for member class templates of a class template we were
unnecessarily substituting both the template and its type. Avoiding that
duplication speeds compilation of this silly testcase from ~12s to ~9s
on my laptop. It's unlikely to make a difference on any real code, but
the simplification is also nice.

We still need to clear CLASSTYPE_USE_TEMPLATE on the partial
instantiation of the template class, but it makes more sense to do that
in tsubst_template_decl anyway.

    #define NC(X) \
      template <class U> struct X##1; \
      template <class U> struct X##2; \
      template <class U> struct X##3; \
      template <class U> struct X##4; \
      template <class U> struct X##5; \
      template <class U> struct X##6;
    #define NC2(X) NC(X##a) NC(X##b) NC(X##c) NC(X##d) NC(X##e) NC(X##f)
    #define NC3(X) NC2(X##A) NC2(X##B) NC2(X##C) NC2(X##D) NC2(X##E)
    template <int I> struct A { NC3(am) };
    template <class...Ts> void sink(Ts...);
    template <int...Is> void g() { sink(A<Is>()...); }
    template <int I> void f() { g<__integer_pack(I)...>(); }
    int main() { f<1000>(); }

gcc/cp/ChangeLog:

    * pt.cc (instantiate_class_template): Skip the RECORD_TYPE
    of a class template.
    (tsubst_template_decl): Clear CLASSTYPE_USE_TEMPLATE.
nstester
pushed a commit
that referenced
this pull request
May 18, 2023
Hi all, We noticed that calls to the vadcq and vsbcq intrinsics, both of which use __builtin_arm_set_fpscr_nzcvqc to set the Carry flag in the FPSCR, would produce the following code: ``` < r2 is the *carry input > vmrs r3, FPSCR_nzcvqc bic r3, r3, #536870912 orr r3, r3, r2, lsl #29 vmsr FPSCR_nzcvqc, r3 ``` when the MVE ACLE instead gives a different instruction sequence of: ``` < Rt is the *carry input > VMRS Rs,FPSCR_nzcvqc BFI Rs,Rt,#29,#1 VMSR FPSCR_nzcvqc,Rs ``` the bic + orr pair is slower and it's also wrong, because, if the *carry input is greater than 1, then we risk overwriting the top two bits of the FPSCR register (the N and Z flags). This turned out to be a problem in the header file and the solution was to simply add a `& 0x1u` to the `*carry` input: then the compiler knows that we only care about the lowest bit and can optimise to a BFI. Ok for trunk? Thanks, Stam Markianos-Wright gcc/ChangeLog: * config/arm/arm_mve.h (__arm_vadcq_s32): Fix arithmetic. (__arm_vadcq_u32): Likewise. (__arm_vadcq_m_s32): Likewise. (__arm_vadcq_m_u32): Likewise. (__arm_vsbcq_s32): Likewise. (__arm_vsbcq_u32): Likewise. (__arm_vsbcq_m_s32): Likewise. (__arm_vsbcq_m_u32): Likewise. * config/arm/mve.md (get_fpscr_nzcvqc): Make unspec_volatile. gcc/testsuite/ChangeLog: * gcc.target/arm/mve/mve_vadcq_vsbcq_fpscr_overwrite.c: New.
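Conceptually, the header fix narrows the carry operand to its low bit before it is shifted into place. A hedged sketch of the idea (fpscr here is a plain value standing in for FPSCR_nzcvqc, and set_carry is a made-up helper, not the real intrinsic code):
```
#include <stdint.h>

#define CARRY_BIT 29  /* position of the C flag in FPSCR_nzcvqc */

static uint32_t set_carry (uint32_t fpscr, uint32_t carry)
{
  /* The '& 0x1u' tells the compiler that at most one bit can change,
     so it can emit a single BFI instead of a bic/orr pair, and a
     *carry input greater than 1 can no longer clobber N and Z.  */
  return (fpscr & ~(1u << CARRY_BIT)) | ((carry & 0x1u) << CARRY_BIT);
}
```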
nstester
pushed a commit
that referenced
this pull request
May 23, 2023
This patch removes one machine instruction from the "single bit extraction with shifting" operation, and tries to eliminate the conditional branch if CST2_POW2 doesn't fit into signed 12 bits with the help of ifcvt optimization. /* example #1 */ int test0(int x) { return (x & 1048576) != 0 ? 1024 : 0; } extern int foo(void); int test1(void) { return (foo() & 1048576) != 0 ? 16777216 : 0; } ;; before test0: movi a9, 0x400 srai a2, a2, 10 and a2, a2, a9 ret.n test1: addi sp, sp, -16 s32i.n a0, sp, 12 call0 foo extui a2, a2, 20, 1 slli a2, a2, 20 beqz.n a2, .L2 movi.n a2, 1 slli a2, a2, 24 .L2: l32i.n a0, sp, 12 addi sp, sp, 16 ret.n ;; after test0: extui a2, a2, 20, 1 slli a2, a2, 10 ret.n test1: addi sp, sp, -16 s32i.n a0, sp, 12 call0 foo l32i.n a0, sp, 12 extui a2, a2, 20, 1 slli a2, a2, 24 addi sp, sp, 16 ret.n In addition, if the left shift amount ('exact_log2(CST2_POW2)') is between 1 and 3 and either an addition or a subtraction with another register follows, emit an ADDX[248] or SUBX[248] machine instruction instead of separate left shift and add/subtract ones. /* example #2 */ int test2(int x, int y) { return ((x & 1048576) != 0 ? 4 : 0) + y; } int test3(int x, int y) { return ((x & 2) != 0 ? 8 : 0) - y; } ;; before test2: movi.n a9, 4 srai a2, a2, 18 and a2, a2, a9 add.n a2, a2, a3 ret.n test3: movi.n a9, 8 slli a2, a2, 2 and a2, a2, a9 sub a2, a2, a3 ret.n ;; after test2: extui a2, a2, 20, 1 addx4 a2, a2, a3 ret.n test3: extui a2, a2, 1, 1 subx8 a2, a2, a3 ret.n gcc/ChangeLog: * config/xtensa/predicates.md (addsub_operator): New. * config/xtensa/xtensa.md (*extzvsi-1bit_ashlsi3, *extzvsi-1bit_addsubx): New insn_and_split patterns. * config/xtensa/xtensa.cc (xtensa_rtx_costs): Add a special case about ifcvt 'noce_try_cmove()' to handle constant loads that do not fit into signed 12 bits in the patterns added above.
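In plain C, the transformation replaces a select over a power of two with a bit extraction plus shift (or a scaled add). A sketch mirroring test0 and test2 above (the helper names are illustrative):
```
/* test0: (x & (1 << 20)) != 0 ? (1 << 10) : 0  */
static int test0_equiv (int x)
{
  int bit = (x >> 20) & 1;   /* extui a2, a2, 20, 1 */
  return bit << 10;          /* slli a2, a2, 10 */
}

/* test2: the extracted bit times 4, folded into the add (ADDX4).  */
static int test2_equiv (int x, int y)
{
  int bit = (x >> 20) & 1;   /* extui a2, a2, 20, 1 */
  return (bit << 2) + y;     /* addx4 a2, a2, a3 */
}
```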
nstester
pushed a commit
that referenced
this pull request
Jul 25, 2023
This code in cxx_eval_array_reference has been hard to get right. In r12-2304 I added some code; in r13-5693 I removed some of it. Here the problematic line is "S s = arr[0];" which causes a crash on the assert in verify_ctor_sanity: gcc_assert (!ctx->object || !DECL_P (ctx->object) || ctx->global->get_value (ctx->object) == ctx->ctor); ctx->object is the VAR_DECL 's', which is correct here. The second line points to the problem: we replaced ctx->ctor in cxx_eval_array_reference: new_ctx.ctor = build_constructor (elem_type, NULL); // #1 which I think we shouldn't have; the CONSTRUCTOR we created in cxx_eval_constant_expression/DECL_EXPR new_ctx.ctor = build_constructor (TREE_TYPE (r), NULL); had the right type. We still need #1 though. E.g., in constexpr-96241.C, we never set ctx.ctor/object before calling cxx_eval_array_reference, so we have to build a CONSTRUCTOR there. And in constexpr-101371-2.C we have a ctx.ctor, but it has the wrong type, so we need a new one. We can fix the problem by always clearing the object, and, as an optimization, only create/free a new ctor when actually needed. PR c++/110382 gcc/cp/ChangeLog: * constexpr.cc (cxx_eval_array_reference): Create a new constructor only when we don't already have a matching one. Clear the object when the type is non-scalar. gcc/testsuite/ChangeLog: * g++.dg/cpp1y/constexpr-110382.C: New test.
nstester
pushed a commit
that referenced
this pull request
Oct 26, 2023
This patch is my proposed solution to PR rtl-optimization/91865. Normally RTX simplification canonicalizes a ZERO_EXTEND of a ZERO_EXTEND to a single ZERO_EXTEND, but as shown in this PR it is possible for combine's make_compound_operation to unintentionally generate a non-canonical ZERO_EXTEND of a ZERO_EXTEND, which is unlikely to be matched by the backend. For the new test case: const int table[2] = {1, 2}; int foo (char i) { return table[i]; } compiling with -O2 -mlarge on msp430 we currently see: Trying 2 -> 7: 2: r25:HI=zero_extend(R12:QI) REG_DEAD R12:QI 7: r28:PSI=sign_extend(r25:HI)#0 REG_DEAD r25:HI Failed to match this instruction: (set (reg:PSI 28 [ iD.1772 ]) (zero_extend:PSI (zero_extend:HI (reg:QI 12 R12 [ iD.1772 ])))) which results in the following code: foo: AND #0xff, R12 RLAM.A #4, R12 { RRAM.A #4, R12 RLAM.A #1, R12 MOVX.W table(R12), R12 RETA With this patch, we now see: Trying 2 -> 7: 2: r25:HI=zero_extend(R12:QI) REG_DEAD R12:QI 7: r28:PSI=sign_extend(r25:HI)#0 REG_DEAD r25:HI Successfully matched this instruction: (set (reg:PSI 28 [ iD.1772 ]) (zero_extend:PSI (reg:QI 12 R12 [ iD.1772 ]))) allowing combination of insns 2 and 7 original costs 4 + 8 = 12 replacement cost 8 foo: MOV.B R12, R12 RLAM.A #1, R12 MOVX.W table(R12), R12 RETA 2023-10-26 Roger Sayle <roger@nextmovesoftware.com> Richard Biener <rguenther@suse.de> gcc/ChangeLog PR rtl-optimization/91865 * combine.cc (make_compound_operation): Avoid creating a ZERO_EXTEND of a ZERO_EXTEND. gcc/testsuite/ChangeLog PR rtl-optimization/91865 * gcc.target/msp430/pr91865.c: New test case.
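The canonical-form rule being restored is just the observation that widening an already zero-extended value zero-extends it once more to the same result. A tiny illustration using standard widths (the PR's case involves msp430's PSImode, which C cannot spell directly):
```
#include <stdint.h>

static uint32_t widen_twice (uint8_t x)
{
  uint16_t mid = (uint16_t) x;  /* zero_extend:HI (reg:QI) */
  return (uint32_t) mid;        /* zero_extend of the zero_extend */
}

static uint32_t widen_once (uint8_t x)
{
  return (uint32_t) x;          /* the canonical single zero_extend */
}

/* widen_twice and widen_once return the same value for every x,
   which is why the nested form should always be folded.  */
```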
nstester
pushed a commit
that referenced
this pull request
Dec 11, 2023
Since the last import from upstream libsanitizer, the output has changed and now looks more like this: READ of size 6 at 0x7ff7beb2a144 thread T0 #0 0x101cf7796 in MemcmpInterceptorCommon(void*, int (*)(void const*, void const*, unsigned long), void const*, void const*, unsigned long) sanitizer_common_interceptors.inc:813 #1 0x101cf7b99 in memcmp sanitizer_common_interceptors.inc:840 #2 0x108a0c39f in __stack_chk_guard+0xf (dyld:x86_64+0x8039f) so let's adjust the pattern accordingly. gcc/testsuite/ChangeLog: * c-c++-common/asan/memcmp-1.c: Adjust pattern on darwin.
nstester
pushed a commit
that referenced
this pull request
Dec 19, 2023
During partial ordering, we want to look through dependent alias template specializations within template arguments and otherwise treat them as opaque in other contexts (see e.g. r7-7116-g0c942f3edab108 and r11-7011-g6e0a231a4aa240). To that end template_args_equal was given a partial_order flag that controls this behavior. This flag does the right thing when a dependent alias template specialization appears as template argument of the partial specialization, e.g. in template<class T, class...> using first_t = T; template<class T> struct traits; template<class T> struct traits<first_t<T, T&>> { }; // #1 template<class T> struct traits<first_t<const T, T&>> { }; // #2 we correctly consider #2 to be more specialized than #1. But if the alias specialization appears as a nested template argument of another class template specialization, e.g. in template<class T> struct traits<A<first_t<T, T&>>> { }; // #1 template<class T> struct traits<A<first_t<const T, T&>>> { }; // #2 then we incorrectly consider #1 and #2 to be unordered. This is because 1. we don't propagate the flag to recursive template_args_equal calls 2. we don't use structural equality for class template specializations written in terms of dependent alias template specializations This patch fixes the first issue by turning the partial_order flag into a global. This patch fixes the second issue by making us propagate structural equality appropriately when building a class template specialization. In passing this patch also improves hashing of specializations that use structural equality. PR c++/90679 gcc/cp/ChangeLog: * cp-tree.h (comp_template_args): Remove partial_order parameter. (template_args_equal): Likewise. * pt.cc (comparing_for_partial_ordering): New global flag. (iterative_hash_template_arg) <case tcc_type>: Hash the template and arguments for specializations that use structural equality. (template_args_equal): Remove partial order parameter and use comparing_for_partial_ordering instead. (comp_template_args): Likewise. (comp_template_args_porder): Set comparing_for_partial_ordering instead. Make static. (any_template_arguments_need_structural_equality_p): Return true for an argument that's a dependent alias template specialization or a class template specialization that itself needs structural equality. * tree.cc (cp_tree_equal) <case TREE_VEC>: Adjust call to comp_template_args. gcc/testsuite/ChangeLog: * g++.dg/cpp0x/alias-decl-75a.C: New test. * g++.dg/cpp0x/alias-decl-75b.C: New test.
nstester
pushed a commit
that referenced
this pull request
Dec 24, 2023
Hi All, This patch adds initial support for early break vectorization in GCC. In other words it implements support for vectorization of loops with multiple exits. The support is added for any target that implements a vector cbranch optab, this includes both fully masked and non-masked targets. Depending on the operation, the vectorizer may also require support for boolean mask reductions using Inclusive OR/Bitwise AND. This is however only checked when the comparison would produce multiple statements. This also fully decouples the vectorizer's notion of exit from the existing loop infrastructure's exit. Before this patch the vectorizer always picked the natural loop latch connected exit as the main exit. After this patch the vectorizer is free to choose any exit it deems appropriate as the main exit. This means that even if the main exit is not countable (i.e. the termination condition could not be determined) we might still be able to vectorize should one of the other exits be countable. In such situations the loop is reflowed, which enables vectorization of many other loop forms. Concretely the kind of loops supported are of the forms: for (int i = 0; i < N; i++) { <statements1> if (<condition>) { ... <action>; } <statements2> } where <action> can be: - break - return - goto Any number of statements can be used before the <action> occurs. Since this is an initial version for GCC 14 it has the following limitations and features: - Only fixed sized iterations and buffers are supported. That is to say any vectors loaded or stored must be to statically allocated arrays with known sizes. N must also be known. This limitation is because our primary target for this optimization is SVE. For VLA SVE we can't easily do cross page iteration checks. The result is likely to also not be beneficial. For that reason we punt support for variable buffers till we have First-Faulting support in GCC 15. - any stores in <statements1> should not be to the same objects as in <condition>. Loads are fine as long as they don't have the possibility to alias. More concretely, we block RAW dependencies when the intermediate value can't be separated from the store, or the store itself can't be moved. - Prologue peeling, alignment peeling and loop versioning are supported. - Fully masked loops, unmasked loops and partially masked loops are supported. - Any number of loop early exits are supported. - No support for epilogue vectorization. The only epilogue supported is the scalar final one. Peeling code supports it but the code motion code cannot find instructions to make the move in the epilog. - Early breaks are only supported for inner loop vectorization. With the help of IPA and LTO this still gets hit quite often. During bootstrap it hit rather frequently. Additionally TSVC s332, s481 and s482 all pass now since these are tests for support for early exit vectorization. This implementation does not support completely handling the early break inside the vector loop itself but instead supports adding checks such that if we know that we have to exit in the current iteration then we branch to scalar code to actually do the final VF iterations which handles all the code in <action>. For the scalar loop we know that whatever exit you take you have to perform at most VF iterations. For vector code we only care about the state of fully performed iterations and reset the scalar code to the (partially) remaining loop. That is to say, the first vector loop executes so long as the early exit isn't needed. 
Once the exit is taken, the scalar code will perform at most VF extra iterations. The exact number depends on peeling, the iteration start, and which exit was taken (natural or early). For this scalar loop, all early exits are treated the same. When we vectorize we move any statement not related to the early break itself and that would be incorrect to execute before the break (i.e. has side effects) to after the break. If this is not possible we decline to vectorize. The analysis and code motion also take into account that they don't introduce a RAW dependency after the move of the stores. This means that we check at the start of iterations whether we are going to exit or not. During the analysis phase we check whether we are allowed to do this moving of statements. Also note that we only move the scalar statements, and we do so after peeling but just before we start transforming statements. With this the vector flow no longer necessarily needs to match that of the scalar code. In addition, most of the infrastructure is in place to support general control flow safely; however, we are punting this to GCC 15. Codegen: for e.g. unsigned vect_a[N]; unsigned vect_b[N]; unsigned test4(unsigned x) { unsigned ret = 0; for (int i = 0; i < N; i++) { vect_b[i] = x + i; if (vect_a[i] > x) break; vect_a[i] = x; } return ret; } We generate for Adv. SIMD: test4: adrp x2, .LC0 adrp x3, .LANCHOR0 dup v2.4s, w0 add x3, x3, :lo12:.LANCHOR0 movi v4.4s, 0x4 add x4, x3, 3216 ldr q1, [x2, #:lo12:.LC0] mov x1, 0 mov w2, 0 .p2align 3,,7 .L3: ldr q0, [x3, x1] add v3.4s, v1.4s, v2.4s add v1.4s, v1.4s, v4.4s cmhi v0.4s, v0.4s, v2.4s umaxp v0.4s, v0.4s, v0.4s fmov x5, d0 cbnz x5, .L6 add w2, w2, 1 str q3, [x1, x4] str q2, [x3, x1] add x1, x1, 16 cmp w2, 200 bne .L3 mov w7, 3 .L2: lsl w2, w2, 2 add x5, x3, 3216 add w6, w2, w0 sxtw x4, w2 ldr w1, [x3, x4, lsl 2] str w6, [x5, x4, lsl 2] cmp w0, w1 bcc .L4 add w1, w2, 1 str w0, [x3, x4, lsl 2] add w6, w1, w0 sxtw x1, w1 ldr w4, [x3, x1, lsl 2] str w6, [x5, x1, lsl 2] cmp w0, w4 bcc .L4 add w4, w2, 2 str w0, [x3, x1, lsl 2] sxtw x1, w4 add w6, w1, w0 ldr w4, [x3, x1, lsl 2] str w6, [x5, x1, lsl 2] cmp w0, w4 bcc .L4 str w0, [x3, x1, lsl 2] add w2, w2, 3 cmp w7, 3 beq .L4 sxtw x1, w2 add w2, w2, w0 ldr w4, [x3, x1, lsl 2] str w2, [x5, x1, lsl 2] cmp w0, w4 bcc .L4 str w0, [x3, x1, lsl 2] .L4: mov w0, 0 ret .p2align 2,,3 .L6: mov w7, 4 b .L2 and for SVE: test4: adrp x2, .LANCHOR0 add x2, x2, :lo12:.LANCHOR0 add x5, x2, 3216 mov x3, 0 mov w1, 0 cntw x4 mov z1.s, w0 index z0.s, #0, #1 ptrue p1.b, all ptrue p0.s, all .p2align 3,,7 .L3: ld1w z2.s, p1/z, [x2, x3, lsl 2] add z3.s, z0.s, z1.s cmplo p2.s, p0/z, z1.s, z2.s b.any .L2 st1w z3.s, p1, [x5, x3, lsl 2] add w1, w1, 1 st1w z1.s, p1, [x2, x3, lsl 2] add x3, x3, x4 incw z0.s cmp w3, 803 bls .L3 .L5: mov w0, 0 ret .p2align 2,,3 .L2: cntw x5 mul w1, w1, w5 cbz w5, .L5 sxtw x1, w1 sub w5, w5, #1 add x5, x5, x1 add x6, x2, 3216 b .L6 .p2align 2,,3 .L14: str w0, [x2, x1, lsl 2] cmp x1, x5 beq .L5 mov x1, x4 .L6: ldr w3, [x2, x1, lsl 2] add w4, w0, w1 str w4, [x6, x1, lsl 2] add x4, x1, 1 cmp w0, w3 bcs .L14 mov w0, 0 ret On the workloads this work is based on we see between 2-3x performance uplift using this patch. Follow up plan: - Boolean vectorization has several shortcomings. I've filed PR110223 with the bigger ones that cause vectorization to fail with this patch. - SLP support. This is planned for GCC 15 as for the majority of cases build SLP itself fails. This means I'll need to spend time in making this more robust first. 
Additionally it requires: * Adding support for vectorizing CFG (gconds) * Support for CFG to differ between vector and scalar loops. Both of which would be disruptive to the tree and I suspect I'll be handling fallouts from this patch for a while. So I plan to work on the surrounding building blocks first for the remainder of the year. Additionally it also contains reduced cases from issues found running over various codebases. Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. Also regtested with: -march=armv8.3-a+sve -march=armv8.3-a+nosve -march=armv9-a -mcpu=neoverse-v1 -mcpu=neoverse-n2 Bootstrapped Regtested x86_64-pc-linux-gnu and no issues. Bootstrap and Regtest on arm-none-linux-gnueabihf and no issues. gcc/ChangeLog: * tree-if-conv.cc (idx_within_array_bound): Expose. * tree-vect-data-refs.cc (vect_analyze_early_break_dependences): New. (vect_analyze_data_ref_dependences): Use it. * tree-vect-loop-manip.cc (vect_iv_increment_position): New. (vect_set_loop_controls_directly, vect_set_loop_condition_partial_vectors, vect_set_loop_condition_partial_vectors_avx512, vect_set_loop_condition_normal): Support multiple exits. (slpeel_tree_duplicate_loop_to_edge_cfg): Support LCSSA peeling for multiple exits. (slpeel_can_duplicate_loop_p): Change vectorizer from looking at BB count and instead look at loop shape. (vect_update_ivs_after_vectorizer): Drop asserts. (vect_gen_vector_loop_niters_mult_vf): Support peeled vector iterations. (vect_do_peeling): Support multiple exits. (vect_loop_versioning): Likewise. * tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Initialise early_breaks. (vect_analyze_loop_form): Support loop flows with more than single BB loop body. (vect_create_loop_vinfo): Support niters analysis for multiple exits. (vect_analyze_loop): Likewise. (vect_get_vect_def): New. (vect_create_epilog_for_reduction): Support early exit reductions. (vectorizable_live_operation_1): New. (find_connected_edge): New. (vectorizable_live_operation): Support early exit live operations. (move_early_exit_stmts): New. (vect_transform_loop): Use it. * tree-vect-patterns.cc (vect_init_pattern_stmt): Support gcond. (vect_recog_bitfield_ref_pattern): Support gconds and bools. (vect_recog_gcond_pattern): New. (possible_vector_mask_operation_p): Support gcond masks. (vect_determine_mask_precision): Likewise. (vect_mark_pattern_stmts): Set gcond def type. (can_vectorize_live_stmts): Force early break inductions to be live. * tree-vect-stmts.cc (vect_stmt_relevant_p): Add relevancy analysis for early breaks. (vect_mark_stmts_to_be_vectorized): Process gcond usage. (perm_mask_for_reverse): Expose. (vectorizable_comparison_1): New. (vectorizable_early_exit): New. (vect_analyze_stmt): Support early break and gcond. (vect_transform_stmt): Likewise. (vect_is_simple_use): Likewise. (vect_get_vector_types_for_stmt): Likewise. * tree-vectorizer.cc (pass_vectorize::execute): Update exits for value numbering. * tree-vectorizer.h (enum vect_def_type): Add vect_condition_def. (LOOP_VINFO_EARLY_BREAKS, LOOP_VINFO_EARLY_BRK_STORES, LOOP_VINFO_EARLY_BREAKS_VECT_PEELED, LOOP_VINFO_EARLY_BRK_DEST_BB, LOOP_VINFO_EARLY_BRK_VUSES): New. (is_loop_header_bb_p): Drop assert. (class loop): Add early_breaks, early_break_stores, early_break_dest_bb, early_break_vuses. (vect_iv_increment_position, perm_mask_for_reverse, ref_within_array_bound): New. (slpeel_tree_duplicate_loop_to_edge_cfg): Update for early breaks.
sys-ceuplift
pushed a commit
that referenced
this pull request
Feb 4, 2024
This patch adjusts the costs so that we treat REG and SUBREG expressions the same for costing. This was motivated by bt_skip_func and bt_find_func in xz and results in nearly a 5% improvement in the dynamic instruction count for input #2 and smaller, but definitely visible improvements pretty much across the board. Exceptions would be perlbench input #1 and exchange2 which showed very small regressions. In the bt_find_func and bt_skip_func cases we have something like this: > (insn 10 7 11 2 (set (reg/v:DI 136 [ x ]) > (zero_extend:DI (subreg/s/u:SI (reg/v:DI 137 [ a ]) 0))) "zz.c":6:21 387 {*zero_extendsidi2_bitmanip} > (nil)) > (insn 11 10 12 2 (set (reg:DI 142 [ _1 ]) > (plus:DI (reg/v:DI 136 [ x ]) > (reg/v:DI 139 [ b ]))) "zz.c":7:23 5 {adddi3} > (nil)) [ ... ]> (insn 13 12 14 2 (set (reg:DI 143 [ _2 ]) > (plus:DI (reg/v:DI 136 [ x ]) > (reg/v:DI 141 [ c ]))) "zz.c":8:23 5 {adddi3} > (nil)) Note the two uses of (reg 136). The best way to handle that in combine might be a 3->2 split. But there's a much better approach if we look at fwprop... (set (reg:DI 142 [ _1 ]) (plus:DI (zero_extend:DI (subreg/s/u:SI (reg/v:DI 137 [ a ]) 0)) (reg/v:DI 139 [ b ]))) change not profitable (cost 4 -> cost 8) So that should be the same cost as a regular DImode addition when the ZBA extension is enabled. But it ends up costing more because the clause to cost this variant isn't prepared to handle a SUBREG. That results in the RTL above having too high a cost and fwprop gives up. One approach would be to replace the REG_P with REG_P || SUBREG_P in the costing code. I ultimately decided against that and instead check if the operand in question passes register_operand. By far the most important case to handle is the DImode PLUS. But for the sake of consistency, I changed the other instances in riscv_rtx_costs as well. For those other cases we're talking about improvements in the .000001% range. While we are into stage4, this just hits cost modeling which we've generally agreed is still appropriate (though we were mostly talking about vector). So I'm going to extend that general agreement ever so slightly and include scalar cost modeling :-) gcc/ * config/riscv/riscv.cc (riscv_rtx_costs): Handle SUBREG and REG similarly. gcc/testsuite/ * gcc.target/riscv/reg_subreg_costs.c: New test. Co-authored-by: Jivan Hakobyan <jivanhakobyan9@gmail.com>
sys-ceuplift
pushed a commit
that referenced
this pull request
Sep 7, 2024
…o_debug_section [PR116614] cat abc.C #define A(n) struct T##n {} t##n; #define B(n) A(n##0) A(n##1) A(n##2) A(n##3) A(n##4) A(n##5) A(n##6) A(n##7) A(n##8) A(n##9) #define C(n) B(n##0) B(n##1) B(n##2) B(n##3) B(n##4) B(n##5) B(n##6) B(n##7) B(n##8) B(n##9) #define D(n) C(n##0) C(n##1) C(n##2) C(n##3) C(n##4) C(n##5) C(n##6) C(n##7) C(n##8) C(n##9) #define E(n) D(n##0) D(n##1) D(n##2) D(n##3) D(n##4) D(n##5) D(n##6) D(n##7) D(n##8) D(n##9) E(1) E(2) E(3) int main () { return 0; } ./xg++ -B ./ -o abc{.o,.C} -flto -flto-partition=1to1 -O2 -g -fdebug-types-section -c ./xgcc -B ./ -o abc{,.o} -flto -flto-partition=1to1 -O2 (not included in testsuite as it takes a while to compile) FAILs with lto-wrapper: fatal error: Too many copied sections: Operation not supported compilation terminated. /usr/bin/ld: error: lto-wrapper failed collect2: error: ld returned 1 exit status The following patch fixes that. Most of the 64K+ section support for reading and writing was already there years ago (and especially reading used quite often already) and a further bug was fixed in it in the PR104617 fix. Yet, the fix isn't solely about removing the if (new_i - 1 >= SHN_LORESERVE) { *err = ENOTSUP; return "Too many copied sections"; } 5 lines; the missing part was that the function only handled reading of the .symtab_shndx section but not copying/updating of it. If the result has less than 64K-epsilon sections, that actually wasn't needed, but e.g. with -fdebug-types-section one can exceed that pretty easily (reported to us on WebKitGtk build on ppc64le). Updating the section is slightly more complicated, because it basically needs to be done in lock step with updating the .symtab section; if one doesn't need to use SHN_XINDEX in there, the section should (or should be updated to) contain an SHN_UNDEF entry, otherwise it needs to have whatever would otherwise be stored but couldn't fit. But repeating, because of that, all the symtab decisions about what to discard and how to rewrite it would be ugly. So, the patch instead emits the .symtab_shndx section (or sections) last and prepares the content during the .symtab processing and in a second pass when going just through .symtab_shndx sections just uses the saved content. 2024-09-07 Jakub Jelinek <jakub@redhat.com> PR lto/116614 * simple-object-elf.c (SHN_COMMON): Align comment with neighbouring comments. (SHN_HIRESERVE): Use uppercase hex digits instead of lowercase for consistency. (simple_object_elf_find_sections): Formatting fixes. (simple_object_elf_fetch_attributes): Likewise. (simple_object_elf_attributes_merge): Likewise. (simple_object_elf_start_write): Likewise. (simple_object_elf_write_ehdr): Likewise. (simple_object_elf_write_shdr): Likewise. (simple_object_elf_write_to_file): Likewise. (simple_object_elf_copy_lto_debug_section): Likewise. Don't fail for new_i - 1 >= SHN_LORESERVE, instead arrange in that case to copy over .symtab_shndx sections, though emit those last and compute their section content when processing associated .symtab sections. Handle simple_object_internal_read failure even in the .symtab_shndx reading case.
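For readers unfamiliar with the ELF mechanism involved: once an object has roughly 64K or more sections, a symbol's 16-bit st_shndx field can no longer hold the section index, so it is set to the escape value SHN_XINDEX and the real 32-bit index lives in a parallel .symtab_shndx array. A hedged sketch of the lookup side (section_of_symbol is a made-up helper, not libiberty code):
```
#include <stddef.h>
#include <stdint.h>

#define SHN_XINDEX 0xffff  /* escape value in st_shndx */

static uint32_t section_of_symbol (uint16_t st_shndx,
                                   const uint32_t *shndx_table, size_t i)
{
  if (st_shndx == SHN_XINDEX && shndx_table != NULL)
    return shndx_table[i];  /* entry i parallels .symtab entry i */
  return st_shndx;          /* ordinary (or reserved) small index */
}
```
Writing is the mirror image of this lookup, which is why the patch keeps the .symtab_shndx content in lock step with the rewritten .symtab.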
sys-ceuplift
pushed a commit
that referenced
this pull request
Sep 16, 2024
On Neoverse V2, SVE ADD instructions have a throughput of 4, while shift instructions like SHL have a throughput of 2. We can lean on that to emit code like: add z31.b, z31.b, z31.b instead of: lsl z31.b, z31.b, #1 The implementation of this change for SVE vectors is similar to a prior patch <https://gcc.gnu.org/pipermail/gcc-patches/2024-August/659958.html> that adds the above functionality for Neon vectors. Here, the machine descriptor pattern is split up to separately accommodate left and right shifts, so we can specifically emit an add for all left shifts by 1. The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression. OK for mainline? Signed-off-by: Soumya AR <soumyaa@nvidia.com> gcc/ChangeLog: * config/aarch64/aarch64-sve.md (*post_ra_v<optab><mode>3): Split pattern to accommodate left and right shifts separately. (*post_ra_v_ashl<mode>3): Matches left shifts with additional constraint to check for shifts by 1. (*post_ra_v_<optab><mode>3): Matches right shifts. gcc/testsuite/ChangeLog: * gcc.target/aarch64/sve/acle/asm/lsl_s16.c: Updated instances of lsl-1 with corresponding add. * gcc.target/aarch64/sve/acle/asm/lsl_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/lsl_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/lsl_s8.c: Likewise. * gcc.target/aarch64/sve/acle/asm/lsl_u16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/lsl_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/lsl_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/lsl_u8.c: Likewise. * gcc.target/aarch64/sve/acle/asm/lsl_wide_s16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/lsl_wide_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/lsl_wide_s8.c: Likewise. * gcc.target/aarch64/sve/acle/asm/lsl_wide_u16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/lsl_wide_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/lsl_wide_u8.c: Likewise. * gcc.target/aarch64/sve/adr_1.c: Likewise. * gcc.target/aarch64/sve/adr_6.c: Likewise. * gcc.target/aarch64/sve/cond_mla_7.c: Likewise. * gcc.target/aarch64/sve/cond_mla_8.c: Likewise. * gcc.target/aarch64/sve/shift_2.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather_s64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather_u64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_s64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_u64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/rshl_s16.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/rshl_s32.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/rshl_s64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/rshl_s8.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/rshl_u16.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/rshl_u32.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/rshl_u64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/rshl_u8.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_s64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_u64.c: Likewise. * gcc.target/aarch64/sve/sve_shl_add.c: New test.
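The rewrite is sound because a left shift by one equals adding the operand to itself, per element and at any width; the two forms differ only in throughput on this core. For the record (illustrative C):
```
#include <stdint.h>

static uint8_t lsl1 (uint8_t x)     { return (uint8_t) (x << 1); }
static uint8_t add_self (uint8_t x) { return (uint8_t) (x + x); }
/* lsl1 and add_self agree on all 256 inputs, so the backend is free
   to pick whichever instruction is cheaper.  */
```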
sys-ceuplift
pushed a commit
that referenced
this pull request
Sep 16, 2024
…on [PR113328] SVE's INDEX instruction can be used to populate vectors by values starting from "base" and incremented by "step" for each subsequent value. We can take advantage of it to generate vector constants if TARGET_SVE is available and the base and step values are within [-16, 15]. For example, with the following function: typedef int v4si __attribute__ ((vector_size (16))); v4si f_v4si (void) { return (v4si){ 0, 1, 2, 3 }; } GCC currently generates: f_v4si: adrp x0, .LC4 ldr q0, [x0, #:lo12:.LC4] ret .LC4: .word 0 .word 1 .word 2 .word 3 With this patch, we generate an INDEX instruction instead if TARGET_SVE is available. f_v4si: index z0.s, #0, #1 ret PR target/113328 gcc/ChangeLog: * config/aarch64/aarch64.cc (aarch64_simd_valid_immediate): Improve handling of some ADVSIMD vectors by using SVE's INDEX if TARGET_SVE is available. (aarch64_output_simd_mov_immediate): Likewise. gcc/testsuite/ChangeLog: * gcc.target/aarch64/sve/acle/general/dupq_1.c: Update test to use SVE's INDEX instruction. * gcc.target/aarch64/sve/acle/general/dupq_2.c: Likewise. * gcc.target/aarch64/sve/acle/general/dupq_3.c: Likewise. * gcc.target/aarch64/sve/acle/general/dupq_4.c: Likewise. * gcc.target/aarch64/sve/vec_init_3.c: New test. Signed-off-by: Pengxuan Zheng <quic_pzheng@quicinc.com>
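A constant vector qualifies for this treatment when its elements form an arithmetic sequence base + i*step with base and step both in [-16, 15]. A hedged sketch of that check (index_representable_p is a hypothetical helper, not the GCC predicate):
```
/* Return 1 if v[0..n-1] equals base + i*step with base, step in [-16, 15].  */
static int index_representable_p (const int *v, int n)
{
  if (n < 2)
    return 0;
  int base = v[0], step = v[1] - v[0];
  if (base < -16 || base > 15 || step < -16 || step > 15)
    return 0;
  for (int i = 2; i < n; i++)
    if (v[i] != base + i * step)
      return 0;
  return 1;
}
```
For the { 0, 1, 2, 3 } example above, base is 0 and step is 1, hence the single "index z0.s, #0, #1".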
sys-ceuplift
pushed a commit
that referenced
this pull request
Oct 18, 2024
Implement vddup and vidup using the new MVE builtins framework. We generate better code because we take advantage of the two outputs produced by the v[id]dup instructions. For instance, before: ldr r3, [r0] sub r2, r3, #8 str r2, [r0] mov r2, r3 vddup.u16 q3, r2, #1 now: ldr r2, [r0] vddup.u16 q3, r2, #1 str r2, [r0] 2024-08-21 Christophe Lyon <christophe.lyon@linaro.org> gcc/ * config/arm/arm-mve-builtins-base.cc (class viddup_impl): New. (vddup): New. (vidup): New. * config/arm/arm-mve-builtins-base.def (vddupq): New. (vidupq): New. * config/arm/arm-mve-builtins-base.h (vddupq): New. (vidupq): New. * config/arm/arm_mve.h (vddupq_m): Delete. (vddupq_u8): Delete. (vddupq_u32): Delete. (vddupq_u16): Delete. (vidupq_m): Delete. (vidupq_u8): Delete. (vidupq_u32): Delete. (vidupq_u16): Delete. (vddupq_x_u8): Delete. (vddupq_x_u16): Delete. (vddupq_x_u32): Delete. (vidupq_x_u8): Delete. (vidupq_x_u16): Delete. (vidupq_x_u32): Delete. (vddupq_m_n_u8): Delete. (vddupq_m_n_u32): Delete. (vddupq_m_n_u16): Delete. (vddupq_m_wb_u8): Delete. (vddupq_m_wb_u16): Delete. (vddupq_m_wb_u32): Delete. (vddupq_n_u8): Delete. (vddupq_n_u32): Delete. (vddupq_n_u16): Delete. (vddupq_wb_u8): Delete. (vddupq_wb_u16): Delete. (vddupq_wb_u32): Delete. (vidupq_m_n_u8): Delete. (vidupq_m_n_u32): Delete. (vidupq_m_n_u16): Delete. (vidupq_m_wb_u8): Delete. (vidupq_m_wb_u16): Delete. (vidupq_m_wb_u32): Delete. (vidupq_n_u8): Delete. (vidupq_n_u32): Delete. (vidupq_n_u16): Delete. (vidupq_wb_u8): Delete. (vidupq_wb_u16): Delete. (vidupq_wb_u32): Delete. (vddupq_x_n_u8): Delete. (vddupq_x_n_u16): Delete. (vddupq_x_n_u32): Delete. (vddupq_x_wb_u8): Delete. (vddupq_x_wb_u16): Delete. (vddupq_x_wb_u32): Delete. (vidupq_x_n_u8): Delete. (vidupq_x_n_u16): Delete. (vidupq_x_n_u32): Delete. (vidupq_x_wb_u8): Delete. (vidupq_x_wb_u16): Delete. (vidupq_x_wb_u32): Delete. (__arm_vddupq_m_n_u8): Delete. (__arm_vddupq_m_n_u32): Delete. (__arm_vddupq_m_n_u16): Delete. (__arm_vddupq_m_wb_u8): Delete. (__arm_vddupq_m_wb_u16): Delete. (__arm_vddupq_m_wb_u32): Delete. (__arm_vddupq_n_u8): Delete. (__arm_vddupq_n_u32): Delete. (__arm_vddupq_n_u16): Delete. (__arm_vidupq_m_n_u8): Delete. (__arm_vidupq_m_n_u32): Delete. (__arm_vidupq_m_n_u16): Delete. (__arm_vidupq_n_u8): Delete. (__arm_vidupq_m_wb_u8): Delete. (__arm_vidupq_m_wb_u16): Delete. (__arm_vidupq_m_wb_u32): Delete. (__arm_vidupq_n_u32): Delete. (__arm_vidupq_n_u16): Delete. (__arm_vidupq_wb_u8): Delete. (__arm_vidupq_wb_u16): Delete. (__arm_vidupq_wb_u32): Delete. (__arm_vddupq_wb_u8): Delete. (__arm_vddupq_wb_u16): Delete. (__arm_vddupq_wb_u32): Delete. (__arm_vddupq_x_n_u8): Delete. (__arm_vddupq_x_n_u16): Delete. (__arm_vddupq_x_n_u32): Delete. (__arm_vddupq_x_wb_u8): Delete. (__arm_vddupq_x_wb_u16): Delete. (__arm_vddupq_x_wb_u32): Delete. (__arm_vidupq_x_n_u8): Delete. (__arm_vidupq_x_n_u16): Delete. (__arm_vidupq_x_n_u32): Delete. (__arm_vidupq_x_wb_u8): Delete. (__arm_vidupq_x_wb_u16): Delete. (__arm_vidupq_x_wb_u32): Delete. (__arm_vddupq_m): Delete. (__arm_vddupq_u8): Delete. (__arm_vddupq_u32): Delete. (__arm_vddupq_u16): Delete. (__arm_vidupq_m): Delete. (__arm_vidupq_u8): Delete. (__arm_vidupq_u32): Delete. (__arm_vidupq_u16): Delete. (__arm_vddupq_x_u8): Delete. (__arm_vddupq_x_u16): Delete. (__arm_vddupq_x_u32): Delete. (__arm_vidupq_x_u8): Delete. (__arm_vidupq_x_u16): Delete. (__arm_vidupq_x_u32): Delete.
sys-ceuplift
pushed a commit
that referenced
this pull request
Oct 22, 2024
gcc.dg/torture/pr112305.c contains an inner loop that executes 0x8000_0014 times and an outer loop that executes 5 times, giving about 10 billion total executions of the inner loop body. At -O2 and above we are able to remove the inner loop, but at -O1 we keep a no-op loop: dls lr, r3 .L3: subs r3, r3, #1 le lr, .L3 and at -O0 we of course don't optimise. This can lead to long execution times on simulators, possibly triggering a timeout. gcc/testsuite * gcc.dg/torture/pr112305.c: Skip at -O0 and -O1 for simulators.
sys-ceuplift
pushed a commit
that referenced
this pull request
Nov 5, 2024
We currently crash upon the following invalid code (notice the "void void**" parameter) === cut here === using size_t = decltype(sizeof(int)); void *operator new(size_t, void void **p) noexcept { return p; } int x; void f() { int y; new (&y) int(x); } === cut here === The problem is that in this case, we end up with a NULL_TREE parameter list for the new operator because of the error, and (1) coerce_new_type wrongly complains about the first parameter type not being size_t, (2) std_placement_new_fn_p blindly accesses the parameter list, hence a crash. This patch does NOT address #1 since we can't easily distinguish between a new operator declaration without parameters from one with erroneous parameters (and it's not worth the risk to refactor and break things for an error recovery issue) hence a dg-bogus in new52.C, but it does address #2 and the ICE by simply checking the first parameter against NULL_TREE. It also adds a new testcase checking that we complain about new operators with no or invalid first parameters, since we did not have any. PR c++/117101 gcc/cp/ChangeLog: * init.cc (std_placement_new_fn_p): Check first_arg against NULL_TREE. gcc/testsuite/ChangeLog: * g++.dg/init/new52.C: New test. * g++.dg/init/new53.C: New test.
sys-ceuplift
pushed a commit
that referenced
this pull request
Nov 8, 2024
Update test case for armv8.1-m.main that supports conditional arithmetic. armv7-m: push {r4, lr} ldr r4, .L6 ldr r4, [r4] lsls r4, r4, #29 it mi addmi r2, r2, #1 bl bar movs r0, #0 pop {r4, pc} armv8.1-m.main: push {r3, r4, r5, lr} ldr r4, .L5 ldr r5, [r4] tst r5, #4 csinc r2, r2, r2, eq bl bar movs r0, #0 pop {r3, r4, r5, pc} gcc/testsuite/ChangeLog: * gcc.target/arm/epilog-1.c: Use check-function-bodies. Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
sys-ceuplift
pushed a commit
that referenced
this pull request
Nov 11, 2024
The second source register of insn "*extzvsi-1bit_addsubx" cannot be the same as the destination register, because that register will be overwritten with an intermediate value after insn splitting. /* example #1 */ int test1(int b, int a) { return ((a & 1024) ? 4 : 0) + b; } ;; result #1 (incorrect) test1: extui a2, a3, 10, 1 ;; overwrites A2 before used addx4 a2, a2, a2 ret.n This patch fixes that. ;; result #1 (correct) test1: extui a3, a3, 10, 1 ;; uses A3 and then overwrites addx4 a2, a3, a2 ret.n However, it should be noted that the first source register can be the same as the destination without any problems. /* example #2 */ int test2(int a, int b) { return ((a & 1024) ? 4 : 0) + b; } ;; result (correct) test2: extui a2, a2, 10, 1 ;; uses A2 and then overwrites addx4 a2, a2, a3 ret.n gcc/ChangeLog: * config/xtensa/xtensa.md (*extzvsi-1bit_addsubx): Add '&' to the destination register constraint to indicate that it is 'earlyclobber', append '0' to the first source register constraint to indicate that it can be the same as the destination register, and change the split condition from 1 to reload_completed so that the insn will be split only after RA in order to obtain allocated registers that satisfy the above constraints.
sys-ceuplift
pushed a commit
that referenced
this pull request
Nov 26, 2024
vec.h has this method: template<typename T, typename A> inline T * vec_safe_push (vec<T, A, vl_embed> *&v, const T &obj CXX_MEM_STAT_INFO) where v is a reference to a pointer to vec. This matches the regex for VecPrinter, so gdbhooks.py attempts to print it but chokes on the reference. I see the following: #1 0x0000000002b84b7b in vec_safe_push<edge_def*, va_gc> (v=Traceback (most recent call last): File "$SRC/gcc/gcc/gdbhooks.py", line 486, in to_string return '0x%x' % intptr(self.gdbval) File "$SRC/gcc/gcc/gdbhooks.py", line 168, in intptr return long(gdbval) if sys.version_info.major == 2 else int(gdbval) gdb.error: Cannot convert value to long. This patch makes VecPrinter handle such references by stripping them (dereferencing) at the top of the relevant functions. gcc/ChangeLog: * gdbhooks.py (strip_ref): New. Use it ... (VecPrinter.to_string): ... here, (VecPrinter.children): ... and here.
sys-ceuplift
pushed a commit
that referenced
this pull request
Dec 7, 2024
Brief: The bug appears in LRA after the rematerialization pass while creating live ranges. File lra.cc: ************************************************************* /* Now we know what pseudos should be spilled. Try to rematerialize them first. */ if (lra_remat ()) { /* We need full live info -- see the comment above. */ lra_create_live_ranges (lra_reg_spill_p, true); ************************************************************* Wrong call `lra_create_live_ranges (lra_reg_spill_p, true)'. It has to be `lra_create_live_ranges (true, true)'. The explanation: ********************************** int main (void) { if (a.u33 * a.u33 != 0) ------^^^^^^^^^^^^^ goto abrt; if (a.u33 * a.u40 * a.u33 != 0) ********************************** The bug appears here. Part of the expression `a.u33 * a.u33' Before LRA: ************************************************************* (insn 13 11 15 2 (set (reg:QI 184 [ _1+3 ]) (mem/c:QI (const:HI (plus:HI (symbol_ref:HI ("a") [flags 0x2] <var_decl 0x7c866435d000 a>) (const_int 3 [0x3]))) [1 a+3 S1 A8])) "bf.c":11:8 86 {movqi_insn_split} (nil)) (insn 15 13 16 2 (set (reg:QI 64 [ a+4 ]) (mem/c:QI (const:HI (plus:HI (symbol_ref:HI ("a") [flags 0x2] <var_decl 0x7c866435d000 a>) (const_int 4 [0x4]))) [1 a+4 S1 A8])) "bf.c":11:8 86 {movqi_insn_split} (nil)) (insn 16 15 20 2 (set (reg:QI 185 [ _1+4 ]) (zero_extract:QI (reg:QI 64 [ a+4 ]) (const_int 1 [0x1]) (const_int 0 [0]))) "bf.c":11:8 985 {*extzvqi_split} (nil)) ************************************************************* After LRA: ************************************************************* (insn 587 11 13 2 (set (reg:QI 24 r24 [368]) (mem/c:QI (const:HI (plus:HI (symbol_ref:HI ("a") [flags 0x2] <var_decl 0x7c866435d000 a>) (const_int 3 [0x3]))) [1 a+3 S1 A8])) "bf.c":11:8 86 {movqi_insn_split} (nil)) (insn 13 587 15 2 (set (mem/c:QI (plus:HI (reg/f:HI 28 r28) (const_int 1 [0x1])) [4 %sfp+1 S1 A8]) (reg:QI 24 r24 [368])) "bf.c":11:8 86 {movqi_insn_split} (nil)) (insn 15 13 16 2 (set (reg:QI 6 r6 [orig:64 a+4 ] [64]) (mem/c:QI (const:HI (plus:HI (symbol_ref:HI ("a") [flags 0x2] <var_decl 0x7c866435d000 a>) (const_int 4 [0x4]))) [1 a+4 S1 A8])) "bf.c":11:8 86 {movqi_insn_split} (nil)) (insn 16 15 572 2 (set (reg:QI 24 r24 [orig:185 _1+4 ] [185]) (zero_extract:QI (reg:QI 6 r6 [orig:64 a+4 ] [64]) (const_int 1 [0x1]) (const_int 0 [0]))) "bf.c":11:8 985 {*extzvqi_split} (nil)) (insn 572 16 20 2 (set (mem/c:QI (plus:HI (reg/f:HI 28 r28) (const_int 1 [0x1])) [4 %sfp+1 S1 A8]) (reg:QI 24 r24 [orig:185 _1+4 ] [185])) "bf.c":11:8 86 {movqi_insn_split} (nil)) ************************************************************* Insn 13 and insn 572 use sfp+1 as a spill slot, but in the IRA pass these were two different pseudos, r184 and r185. Insn 13 uses sfp+1 as a spill slot for r184; insn 572 uses the same slot for r185. That's wrong. Here we have a rematerialization. Fragment from bf.c.317r.reload: ************************************************************************************** ******** Rematerialization #1: ******** df_worklist_dataflow_doublequeue: n_basic_blocks 14 n_edges 18 count 14 ( 1) df_worklist_dataflow_doublequeue: n_basic_blocks 14 n_edges 18 count 14 ( 1) Cands: 0 (nop=0, remat_regno=185, reload_regno=359): (insn 16 15 572 2 (set (reg:QI 359 [orig:185 _1+4 ] [185]) (zero_extract:QI (reg:QI 64 [ a+4 ]) (const_int 1 [0x1]) (const_int 0 [0]))) "bf.c":11:8 985 {*extzvqi_split} (nil)) ************************************************************************************** [...] 
************************************************************************************** Ranges after the compression: r185: [0..1] Frame pointer can not be eliminated anymore Spilling non-eliminable hard regs: 28 29 Spilling r113(28) Spilling r184(29) Spilling r208(29) Spilling r209(28) Slot 0 regnos (width = 0): 185 209 208 184 113 ************************************************************************************** The bug is here: `r185: [0..1]', a wrong live range after compression. r185 and r184 can't have the same spill slot! Rematerialization in bf.c.317r.reload looks like: ************************************************************* 24: r14:QI=r185:QI Inserting rematerialization insn before: 581: r14:QI=zero_extract(r64:QI,0x1,0) deleting insn with uid = 24. Considering alt=0 of insn 16: (0) =r (1) rYil (2) n overall=0,losers=0,rld_nregs=0 32: r22:QI=r185:QI Inserting rematerialization insn before: 582: r22:QI=zero_extract(r64:QI,0x1,0) deleting insn with uid = 32. ************************************************************* This happens because: Fragment from lra.c (lra): ************************************************************************* if (! live_p) { /* We need full live info for spilling pseudos into registers instead of memory. */ lra_create_live_ranges (lra_reg_spill_p, true); live_p = true; } /* We should check necessity for spilling here as the above live range pass can remove spilled pseudos. */ if (! lra_need_for_spills_p ()) break; /* Now we know what pseudos should be spilled. Try to rematerialize them first. */ if (lra_remat ()) { /* We need full live info -- see the comment above. */ lra_create_live_ranges (lra_reg_spill_p, true); ----------------------------------^^^^^^^^^^^^^^^ live_p = true; ************************************************************************* The bug is here. Rematerialization can sometimes act like spilling pseudos into registers: 582: r22:QI=zero_extract(r64:QI,0x1,0) So, here we need live ranges for all pseudos. PS: the patch will not affect any target with a usable definition of the TARGET_SPILL_CLASS hook. PR target/116778 gcc/ * lra-lives.cc (complete_info_p): Clarify the comment. * lra.cc (lra): Create a full live info after rematerialization.
sys-ceuplift
pushed a commit
that referenced
this pull request
Dec 9, 2024
This PR reports a missed optimization. When we have: Str str{"Test"}; callback(str); as in the test, we're able to evaluate the Str::Str() call at compile time. But when we have: callback(Str{"Test"}); we are not. With this patch (in fact, it's Patrick's patch with a little tweak), we turn callback (TARGET_EXPR <D.2890, <<< Unknown tree: aggr_init_expr 5 __ct_comp D.2890 (struct Str *) <<< Unknown tree: void_cst >>> (const char *) "Test" >>>>) into callback (TARGET_EXPR <D.2890, {.str=(const char *) "Test", .length=4}>) I explored the idea of calling maybe_constant_value for the whole TARGET_EXPR in cp_fold. That has three problems: - we can't always elide a TARGET_EXPR, so we'd have to make sure the result is also a TARGET_EXPR; - the resulting TARGET_EXPR must have the same flags, otherwise Bad Things happen; - getting a new slot is also problematic. I've seen a test where we had "TARGET_EXPR<D.2680, ...>, D.2680", and folding the whole TARGET_EXPR would get us "TARGET_EXPR<D.2681, ...>", but since we don't see the outer D.2680, we can't replace it with D.2681, and things break. With this patch, two tree-ssa tests regressed: pr78687.C and pr90883.C. FAIL: g++.dg/tree-ssa/pr90883.C scan-tree-dump dse1 "Deleted redundant store: .*.a = {}" is easy. Previously, we would call C::C, so .gimple has: D.2590 = {}; C::C (&D.2590); D.2597 = D.2590; return D.2597; Then .einline inlines the C::C call: D.2590 = {}; D.2590.a = {}; // #1 D.2590.b = 0; // #2 D.2597 = D.2590; D.2590 ={v} {CLOBBER(eos)}; return D.2597; then #2 is removed in .fre1, and #1 is removed in .dse1. So the test passes. But with the patch, .gimple won't have that C::C call, so the IL is of course going to look different. The .optimized dump looks the same though so there's no problem. pr78687.C is XFAILed because the test passes with r15-5746 but not with r15-5747 as well. I opened <https://gcc.gnu.org/PR117971>. PR c++/116416 gcc/cp/ChangeLog: * cp-gimplify.cc (cp_fold_r) <case TARGET_EXPR>: Try to fold TARGET_EXPR_INITIAL and replace it with the folded result if it's TREE_CONSTANT. gcc/testsuite/ChangeLog: * g++.dg/analyzer/pr97116.C: Adjust dg-message. * g++.dg/tree-ssa/pr78687.C: Add XFAIL. * g++.dg/tree-ssa/pr90883.C: Adjust dg-final. * g++.dg/cpp0x/constexpr-prvalue1.C: New test. * g++.dg/cpp1y/constexpr-prvalue1.C: New test. Co-authored-by: Patrick Palka <ppalka@redhat.com> Reviewed-by: Jason Merrill <jason@redhat.com>
sys-ceuplift
pushed a commit
that referenced
this pull request
Dec 12, 2024
With the changes in r15-1579-g792f97b44ff, the code used as "padding" in the test case is optimized away. Prevent this optimization by forcing a read of the volatile memory. Also, validate that there is a far jump in the generated assembler. Without this patch, the generated assembler is reduced to: f3: cmp r0, #0 beq .L1 ldr r4, .L6 .L1: bx lr .L7: .align 2 .L6: .word g_0_1 With the patch, the generated assembler is: f3: movs r2, #1 ldr r3, .L6 push {lr} str r2, [r3] cmp r0, #0 bne .LCB10 bl .L1 @far jump .LCB10: b .L7 .L8: .align 2 .L6: .word .LANCHOR0 .L7: str r2, [r3] ... str r2, [r3] .L1: pop {pc} gcc/testsuite/ChangeLog: * gcc.target/arm/thumb1-far-jump-2.c: Write to volatile memory in the macro to avoid optimization. Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
sys-ceuplift
pushed a commit
that referenced
this pull request
Dec 12, 2024
On Cortex-M4, the code generated is: cmp r0, r1 itte ne lslne r0, r0, r1 asrne r0, r0, #1 moveq r0, r1 add r0, r0, r1 bx lr On Cortex-M7, the code generated is: cmp r0, r1 beq .L3 lsls r0, r0, r1 asrs r0, r0, #1 add r0, r0, r1 bx lr .L3: mov r0, r1 add r0, r0, r1 bx lr As Cortex-M7 only allows a maximum of one conditional instruction, force Cortex-M4 to have a stable test case. gcc/testsuite/ChangeLog: * gcc.target/arm/thumb-ifcvt.c: Use -mtune=cortex-m4. Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
sys-ceuplift
pushed a commit
that referenced
this pull request
Dec 17, 2024
This crash started with my r12-7803 but I believe the problem lies elsewhere. build_vec_init has cleanup_flags whose purpose is -- if I grok this correctly -- to avoid destructing an object multiple times. Let's say we are initializing an array of A. Then we might end up in a scenario similar to initlist-eh1.C: try { call A::A in a loop // #0 try { call a fn using the array } finally { // #1 call A::~A in a loop } } catch { // #2 call A::~A in a loop } cleanup_flags makes us emit a statement like D.3048 = 2; at #0 to disable performing the cleanup at #2, since #1 will take care of the destruction of the array. But if we are not emitting the loop because we can use a constant initializer (and use a single { a, b, ...}), we shouldn't generate the statement resetting the iterator to its initial value. Otherwise we crash in gimplify_var_or_parm_decl because it gets the stray decl D.3048. PR c++/117985 gcc/cp/ChangeLog: * init.cc (build_vec_init): Pop CLEANUP_FLAGS if we're not generating the loop. gcc/testsuite/ChangeLog: * g++.dg/cpp0x/initlist-array23.C: New test. * g++.dg/cpp0x/initlist-array24.C: New test.
sys-ceuplift
pushed a commit
that referenced
this pull request
Jan 7, 2025
This patch removes the AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS tunable and use_new_vector_costs entry in aarch64-tuning-flags.def and makes the AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS paths in the backend the default. To that end, the function aarch64_use_new_vector_costs_p and its uses were removed. To prevent costing vec_to_scalar operations with 0, as described in https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665481.html, we adjusted vectorizable_store such that the variable n_adjacent_stores also covers vec_to_scalar operations. This way vec_to_scalar operations are not costed individually, but as a group. As suggested by Richard Sandiford, the "known_ne" in the multilane-check was replaced by "maybe_ne" in order to treat nunits==1+1X as a vector rather than a scalar. Two tests were adjusted due to changes in codegen. In both cases, the old code performed loop unrolling once, but the new code does not: Example from gcc.target/aarch64/sve/strided_load_2.c (compiled with -O2 -ftree-vectorize -march=armv8.2-a+sve -mtune=generic -moverride=tune=none): f_int64_t_32: cbz w3, .L92 mov x4, 0 uxtw x3, w3 + cntd x5 + whilelo p7.d, xzr, x3 + mov z29.s, w5 mov z31.s, w2 - whilelo p6.d, xzr, x3 - mov x2, x3 - index z30.s, #0, #1 - uqdecd x2 - ptrue p5.b, all - whilelo p7.d, xzr, x2 + index z30.d, #0, #1 + ptrue p6.b, all .p2align 3,,7 .L94: - ld1d z27.d, p7/z, [x0, #1, mul vl] - ld1d z28.d, p6/z, [x0] - movprfx z29, z31 - mul z29.s, p5/m, z29.s, z30.s - incw x4 - uunpklo z0.d, z29.s - uunpkhi z29.d, z29.s - ld1d z25.d, p6/z, [x1, z0.d, lsl 3] - ld1d z26.d, p7/z, [x1, z29.d, lsl 3] - add z25.d, z28.d, z25.d + ld1d z27.d, p7/z, [x0, x4, lsl 3] + movprfx z28, z31 + mul z28.s, p6/m, z28.s, z30.s + ld1d z26.d, p7/z, [x1, z28.d, uxtw 3] add z26.d, z27.d, z26.d - st1d z26.d, p7, [x0, #1, mul vl] - whilelo p7.d, x4, x2 - st1d z25.d, p6, [x0] - incw z30.s - incb x0, all, mul #2 - whilelo p6.d, x4, x3 + st1d z26.d, p7, [x0, x4, lsl 3] + add z30.s, z30.s, z29.s + incd x4 + whilelo p7.d, x4, x3 b.any .L94 .L92: ret Example from gcc.target/aarch64/sve/strided_store_2.c (compiled with -O2 -ftree-vectorize -march=armv8.2-a+sve -mtune=generic -moverride=tune=none): f_int64_t_32: cbz w3, .L84 - addvl x5, x1, #1 mov x4, 0 uxtw x3, w3 - mov z31.s, w2 + cntd x5 whilelo p7.d, xzr, x3 - mov x2, x3 - index z30.s, #0, #1 - uqdecd x2 - ptrue p5.b, all - whilelo p6.d, xzr, x2 + mov z29.s, w5 + mov z31.s, w2 + index z30.d, #0, #1 + ptrue p6.b, all .p2align 3,,7 .L86: - ld1d z28.d, p7/z, [x1, x4, lsl 3] - ld1d z27.d, p6/z, [x5, x4, lsl 3] - movprfx z29, z30 - mul z29.s, p5/m, z29.s, z31.s - add z28.d, z28.d, #1 - uunpklo z26.d, z29.s - st1d z28.d, p7, [x0, z26.d, lsl 3] - incw x4 - uunpkhi z29.d, z29.s + ld1d z27.d, p7/z, [x1, x4, lsl 3] + movprfx z28, z30 + mul z28.s, p6/m, z28.s, z31.s add z27.d, z27.d, #1 - whilelo p6.d, x4, x2 - st1d z27.d, p7, [x0, z29.d, lsl 3] - incw z30.s + st1d z27.d, p7, [x0, z28.d, uxtw 3] + incd x4 + add z30.s, z30.s, z29.s whilelo p7.d, x4, x3 b.any .L86 .L84: ret The patch was bootstrapped and tested on aarch64-linux-gnu, no regression. OK for mainline? Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com> gcc/ * tree-vect-stmts.cc (vectorizable_store): Extend the use of n_adjacent_stores to also cover vec_to_scalar operations. * config/aarch64/aarch64-tuning-flags.def: Remove use_new_vector_costs as tuning option. * config/aarch64/aarch64.cc (aarch64_use_new_vector_costs_p): Remove. (aarch64_vector_costs::add_stmt_cost): Remove use of aarch64_use_new_vector_costs_p. 
(aarch64_vector_costs::finish_cost): Remove use of aarch64_use_new_vector_costs_p. * config/aarch64/tuning_models/cortexx925.h: Remove AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS. * config/aarch64/tuning_models/fujitsu_monaka.h: Likewise. * config/aarch64/tuning_models/generic_armv8_a.h: Likewise. * config/aarch64/tuning_models/generic_armv9_a.h: Likewise. * config/aarch64/tuning_models/neoverse512tvb.h: Likewise. * config/aarch64/tuning_models/neoversen2.h: Likewise. * config/aarch64/tuning_models/neoversen3.h: Likewise. * config/aarch64/tuning_models/neoversev1.h: Likewise. * config/aarch64/tuning_models/neoversev2.h: Likewise. * config/aarch64/tuning_models/neoversev3.h: Likewise. * config/aarch64/tuning_models/neoversev3ae.h: Likewise. gcc/testsuite/ * gcc.target/aarch64/sve/strided_load_2.c: Adjust expected outcome. * gcc.target/aarch64/sve/strided_store_2.c: Likewise.
sys-ceuplift
pushed a commit
that referenced
this pull request
Feb 7, 2025
In a member-specification of a class, a noexcept-specifier is a complete-class context. Thus we delay parsing until the end of the class via our DEFERRED_PARSE mechanism; see cp_parser_save_noexcept and cp_parser_late_noexcept_specifier. We also attempt to defer instantiation of noexcept-specifiers in order to reduce the number of instantiations; this is done via DEFERRED_NOEXCEPT. We can even have both, as in noexcept65.C: a DEFERRED_PARSE wrapped in DEFERRED_NOEXCEPT, which uses the DEFPARSE_INSTANTIATIONS mechanism. noexcept65.C works, because when we really need the noexcept, which is when parsing the body of S::A::A(), the noexcept will have been parsed already; noexcepts are parsed before bodies of member functions. But in this test we have: struct A { int x; template<class> void foo() noexcept(noexcept(x)) {} auto bar() -> decltype(foo<int>()) {} // #1 }; and I think the decltype in #1 needs the unparsed noexcept before it could have been parsed. clang++ rejects the test and I suppose we should reject it as well, rather than crashing on a DEFERRED_PARSE in tsubst_expr. PR c++/117106 PR c++/118190 gcc/cp/ChangeLog: * pt.cc (maybe_instantiate_noexcept): Give an error if the noexcept hasn't been parsed yet. gcc/testsuite/ChangeLog: * g++.dg/cpp0x/noexcept89.C: New test. * g++.dg/cpp0x/noexcept90.C: New test. Reviewed-by: Jason Merrill <jason@redhat.com>
Verified with https://github.com/hanzhan1/gcc/actions/runs/515615757.
Locally verified the built binary with cpu2017ref/505; the result passed.