[PATCH] c++: Fall through for arrays of T vs T cv [PR104996] #4

ecatmur · 2022-07-17T21:11:39Z

If two arrays do not have the exact same element type including qualification, this could be e.g. f(int (&&)[]) vs. f(int const (&)[]), which can still be distinguished by the lvalue-rvalue tiebreaker.

By tightening this branch (in accordance with the letter of the Standard) we fall through to the next branch, which tests whether they have different element type ignoring qualification and returns 0 in that case; thus we only actually fall through in the T[...] vs. T cv[...] case, eventually considering the lvalue-rvalue tiebreaker at the end of compare_ics.

Signed-off-by: Ed Catmur ed@catmur.uk

PR c++/104996

gcc/cp/ChangeLog:

* call.cc (compare_ics): When comparing list-initialization sequences, do not return early.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/initlist129.C: New test.

If two arrays do not have the exact same element type including qualification, this could be e.g. f(int (&&)[]) vs. f(int const (&)[]), which can still be distinguished by the lvalue-rvalue tiebreaker. By tightening this branch (in accordance with the letter of the Standard) we fall through to the next branch, which tests whether they have different element type ignoring qualification and returns 0 in that case; thus we only actually fall through in the T[...] vs. T cv[...] case, eventually considering the lvalue-rvalue tiebreaker at the end of compare_ics. Signed-off-by: Ed Catmur <ed@catmur.uk> PR c++/104996 gcc/cp/ChangeLog: * call.cc (compare_ics): When comparing list-initialization sequences, do not return early. gcc/testsuite/ChangeLog: * g++.dg/cpp0x/initlist129.C: New test.

With many thanks to H.J. for doing all the hard work, this patch resolves two P1 regressions; PR target/106933 and PR target/106959. Although superficially similar, the i386 backend's two scalar-to-vector (STV) passes perform their transformations in importantly different ways. The original pass converting SImode and DImode operations to V4SImode or V2DImode operations is "soft", allowing values to be maintained in both integer and vector hard registers. The newer pass converting TImode operations to V1TImode is "hard" (all or nothing) that converts all uses of a pseudo to vector form. To implement this it invokes powerful ju-ju calling SET_MODE on a reg_rtx, which due to RTL sharing, often updates this pseudo's mode everywhere in the RTL chain. Hence, TImode STV can only be performed when all uses of a pseudo are convertible to V1TImode form. To ensure this the STV passes currently use data-flow analysis to inspect all DEFs and USEs in a chain. This works fine for chains that are in the usual single assignment form, but the occurrence of uninitialized variables, or multiple assignments that split a pseudo's usage into several independent chains (lifetimes) can lead to situations where some but not all of a pseudo's occurrences need to be updated. This is safe for the SImode/DImode pass, but leads to the above bugs during the TImode pass. My one minor tweak to HJ's patch from comment #4 of bugzilla PR106959 is to only perform the new single_def_chain_p check for TImode STV; it turns out that STV of SImode/DImode min/max operates safely on multiple-def chains, and prohibiting this leads to testsuite regressions. We don't (yet) support V1TImode min/max, so this idiom isn't an issue during the TImode STV pass. For the record, the two alternate possible fixes are (i) make the TImode STV pass "soft", by eliminating use of SET_MODE, instead using replace_rtx with a new pseudo, or (ii) merging "chains" so that multiple DFA chains/lifetimes are considered a single STV chain. 2022-12-23 H.J. Lu <hjl.tools@gmail.com> Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog PR target/106933 PR target/106959 * config/i386/i386-features.cc (single_def_chain_p): New predicate function to check that a pseudo's use-def chain is in SSA form. (timode_scalar_to_vector_candidate_p): Check that TImode regs that are SET_DEST or SET_SRC of an insn match/are single_def_chain_p. gcc/testsuite/ChangeLog PR target/106933 PR target/106959 * gcc.target/i386/pr106933-1.c: New test case. * gcc.target/i386/pr106933-2.c: Likewise. * gcc.target/i386/pr106959-1.c: Likewise. * gcc.target/i386/pr106959-2.c: Likewise. * gcc.target/i386/pr106959-3.c: Likewise.

The aarch64 ISA specification allows a left shift amount to be applied after extension in the range of 0 to 4 (encoded in the imm3 field). This is true for at least the following instructions: * ADD (extend register) * ADDS (extended register) * SUB (extended register) The result of this patch can be seen, when compiling the following code: uint64_t myadd(uint64_t a, uint64_t b) { return a+(((uint8_t)b)<<4); } Without the patch the following sequence will be generated: 0000000000000000 <myadd>: 0: d37c1c21 ubfiz x1, x1, #4, gcc-mirror#8 4: 8b000020 add x0, x1, x0 8: d65f03c0 ret With the patch the ubfiz will be merged into the add instruction: 0000000000000000 <myadd>: 0: 8b211000 add x0, x0, w1, uxtb #4 4: d65f03c0 ret gcc/ChangeLog: * config/aarch64/aarch64.cc (aarch64_uxt_size): fix an off-by-one in checking the permissible shift-amount.

This patch adds support for xstormy16's swap nibbles instruction (swpn). For the test case: short foo(short x) { return (x&0xff00) | ((x<<4)&0xf0) | ((x>>4)&0x0f); } GCC with -O2 currently generates the nine instruction sequence: foo: mov r7,r2 asr r2,#4 and r2,gcc-mirror#15 mov.w r6,#-256 and r6,r7 or r2,r6 shl r7,#4 and r7,#255 or r2,r7 ret with this patch, we now generate: foo: swpn r2 ret To achieve this using combine's four instruction "combinations" requires a little wizardry. Firstly, define_insn_and_split are introduced to treat logical shifts followed by bitwise-AND as macro instructions that are split after reload. This is sufficient to recognize a QImode nibble swap, which can be implemented by swpn followed by either a zero-extension or a sign-extension from QImode to HImode. Then finally, in the correct context, a QImode swap-nibbles pattern can be combined to preserve the high-byte of a HImode word, matching the xstormy16's swpn semantics. The naming of the new code iterators is taken from i386.md. 2023-04-29 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog * config/stormy16/stormy16.md (any_lshift): New code iterator. (any_or_plus): Likewise. (any_rotate): Likewise. (*<any_lshift>_and_internal): New define_insn_and_split to recognize a logical shift followed by an AND, and split it again after reload. (*swpn): New define_insn matching xstormy16's swpn. (*swpn_zext): New define_insn recognizing swpn followed by zero_extendqihi2, i.e. with the high byte set to zero. (*swpn_sext): Likewise, for swpn followed by cbw. (*swpn_sext_2): Likewise, for an alternate RTL form. (*swpn_zext_ior): A pre-reload splitter so that an swpn+zext+ior sequence is split in the correct place to recognize the *swpn_zext followed by any_or_plus (ior, xor or plus) instruction. gcc/testsuite/ChangeLog * gcc.target/xstormy16/swpn-1.c: New QImode test case. * gcc.target/xstormy16/swpn-2.c: New zero_extend test case. * gcc.target/xstormy16/swpn-3.c: New sign_extend test case. * gcc.target/xstormy16/swpn-4.c: New HImode test case.

I noticed that for member class templates of a class template we were unnecessarily substituting both the template and its type. Avoiding that duplication speeds compilation of this silly testcase from ~12s to ~9s on my laptop. It's unlikely to make a difference on any real code, but the simplification is also nice. We still need to clear CLASSTYPE_USE_TEMPLATE on the partial instantiation of the template class, but it makes more sense to do that in tsubst_template_decl anyway. #define NC(X) \ template <class U> struct X##1; \ template <class U> struct X##2; \ template <class U> struct X##3; \ template <class U> struct X##4; \ template <class U> struct X##5; \ template <class U> struct X##6; #define NC2(X) NC(X##a) NC(X##b) NC(X##c) NC(X##d) NC(X##e) NC(X##f) #define NC3(X) NC2(X##A) NC2(X##B) NC2(X##C) NC2(X##D) NC2(X##E) template <int I> struct A { NC3(am) }; template <class...Ts> void sink(Ts...); template <int...Is> void g() { sink(A<Is>()...); } template <int I> void f() { g<__integer_pack(I)...>(); } int main() { f<1000>(); } gcc/cp/ChangeLog: * pt.cc (instantiate_class_template): Skip the RECORD_TYPE of a class template. (tsubst_template_decl): Clear CLASSTYPE_USE_TEMPLATE.

ecatmur added 4 commits April 18, 2022 22:48

Merge branch 'gcc-mirror:master' into pr-104996

dbf5427

Merge branch 'gcc-mirror:master' into pr-104996

aacd65f

Merge branch 'gcc-mirror:master' into pr-104996

559a7af

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[PATCH] c++: Fall through for arrays of T vs T cv [PR104996] #4

[PATCH] c++: Fall through for arrays of T vs T cv [PR104996] #4

ecatmur commented Jul 17, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

[PATCH] c++: Fall through for arrays of T vs T cv [PR104996] #4

Are you sure you want to change the base?

[PATCH] c++: Fall through for arrays of T vs T cv [PR104996] #4

Conversation

ecatmur commented Jul 17, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants