forked from gcc-mirror/gcc
-
Notifications
You must be signed in to change notification settings - Fork 0
[PATCH] c++: Fall through for arrays of T vs T cv [PR104996] #4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Draft
ecatmur
wants to merge
4
commits into
master
Choose a base branch
from
pr-104996
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+0
−0
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
If two arrays do not have the exact same element type including qualification, this could be e.g. f(int (&&)[]) vs. f(int const (&)[]), which can still be distinguished by the lvalue-rvalue tiebreaker. By tightening this branch (in accordance with the letter of the Standard) we fall through to the next branch, which tests whether they have different element type ignoring qualification and returns 0 in that case; thus we only actually fall through in the T[...] vs. T cv[...] case, eventually considering the lvalue-rvalue tiebreaker at the end of compare_ics. Signed-off-by: Ed Catmur <ed@catmur.uk> PR c++/104996 gcc/cp/ChangeLog: * call.cc (compare_ics): When comparing list-initialization sequences, do not return early. gcc/testsuite/ChangeLog: * g++.dg/cpp0x/initlist129.C: New test.
ecatmur
pushed a commit
that referenced
this pull request
Jan 10, 2023
With many thanks to H.J. for doing all the hard work, this patch resolves two P1 regressions; PR target/106933 and PR target/106959. Although superficially similar, the i386 backend's two scalar-to-vector (STV) passes perform their transformations in importantly different ways. The original pass converting SImode and DImode operations to V4SImode or V2DImode operations is "soft", allowing values to be maintained in both integer and vector hard registers. The newer pass converting TImode operations to V1TImode is "hard" (all or nothing) that converts all uses of a pseudo to vector form. To implement this it invokes powerful ju-ju calling SET_MODE on a reg_rtx, which due to RTL sharing, often updates this pseudo's mode everywhere in the RTL chain. Hence, TImode STV can only be performed when all uses of a pseudo are convertible to V1TImode form. To ensure this the STV passes currently use data-flow analysis to inspect all DEFs and USEs in a chain. This works fine for chains that are in the usual single assignment form, but the occurrence of uninitialized variables, or multiple assignments that split a pseudo's usage into several independent chains (lifetimes) can lead to situations where some but not all of a pseudo's occurrences need to be updated. This is safe for the SImode/DImode pass, but leads to the above bugs during the TImode pass. My one minor tweak to HJ's patch from comment #4 of bugzilla PR106959 is to only perform the new single_def_chain_p check for TImode STV; it turns out that STV of SImode/DImode min/max operates safely on multiple-def chains, and prohibiting this leads to testsuite regressions. We don't (yet) support V1TImode min/max, so this idiom isn't an issue during the TImode STV pass. For the record, the two alternate possible fixes are (i) make the TImode STV pass "soft", by eliminating use of SET_MODE, instead using replace_rtx with a new pseudo, or (ii) merging "chains" so that multiple DFA chains/lifetimes are considered a single STV chain. 2022-12-23 H.J. Lu <hjl.tools@gmail.com> Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog PR target/106933 PR target/106959 * config/i386/i386-features.cc (single_def_chain_p): New predicate function to check that a pseudo's use-def chain is in SSA form. (timode_scalar_to_vector_candidate_p): Check that TImode regs that are SET_DEST or SET_SRC of an insn match/are single_def_chain_p. gcc/testsuite/ChangeLog PR target/106933 PR target/106959 * gcc.target/i386/pr106933-1.c: New test case. * gcc.target/i386/pr106933-2.c: Likewise. * gcc.target/i386/pr106959-1.c: Likewise. * gcc.target/i386/pr106959-2.c: Likewise. * gcc.target/i386/pr106959-3.c: Likewise.
ecatmur
pushed a commit
that referenced
this pull request
Feb 4, 2023
The aarch64 ISA specification allows a left shift amount to be applied
after extension in the range of 0 to 4 (encoded in the imm3 field).
This is true for at least the following instructions:
* ADD (extend register)
* ADDS (extended register)
* SUB (extended register)
The result of this patch can be seen, when compiling the following code:
uint64_t myadd(uint64_t a, uint64_t b)
{
return a+(((uint8_t)b)<<4);
}
Without the patch the following sequence will be generated:
0000000000000000 <myadd>:
0: d37c1c21 ubfiz x1, x1, #4, gcc-mirror#8
4: 8b000020 add x0, x1, x0
8: d65f03c0 ret
With the patch the ubfiz will be merged into the add instruction:
0000000000000000 <myadd>:
0: 8b211000 add x0, x0, w1, uxtb #4
4: d65f03c0 ret
gcc/ChangeLog:
* config/aarch64/aarch64.cc (aarch64_uxt_size): fix an
off-by-one in checking the permissible shift-amount.
ecatmur
pushed a commit
that referenced
this pull request
May 17, 2023
This patch adds support for xstormy16's swap nibbles instruction (swpn).
For the test case:
short foo(short x) {
return (x&0xff00) | ((x<<4)&0xf0) | ((x>>4)&0x0f);
}
GCC with -O2 currently generates the nine instruction sequence:
foo: mov r7,r2
asr r2,#4
and r2,gcc-mirror#15
mov.w r6,#-256
and r6,r7
or r2,r6
shl r7,#4
and r7,#255
or r2,r7
ret
with this patch, we now generate:
foo: swpn r2
ret
To achieve this using combine's four instruction "combinations" requires
a little wizardry. Firstly, define_insn_and_split are introduced to
treat logical shifts followed by bitwise-AND as macro instructions that
are split after reload. This is sufficient to recognize a QImode
nibble swap, which can be implemented by swpn followed by either a
zero-extension or a sign-extension from QImode to HImode. Then finally,
in the correct context, a QImode swap-nibbles pattern can be combined to
preserve the high-byte of a HImode word, matching the xstormy16's swpn
semantics. The naming of the new code iterators is taken from i386.md.
2023-04-29 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/stormy16/stormy16.md (any_lshift): New code iterator.
(any_or_plus): Likewise.
(any_rotate): Likewise.
(*<any_lshift>_and_internal): New define_insn_and_split to
recognize a logical shift followed by an AND, and split it
again after reload.
(*swpn): New define_insn matching xstormy16's swpn.
(*swpn_zext): New define_insn recognizing swpn followed by
zero_extendqihi2, i.e. with the high byte set to zero.
(*swpn_sext): Likewise, for swpn followed by cbw.
(*swpn_sext_2): Likewise, for an alternate RTL form.
(*swpn_zext_ior): A pre-reload splitter so that an swpn+zext+ior
sequence is split in the correct place to recognize the *swpn_zext
followed by any_or_plus (ior, xor or plus) instruction.
gcc/testsuite/ChangeLog
* gcc.target/xstormy16/swpn-1.c: New QImode test case.
* gcc.target/xstormy16/swpn-2.c: New zero_extend test case.
* gcc.target/xstormy16/swpn-3.c: New sign_extend test case.
* gcc.target/xstormy16/swpn-4.c: New HImode test case.
ecatmur
pushed a commit
that referenced
this pull request
May 17, 2023
I noticed that for member class templates of a class template we were
unnecessarily substituting both the template and its type. Avoiding that
duplication speeds compilation of this silly testcase from ~12s to ~9s on my
laptop. It's unlikely to make a difference on any real code, but the
simplification is also nice.
We still need to clear CLASSTYPE_USE_TEMPLATE on the partial instantiation
of the template class, but it makes more sense to do that in
tsubst_template_decl anyway.
#define NC(X) \
template <class U> struct X##1; \
template <class U> struct X##2; \
template <class U> struct X##3; \
template <class U> struct X##4; \
template <class U> struct X##5; \
template <class U> struct X##6;
#define NC2(X) NC(X##a) NC(X##b) NC(X##c) NC(X##d) NC(X##e) NC(X##f)
#define NC3(X) NC2(X##A) NC2(X##B) NC2(X##C) NC2(X##D) NC2(X##E)
template <int I> struct A
{
NC3(am)
};
template <class...Ts> void sink(Ts...);
template <int...Is> void g()
{
sink(A<Is>()...);
}
template <int I> void f()
{
g<__integer_pack(I)...>();
}
int main()
{
f<1000>();
}
gcc/cp/ChangeLog:
* pt.cc (instantiate_class_template): Skip the RECORD_TYPE
of a class template.
(tsubst_template_decl): Clear CLASSTYPE_USE_TEMPLATE.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
If two arrays do not have the exact same element type including qualification, this could be e.g. f(int (&&)[]) vs. f(int const (&)[]), which can still be distinguished by the lvalue-rvalue tiebreaker.
By tightening this branch (in accordance with the letter of the Standard) we fall through to the next branch, which tests whether they have different element type ignoring qualification and returns 0 in that case; thus we only actually fall through in the T[...] vs. T cv[...] case, eventually considering the lvalue-rvalue tiebreaker at the end of compare_ics.
Signed-off-by: Ed Catmur ed@catmur.uk
gcc/cp/ChangeLog:
gcc/testsuite/ChangeLog: