Skip to content

Releases/gcc 12 #65

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2,886 commits into
base: master
Choose a base branch
from
Open

Releases/gcc 12 #65

wants to merge 2,886 commits into from

Conversation

jacopobrusini
Copy link

Support for Apple Silicon!!!

@jwakely
Copy link
Contributor

jwakely commented Feb 21, 2024

This is an unofficial mirror that has nothing to do with the GCC project, so submitting pull requests here is a waste of time.

Also, I have no idea what this pull request is trying to do but it would never be accepted even if it was submitted to the right place.

NinaRanns pushed a commit to NinaRanns/gcc that referenced this pull request Jan 28, 2025
…on-r15-7214-g0710024b5bd861

Contracts nonattr rebase on r15 7214 g0710024b5bd861
GCC Administrator and others added 27 commits February 11, 2025 00:19
x is not a macro argument.  It just happens to work as final.cc passes
x for 2nd argument:

final.cc:      ASM_OUTPUT_SYMBOL_REF (file, x);

	PR target/118825
	* config/i386/i386.h (ASM_OUTPUT_SYMBOL_REF): Replace x with
	SYM.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit 7317fc0)
Add crtbeginT.o to extra_parts on FreeBSD. This ensures we use GCC's
crt objects for static linking. Otherwise it could mix crtbeginT.o
from the base system with libgcc's crtend.o, possibly leading to
segfaults.

libgcc:
	PR target/118685
	* config.host (*-*-freebsd*): Add crtbeginT.o to extra_parts.

Signed-off-by: Dimitry Andric <dimitry@andric.com>
During combine we may end up with

(set (reg:DI 66 [ _6 ])
     (ashift:DI (reg:DI 72 [ x ])
                (subreg:QI (and:TI (reg:TI 67 [ _1 ])
                                   (const_wide_int 0x0aaaaaaaaaaaaaabf))
                           15)))

where the shift count operand does not trivially fit the scheme of
address operands.  Reject those operands, especially since
strip_address_mutations() expects expressions of the form
(and ... (const_int ...)) and fails for (and ... (const_wide_int ...)).

Thus, be more strict here and accept only CONST_INT operands.  Done by
replacing immediate_operand() with const_int_operand() which is enough
since the former only additionally checks for LEGITIMATE_PIC_OPERAND_P
and targetm.legitimate_constant_p which are always true for CONST_INT
operands.

While on it, fix indentation of the if block.

gcc/ChangeLog:

	PR target/118835
	* config/s390/s390.cc (s390_valid_shift_count): Reject shift
	count operands which do not trivially fit the scheme of
	address operands.

gcc/testsuite/ChangeLog:

	* gcc.target/s390/pr118835.c: New test.

(cherry picked from commit ac9806d)
Floating-point emulation in the D front-end is done via a type named
`struct longdouble`, which in GDC is a small interface around the
real_value type. Because the D code cannot include gcc/real.h directly,
a big enough buffer is used for the data instead.

On x86_64, this buffer is actually bigger than real_value itself, so
when a new longdouble object is created with

    longdouble r;
    real_from_string3 (&r.rv (), buffer, mode);
    return r;

there is uninitialized padding at the end of `r`.  This was never a
problem when D was implemented in C++ (until GCC 12) as comparing two
longdouble objects with `==' would be forwarded to the relevant
operator== overload that extracted the underlying real_value.

However when the front-end was translated to D, such conditions were
instead rewritten into identity comparisons

    return exp.toReal() is CTFloat.zero

The `is` operator gets lowered as a call to `memcmp() == 0', which is
where the read of uninitialized memory occurs, as seen by valgrind.

==26778== Conditional jump or move depends on uninitialised value(s)
==26778==    at 0x911F41: dmd.dstruct._isZeroInit(dmd.expression.Expression) (dstruct.d:635)
==26778==    by 0x9123BE: StructDeclaration::finalizeSize() (dstruct.d:373)
==26778==    by 0x86747C: dmd.aggregate.AggregateDeclaration.determineSize(ref const(dmd.location.Loc)) (aggregate.d:226)
[...]

To avoid accidentally reading uninitialized data, explicitly initialize
all `longdouble` variables with an empty constructor on C++ side of the
implementation before initializing underlying real_value type it holds.

	PR d/116961

gcc/d/ChangeLog:

	* d-codegen.cc (build_float_cst): Change new_value type from real_t to
	real_value.
	* d-ctfloat.cc (CTFloat::fabs): Default initialize the return value.
	(CTFloat::ldexp): Likewise.
	(CTFloat::parse): Likewise.
	* d-longdouble.cc (longdouble::add): Likewise.
	(longdouble::sub): Likewise.
	(longdouble::mul): Likewise.
	(longdouble::div): Likewise.
	(longdouble::mod): Likewise.
	(longdouble::neg): Likewise.
	* d-port.cc (Port::isFloat32LiteralOutOfRange): Likewise.
	(Port::isFloat64LiteralOutOfRange): Likewise.

gcc/testsuite/ChangeLog:

	* gdc.dg/pr116961.d: New test.

(cherry picked from commit f7bc17e)
jakubjelinek and others added 30 commits June 13, 2025 11:57
The following testcase is miscompiled due to a bug in
optimize_range_tests_to_bit_test.  It is trying to optimize
check for a in [-34,-34] or [-26,-26] or [-6,-6] or [-4,inf] ranges.
Another reassoc optimization folds the the test for the first
two ranges into (a + 34U) & ~8U in [0U,0U] range, and extract_bit_test_mask
actually has code to virtually undo it and treat that again as test
for a being -34 or -26.  The problem is that optimize_range_tests_to_bit_test
remembers in the type variable TREE_TYPE (ranges[i].exp); from the first
range.  If extract_bit_test_mask doesn't do that virtual undoing of the
BIT_AND_EXPR handling, that is just fine, the returned exp is ranges[i].exp.
But if the first range is BIT_AND_EXPR, the type could be different, the
BIT_AND_EXPR form has the optional cast to corresponding unsigned type
in order to avoid introducing UB.  Now, type was used to fill in the
max value if ranges[j].high was missing in subsequently tested range,
and so in this particular testcase the [-4,inf] range which was
signed int and so [-4,INT_MAX] was treated as [-4,UINT_MAX] instead.
And we were subtracting values of 2 different types and trying to make
sense out of that.

The following patch fixes this by using the type of the low bound
(which is always non-NULL) for the max value of the high bound instead.

2025-02-24  Jakub Jelinek  <jakub@redhat.com>

	PR tree-optimization/118915
	* tree-ssa-reassoc.cc (optimize_range_tests_to_bit_test): For
	highj == NULL_TREE use TYPE_MAX_VALUE (TREE_TYPE (lowj)) rather
	than TYPE_MAX_VALUE (type).

	* gcc.c-torture/execute/pr118915.c: New test.

(cherry picked from commit 5806279)
The following testcase was emitting false positive warning that
the rhs of #pragma omp atomic write was stored but not read,
when the atomic actually does read it.  The following patch
fixes that by calling default_function_array_read_conversion
on it, so that it is marked as read as well as converted from
lvalue to rvalue.
Furthermore, the code had
if (code == NOP_EXPR) ... else ... if (code == NOP_EXPR) ...
with none of ... parts changing code, so I've merged the two ifs.

2025-02-25  Jakub Jelinek  <jakub@redhat.com>

	PR c/119000
	* c-parser.cc (c_parser_omp_atomic): For omp write call
	default_function_array_read_conversion on the rhs expression.
	Merge the two adjacent if (code == NOP_EXPR) blocks.

	* c-c++-common/gomp/pr119000.c: New test.

(cherry picked from commit cdffc76)
…fault_args etc. modify it [PR98533]

The following testcases ICE during type verification, because TYPE_FIELDS
of e.g. S RECORD_TYPE in pr119123.C is different from TYPE_FIELDS of const S.
Various decls are added to S's TYPE_FIELDS first, then finish_struct
indirectly calls fixup_type_variants to sync the variant copies.
But later on cp_parser_class_specifier calls
cp_parser_late_parsing_default_args and that apparently adds a lambda
type (from default argument) to TYPE_FIELDS of S.
Dunno if that is right or not, assuming it is right, the following
patch fixes it by updating TYPE_FIELDS of variant types if there were
any changes in the various functions cp_parser_class_specifier defers and
calls on the outermost enclosing class.
There was quite a lot of code repetition already before, so the patch
uses a lambda to avoid the repetitions.
To my surprise, in some of the contract testcases (
g++.dg/contracts/contracts-friend1.C
g++.dg/contracts/contracts-nested-class1.C
g++.dg/contracts/contracts-nested-class2.C
g++.dg/contracts/contracts-redecl7.C
g++.dg/contracts/contracts-redecl8.C
) it is actually setting class_type and pushing TRANSLATION_UNIT_DECL
rather than some class types in some cases.

Or should the lambda pushing into the containing class be somehow avoided?

2025-03-06  Jakub Jelinek  <jakub@redhat.com>

	PR c++/98533
	PR c++/119123
	* parser.cc (cp_parser_class_specifier): Update TYPE_FIELDS of
	variant types in case cp_parser_late_parsing_default_args etc. change
	TYPE_FIELDS on the main variant.  Add switch_to_class lambda and
	use it to simplify repeated class switching code.

	* g++.dg/cpp0x/pr98533.C: New test.
	* g++.dg/cpp0x/pr119123.C: New test.

(cherry picked from commit 179e010)
The following testcase takes very long time to compile, because
skip_simple_arithmetic decides to first call tree_invariant_p on
the second argument (and indirectly recurse there).  I think before
canonicalization of operands for commutative binary expressions
(and for non-commutative ones always) it is pretty common that the
first operand is a constant, something which tree_invariant_p handles
immediately, so the following patch special cases that; I've added
there a tree_invariant_p call too after the checks, while it is not
really needed currently, tree_invariant_p has the same checks, I wanted
to be prepared in case tree_invariant_p changes.  But if you think
I should avoid it, I can drop it too.

This is just a partial fix, I think one can certainly construct a testcase
which will still have horrible compile time complexity (but I've tried and
haven't managed to do so), so perhaps we should just limit the recursion
depth through skip_simple_arithmetic/tree_invariant_p with some defaulted
argument.

2025-03-11  Jakub Jelinek  <jakub@redhat.com>

	PR c/119183
	* tree.cc (skip_simple_arithmetic): If first operand of binary
	expr is TREE_CONSTANT or TREE_READONLY with no side-effects, call
	tree_invariant_p on that operand first instead of on the second.

	* gcc.dg/pr119183.c: New test.

(cherry picked from commit 20e5aa9)
Given the recent PR119406 I've tried to grep for concatenated string
literals without space at the end of one line and at the start of next line,
unless it was obviously intentional.
Furthermore, I've then looked through gcc.pot looking for 2 adjacent spaces
and looking back if that wasn't the case of "something "
" with spaces at both sides".

Here is the result from that.

I think just the c.opt change needs an explanation, the "" in the
description is simply eaten up somewhere during the option processing and
gcc -v --help before this patch was displaying
  -Wdeprecated-literal-operator Warn about deprecated space between  and suffix in a user-defined literal operator.

2025-03-22  Jakub Jelinek  <jakub@redhat.com>

gcc/
	* gimplify.cc (warn_switch_unreachable_and_auto_init_r): Add missing
	space in the middle of diagnostics.

(cherry picked from commit 20360e4)
… spots [PR119291]

The following testcase is miscompiled on x86_64-linux at -O2 by the combiner.
We have from earlier combinations
(insn 22 21 23 4 (set (reg:SI 104 [ _7 ])
        (const_int 0 [0])) "pr119291.c":25:15 96 {*movsi_internal}
     (nil))
(insn 23 22 24 4 (set (reg/v:SI 117 [ e ])
        (reg/v:SI 116 [ e ])) 96 {*movsi_internal}
     (expr_list:REG_DEAD (reg/v:SI 116 [ e ])
        (nil)))
(note 24 23 25 4 NOTE_INSN_DELETED)
(insn 25 24 26 4 (parallel [
            (set (reg:CCZ 17 flags)
                (compare:CCZ (neg:SI (reg:SI 104 [ _7 ]))
                    (const_int 0 [0])))
            (set (reg/v:SI 116 [ e ])
                (neg:SI (reg:SI 104 [ _7 ])))
        ]) "pr119291.c":26:13 977 {*negsi_2}
     (expr_list:REG_DEAD (reg:SI 104 [ _7 ])
        (nil)))
(note 26 25 27 4 NOTE_INSN_DELETED)
(insn 27 26 28 4 (set (reg:DI 128 [ _9 ])
        (ne:DI (reg:CCZ 17 flags)
            (const_int 0 [0]))) "pr119291.c":26:13 1447 {*setcc_di_1}
     (expr_list:REG_DEAD (reg:CCZ 17 flags)
        (nil)))
and try_combine is called on i3 25 and i2 22 (second time)
and reach the hunk being patched with simplified i3
(insn 25 24 26 4 (parallel [
            (set (pc)
                (pc))
            (set (reg/v:SI 116 [ e ])
                (const_int 0 [0]))
        ]) "pr119291.c":28:13 977 {*negsi_2}
     (expr_list:REG_DEAD (reg:SI 104 [ _7 ])
        (nil)))
and
(insn 22 21 23 4 (set (reg:SI 104 [ _7 ])
        (const_int 0 [0])) "pr119291.c":27:15 96 {*movsi_internal}
     (nil))
Now, the try_combine code there attempts to split two independent
sets in newpat by moving one of them to i2.
And among other tests it checks
!modified_between_p (SET_DEST (set1), i2, i3)
which is certainly needed, if there would be say
(set (reg/v:SI 116 [ e ]) (const_int 42 [0x2a]))
in between i2 and i3, we couldn't do that, as that set would overwrite
the value set by set1 we want to move to the i2 position.
But in this case pseudo 116 isn't set in between i2 and i3, but used
(and additionally there is a REG_DEAD note for it).

This is equally bad for the move, because while the i3 insn
and later will see the pseudo value that we set, the insn in between
which uses the value will see a different value from the one that
it should see.

As we don't check for that, in the end try_combine succeeds and
changes the IL to:
(insn 22 21 23 4 (set (reg/v:SI 116 [ e ])
        (const_int 0 [0])) "pr119291.c":27:15 96 {*movsi_internal}
     (nil))
(insn 23 22 24 4 (set (reg/v:SI 117 [ e ])
        (reg/v:SI 116 [ e ])) 96 {*movsi_internal}
     (expr_list:REG_DEAD (reg/v:SI 116 [ e ])
        (nil)))
(note 24 23 25 4 NOTE_INSN_DELETED)
(insn 25 24 26 4 (set (pc)
        (pc)) "pr119291.c":28:13 2147483647 {NOOP_MOVE}
     (nil))
(note 26 25 27 4 NOTE_INSN_DELETED)
(insn 27 26 28 4 (set (reg:DI 128 [ _9 ])
        (const_int 0 [0])) "pr119291.c":28:13 95 {*movdi_internal}
     (nil))
(note, the i3 got turned into a nop and try_combine also modified insn 27).

The following patch replaces the modified_between_p
tests with reg_used_between_p, my understanding is that
modified_between_p is a subset of reg_used_between_p, so one
doesn't need both.

Looking at this some more today, I think we should special case
set_noop_p because that can be put into i2 (except for the JUMP_P
violations), currently both modified_between_p (pc_rtx, i2, i3)
and reg_used_between_p (pc_rtx, i2, i3) returns false.
I'll post a patch incrementally for that (but that feels like
new optimization, so probably not something that should be backported).

On Tue, Apr 01, 2025 at 11:27:25AM +0200, Richard Biener wrote:
> Can we constrain SET_DEST (set1/set0) to a REG_P in combine?  Why
> does the comment talk about memory?

I was worried about making too risky changes this late in stage4
(and especially also for backports).  Most of this code is 1992-ish.
I think many of the functions are just misnamed, the reg_ in there doesn't
match what those functions do (bet they initially supported just REGs
and later on support for other kinds of expressions was added, but haven't
done git archeology to prove that).

What we know for sure is:
           && GET_CODE (SET_DEST (XVECEXP (newpat, 0, 0))) != ZERO_EXTRACT
           && GET_CODE (SET_DEST (XVECEXP (newpat, 0, 0))) != STRICT_LOW_PART
           && GET_CODE (SET_DEST (XVECEXP (newpat, 0, 1))) != ZERO_EXTRACT
           && GET_CODE (SET_DEST (XVECEXP (newpat, 0, 1))) != STRICT_LOW_PART
that is checked earlier in the condition.
Then it calls
           && ! reg_referenced_p (SET_DEST (XVECEXP (newpat, 0, 1)),
                                  XVECEXP (newpat, 0, 0))
           && ! reg_referenced_p (SET_DEST (XVECEXP (newpat, 0, 0)),
                                  XVECEXP (newpat, 0, 1))
While it has reg_* in it, that function mostly calls reg_overlap_mentioned_p
which is also misnamed, that function handles just fine all of
REG, MEM, SUBREG of REG, (SUBREG of MEM not, see below), ZERO_EXTRACT,
STRICT_LOW_PART, PC and even some further cases.
So, IMHO SET_DEST (set0) or SET_DEST (set0) can be certainly a REG, SUBREG
of REG, PC (at least the REG and PC cases are triggered on the testcase)
and quite possibly also MEM (SUBREG of MEM not, see below).

Now, the code uses !modified_between_p (SET_SRC (set{1,0}), i2, i3) where that
function for constants just returns false, for PC returns true, for REG
returns reg_set_between_p, for MEM recurses on the address, for
MEM_READONLY_P otherwise returns false, otherwise checks using alias.cc code
whether the memory could have been modified in between, for all other
rtxes recurses on the subrtxes.  This part didn't change in my patch.

I've only changed those
-         && !modified_between_p (SET_DEST (set{1,0}), i2, i3)
+         && !reg_used_between_p (SET_DEST (set{1,0}), i2, i3)
where the former has been described above and clearly handles all of
REG, SUBREG of REG, PC, MEM and SUBREG of MEM among other things.

The replacement reg_used_between_p calls reg_overlap_mentioned_p on each
instruction in between i2 and i3.  So, there is clearly a difference
in behavior if SET_DEST (set{1,0}) is pc_rtx, in that case modified_between_p
returns unconditionally true even if there are no instructions in between,
but reg_used_between_p if there are no non-debug insns in between returns
false.  Sorry for missing that, guess I should check for that (with the
exception of the noop moves which are often (set (pc) (pc)) and handled
by the incremental patch).  In fact not just that, reg_used_between_p
will only return true for PC if it is mentioned anywhere in the insns
in between.
Anyway, except for that, for REG it calls refers_to_regno_p
and so should find any occurrences of any of the REG or parts of it for hard
registers, for MEM returns true if it sees any MEMs in insns in between
(conservatively), for SUBREGs apparently it relies on it being SUBREG of REG
(so doesn't handle SUBREG of MEM) and handles SUBREG of REG like the
SUBREG_REG, PC I've already described.

Now, because reg_overlap_mentioned_p doesn't handle SUBREG of MEM, I think
already the initial
           && ! reg_referenced_p (SET_DEST (XVECEXP (newpat, 0, 1)),
                                  XVECEXP (newpat, 0, 0))
           && ! reg_referenced_p (SET_DEST (XVECEXP (newpat, 0, 0)),
                                  XVECEXP (newpat, 0, 1))
calls would have failed --enable-checking=rtl or would have misbehaved, so
I think there is no need to check for it further.

To your question why I don't use reg_referenced_p, that is because
reg_referenced_p is something to call on one insn pattern, while
reg_used_between_p is pretty much that on all insns in between two
instructions (excluding the boundaries).

So, I think it would be safer to add && SET_DEST (set{1,0} != pc_rtx
checks to preserve former behavior, like in the following version.

2025-04-01  Jakub Jelinek  <jakub@redhat.com>

	PR rtl-optimization/119291
	* combine.cc (try_combine): For splitting of PARALLEL with
	2 independent SETs into i2 and i3 sets check reg_used_between_p
	of the SET_DESTs rather than just modified_between_p.

	* gcc.c-torture/execute/pr119291.c: New test.

(cherry picked from commit 19ba913)
The following testcase ICEs because c_fully_fold isn't performed on the
arguments of __sanitizer_ptr_{sub,cmp} builtins and so e.g.
C_MAYBE_CONST_EXPR can leak into the gimplifier where it ICEs.

2025-04-02  Jakub Jelinek  <jakub@redhat.com>

	PR c/119582
	* c-typeck.cc (pointer_diff, build_binary_op): Call c_fully_fold on
	__sanitizer_ptr_sub or __sanitizer_ptr_cmp arguments.

	* gcc.dg/asan/pr119582.c: New test.

(cherry picked from commit 29bc904)
I can reproduce a really weird error in our distro i686 trunk gcc
(but haven't managed to reproduce it with vanilla trunk yet).
echo 'void foo (void) {}' > a.c; gcc -O2 -flto=auto -m32 -march=i686 -ffat-lto-objects -fhardened -o a.o -c a.c; gcc -O2 -flto=auto -m32 -march=i686 -r -o a.lo a.o
lto1: fatal error: open  failed: No such file or directory
compilation terminated.
lto-wrapper: fatal error: gcc returned 1 exit status
The error is because
cat ./a.lo.lto.o-args.0
""
a.o
My suspicion is that this "" in there is caused by weird .gnu.lto_.opts
section content during
gcc -O2 -flto=auto -m32 -march=i686 -ffat-lto-objects -fhardened -S -o a.s -c a.c
compilation (and I can reproduce that one with vanilla trunk).
The above results in
        .section        .gnu.lto_.opts,"e",@progbits
        .string "'-fno-openmp' '-fno-openacc' '-fPIC' '' '-m32' '-march=i686' '-O2' '-flto=auto' '-ffat-lto-objects'"
There are two weird things, one (IMHO the cause of the "" later on) is
the '' part, I think it comes from lto_write_options doing
append_to_collect_gcc_options (&temporary_obstack, &first_p, "");
IMHO it shouldn't call append_to_collect_gcc_options at all for that case.

The -fhardened option causes global_options.x_flag_cf_protection
to be set to CF_FULL and later on the backend option processing
sets it to CF_FULL | CF_SET (i.e. 7, a value not handled in
lto_write_options).

The following patch fixes it by not emitting anything there if
flag_cf_protection is one of the unhandled values.

Perhaps it could incrementally use
switch (global_options.x_flag_cf_protection & ~CF_SET)
instead, dunno.

And the other problem is that the -fPIC in there is really weird.
Our distro compiler or vanilla configured trunk certainly doesn't
default to -fPIC and -fhardened uses -fPIE when
-fPIC/-fpic/-fno-pie/-fno-pic is not specified, so I was expecting
-fPIE in there.
The thing is that the -fpie option causes setting of both
global_options.x_flag_pi{c,e} to 1, -fPIE both to 2:
      /* If -fPIE or -fpie is used, turn on PIC.  */
      if (opts->x_flag_pie)
        opts->x_flag_pic = opts->x_flag_pie;
      else if (opts->x_flag_pic == -1)
        opts->x_flag_pic = 0;
      if (opts->x_flag_pic && !opts->x_flag_pie)
        opts->x_flag_shlib = 1;
so checking first for flag_pic == 2 and then flag_pic == 1
and only afterwards for flag_pie means we never print
-fPIE/-fpie.

Or do you want something further (like
switch (global_options.x_flag_cf_protection & ~CF_SET)
)?

2025-04-04  Jakub Jelinek  <jakub@redhat.com>

	PR lto/119625
	* lto-opts.cc (lto_write_options): If neither flag_pic nor
	flag_pie are set, check first for flag_pie and only later
	for flag_pic rather than the other way around, use a temporary
	variable.  If flag_cf_protection is not set, don't append anything
	if flag_cf_protection is none of CF_{NONE,FULL,BRANCH,RETURN} and
	use a temporary variable.

(cherry picked from commit d25728c)
Here is a cherry-pick from glibc [BZ #32411] fix.

As mentioned by the reporter in a pull request against gcc-mirror,
the THREEp96 constant in e_expl.c is incorrect, it is actually 0x3.p+94f128
rather than 0x3.p+96f128.

The algorithm uses that to compute the t2 integer (tval2), by whose
delta it adjusts the x+xl pair and then in the result uses the precomputed
exp value for that entry.
Using 0x3.p+94f128 rather than 0x3.p+96f128 results in tval2 sometimes
being one smaller, sometimes one larger than the desired value, thus can mean
the x+xl pair after adjustment will be larger in absolute value than it
should be.

DesWursters created a test program for this
https://github.com/DesWurstes/comparefloats
and his results were
total: 1135000000 not_equal: 4322 earlier_score: 674 later_score: 3648
I've modified this so with
https://sourceware.org/bugzilla/show_bug.cgi?id=32411#c3
so that it actually tests pseudo-random _Float128 values with range
(-16384.,16384) with strong bias on values larger than 0.0002 in absolute
value (so that tval1/tval2 aren't zero most of the time) and that gave
total: 10000000000 not_equal: 29861 earlier_score: 4606 later_score: 25255
So, in both cases, in most cases the change doesn't result in any differences,
and in those rare cases where does, about 85% have smaller ulp than without
the patch.
Additionally I've tried
https://sourceware.org/bugzilla/show_bug.cgi?id=32411#c4
and in 2 billion iterations it didn't find any case where x+xl after the
adjustments without this change would be smaller in absolute value compared
to x+xl after the adjustments with this change.

2025-04-09  Jakub Jelinek  <jakub@redhat.com>

	* math/expq.c (C): Fix up THREEp96 constant.

(cherry picked from commit e081ced)
With --enable-host-pie -freport-bug almost never prepares preprocessed
source and instead emits
The bug is not reproducible, so it is likely a hardware or OS problem.
message even for bogus which are 100% reproducible.
The way -freport-bug works is that it reruns it 3 times, capturing stdout
and stderr from each and then tries to compare the outputs in between
different runs.
The libbacktrace emitted hexadecimal addresses at the start of the lines
can differ between runs due to ASLR, either of the PIE executable, or
even if not PIE if there is some frame with e.g. libc function (say
crash in strlen/memcpy etc.).

The following patch fixes it by ignoring such differences at the start of
the lines.

2025-04-12  Jakub Jelinek  <jakub@redhat.com>

	PR driver/119727
	* gcc.cc (files_equal_p): Rewritten using fopen/fgets/fclose instead
	of open/fstat/read/close.  At the start of lines, ignore lowercase
	hexadecimal addresses followed by space.

(cherry picked from commit 8b2ceb4)
Andi had a useful comment that even with the PR119727 workaround to
ignore differences in libbacktrace printed addresses, it is still better
to turn off ASLR when easily possible, e.g. in case some address leaks
in somewhere in the ICE message elsewhere, or to verify the ICE doesn't
depend on a particular library/binary load addresses.

The following patch adds a configure check and uses personality syscall
to turn off randomization for further -freport-bug subprocesses.

2025-04-14  Jakub Jelinek  <jakub@redhat.com>

	PR driver/119727
	* configure.ac (HOST_HAS_PERSONALITY_ADDR_NO_RANDOMIZE): New check.
	* gcc.cc: Include sys/personality.h if
	HOST_HAS_PERSONALITY_ADDR_NO_RANDOMIZE is defined.
	(try_generate_repro): Call
	personality (personality (0xffffffffU) | ADDR_NO_RANDOMIZE)
	if HOST_HAS_PERSONALITY_ADDR_NO_RANDOMIZE is defined.
	* config.in: Regenerate.
	* configure: Regenerate.

(cherry picked from commit 5a32e85)
This is a regression on some targets introduced I believe by r6-2055
which added mode argument to set_src_cost.

The problem here is that in the first iteration, mode is always QImode
and we get as -Os zero cost set_src_cost (const0_rtx, QImode, false).
But then we use the mode variable for iterating over int, partial int
and vector int modes, so for the second iteration we call set_src_cost
with mode which is at that time (machine_mode) (MAX_MODE_VECTOR_INT + 1).

In the x86 case that happens to be V2HFmode and we don't crash (and
compute the same 0 cost as we would for QImode).
But e.g. in the SPARC case (machine_mode) (MAX_MODE_VECTOR_INT + 1) is
MAX_MACHINE_MODE and that does all kinds of weird things especially
when doing ubsan bootstrap.

Fixed by always using QImode.

2025-04-14  Jakub Jelinek  <jakub@redhat.com>

	PR rtl-optimization/119785
	* expmed.cc (init_expmed): Always pass QImode rather than mode to
	set_src_cost passed to set_zero_cost.

(cherry picked from commit f96a543)
As mentioned in the PR (and I think in PR101075 too), we can run into
deadlock with libat_lock_n calls with larger n.
As mentioned in PR66842, we use multiple locks (normally 64 mutexes
for each 64 byte cache line in 4KiB page) and currently can lock more
than one lock, in particular for n [0, 64] a single lock, for n [65, 128]
2 locks, for n [129, 192] 3 locks etc.
There are two problems with this:
1) we can deadlock if there is some wrap-around, because the locks are
   acquired always in the order from addr_hash (ptr) up to
   locks[NLOCKS-1].mutex and then if needed from locks[0].mutex onwards;
   so if e.g. 2 threads perform libat_lock_n with n = 2048+64, in one
   case at pointer starting at page boundary and in another case at
   page boundary + 2048 bytes, the first thread can lock the first
   32 mutexes, the second thread can lock the last 32 mutexes and
   then first thread wait for the lock 32 held by second thread and
   second thread wait for the lock 0 held by the first thread;
   fixed below by always locking the locks in order of increasing
   index, if there is a wrap-around, by locking in 2 loops, first
   locking some locks at the start of the array and second at the
   end of it
2) the number of locks seems to be determined solely depending on the
   n value, I think that is wrong, we don't know the structure alignment
   on the libatomic side, it could very well be 1 byte aligned struct,
   and so how many cachelines are actually (partly or fully) occupied
   by the atomic access depends not just on the size, but also on
   ptr % WATCH_SIZE, e.g. 2 byte structure at address page_boundary+63
   should IMHO lock 2 locks because it occupies the first and second
   cacheline

Note, before this patch it locked exactly one lock for n = 0, while
with this patch it could lock either no locks at all (if it is at cacheline
boundary) or 1 (otherwise).
Dunno of libatomic APIs can be called for zero sizes and whether
we actually care that much how many mutexes are locked in that case,
because one can't actually read/write anything into zero sized memory.
If you think it is important, I could add else if (nlocks == 0) nlocks = 1;
in both spots.

2025-04-16  Jakub Jelinek  <jakub@redhat.com>

	PR libgcc/101075
	PR libgcc/119796
	* config/posix/lock.c (libat_lock_n, libat_unlock_n): Start with
	computing how many locks will be needed and take into account
	((uintptr_t)ptr % WATCH_SIZE).  If some locks from the end of the
	locks array and others from the start of it will be needed, first
	lock the ones from the start followed by ones from the end.

(cherry picked from commit 61dfb07)
Here is just a port of the previously posted patch to mingw which
clearly has the same problems.

2025-04-16  Jakub Jelinek  <jakub@redhat.com>

	PR libgcc/101075
	PR libgcc/119796
	* config/mingw/lock.c (libat_lock_n, libat_unlock_n): Start with
	computing how many locks will be needed and take into account
	((uintptr_t)ptr % WATCH_SIZE).  If some locks from the end of the
	locks array and others from the start of it will be needed, first
	lock the ones from the start followed by ones from the end.

(cherry picked from commit 34fe8e9)
…decisions [PR119327]

The following testcase FAILs because the always_inline function can't
be inlined.
The rs6000 backend has similarly to other targets a hook which rejects
inlining which would bring in new ISAs which aren't there in the caller.
And this hook rejects this because of OPTION_MASK_SAVE_TOC_INDIRECT
differences.
This flag is set if explicitly requested or by default depending on
whether the current function looks hot (or at least not cold):
  if ((rs6000_isa_flags_explicit & OPTION_MASK_SAVE_TOC_INDIRECT) == 0
      && flag_shrink_wrap_separate
      && optimize_function_for_speed_p (cfun))
    rs6000_isa_flags |= OPTION_MASK_SAVE_TOC_INDIRECT;
The target nodes that are being compared here are actually the default
target node (which was created when cfun was NULL) vs. one that was
created for the always_inline function when it wasn't NULL, so one
doesn't have it, the other does.
In any case, this flag feels like a tuning decision rather than hard
ISA requirement and I see no problems why we couldn't inline
even explicit -msave-toc-indirect function into -mno-save-toc-indirect
or vice versa.
We already ignore OPTION_MASK_P{8,10}_FUSION which are also more
like tuning flags.

2025-04-22  Jakub Jelinek  <jakub@redhat.com>

	PR target/119327
	* config/rs6000/rs6000.cc (rs6000_can_inline_p): Ignore also
	OPTION_MASK_SAVE_TOC_INDIRECT differences.

	* g++.dg/opt/pr119327.C: New test.

(cherry picked from commit 4b62cf5)
We need to drop the kind argument from what is passed to the
library, but need to do it not only when one uses the argument name
for it (so kind=4 etc.) but also when one passes all the arguments
to the intrinsics.

The following patch uses what gfc_conv_intrinsic_findloc uses,
which looks more efficient and cleaner, we already set automatic
vars to point to the kind and back actual arguments, so we can just
free/clear expr on the former and set name to "%VAL" on the latter.

And similarly clears dim argument for the BT_CHARACTER case when using
maxloc2/minloc2, again regardless of whether it was named or not.

2025-05-13  Jakub Jelinek  <jakub@redhat.com>
	    Daniil Kochergin  <daniil2472s@gmail.com>
	    Tobias Burnus  <tburnus@baylibre.com>

	PR fortran/120191
	* trans-intrinsic.cc (strip_kind_from_actual): Remove.
	(gfc_conv_intrinsic_minmaxloc): Don't call strip_kind_from_actual.
	Free and clear kind_arg->expr if non-NULL.  Set back_arg->name to
	"%VAL" instead of a loop looking for last argument.  Remove actual
	variable, use array_arg instead.  Free and clear dim_arg->expr if
	non-NULL for BT_CHARACTER cases instead of using a loop.

	* gfortran.dg/pr120191_1.f90: New test.

(cherry picked from commit ec249be)
I've tried to write a testcase for the BT_CHARACTER maxloc/minloc with named
or unnamed arguments and indeed the just posted patch fixed the arguments
in there in multiple cases to match what the library expects.
But the testcase still fails, due to library problems.

One dealt with in this patch are _gfortran_s{max,min}loc2_{4,8,16}_s{1,4}
functions.  Those are trivial wrappers around
_gfortrani_{max,min}loc2_{4,8,16}_s{1,4} which should call those functions
if the scalar mask is true and just return 0 otherwise.
The two bugs I see there is that the back, len arguments are swapped,
which means that it always acts as back=.true. and for len will use
character length of 1 or 0 instead of the desired one.
The _gfortrani_{max,min}loc2_{4,8,16}_s{1,4} functions have prototypes like
GFC_INTEGER_4
maxloc2_4_s1 (gfc_array_s1 * const restrict array, GFC_LOGICAL_4 back, gfc_charlen_type len)
so back comes before len, ditto for the
GFC_INTEGER_4
smaxloc2_4_s1 (gfc_array_s1 * const restrict array,
               GFC_LOGICAL_4 *mask, GFC_LOGICAL_4 back, gfc_charlen_type len)
The other problem is that it was just testing if (mask).  In my limited
Fortran understanding that means that the optional argument mask was
supplied but nothing about its actual value.  Other scalar mask generated
routines use if (mask == NULL || *mask) as the condition when to call the
non-masked function, i.e. when mask is not supplied (then it should act like
.true. mask) or when it is supplied and evaluates to .true.).

2025-05-13  Jakub Jelinek  <jakub@redhat.com>

	PR fortran/120191
	* m4/maxloc2s.m4: For smaxloc2 call maxloc2 if mask is NULL or *mask.
	Swap back and len arguments.
	* m4/minloc2s.m4: Likewise.
	* generated/maxloc2_4_s1.c: Regenerate.
	* generated/maxloc2_4_s4.c: Regenerate.
	* generated/maxloc2_8_s1.c: Regenerate.
	* generated/maxloc2_8_s4.c: Regenerate.
	* generated/maxloc2_16_s1.c: Regenerate.
	* generated/maxloc2_16_s4.c: Regenerate.
	* generated/minloc2_4_s1.c: Regenerate.
	* generated/minloc2_4_s4.c: Regenerate.
	* generated/minloc2_8_s1.c: Regenerate.
	* generated/minloc2_8_s4.c: Regenerate.
	* generated/minloc2_16_s1.c: Regenerate.
	* generated/minloc2_16_s4.c: Regenerate.

	* gfortran.dg/pr120191_2.f90: New test.

(cherry picked from commit 482f219)
There is a bug in _gfortran_s{max,min}loc1_{4,8,16}_s{1,4} which the
following testcase shows.
The functions return but then crash in the caller.
Seems that is because buffer overflows, I believe those functions for
if (mask == NULL || *mask) condition being false are supposed to fill in
the result array with all zeros (or allocate it and fill it with zeros).
My understanding is the result array in that case is integer(kind={4,8,16})
and should have the extents the character input array has.
The problem is that it uses * string_len in the extent multiplication:
      extent[n] = GFC_DESCRIPTOR_EXTENT(array,n) * string_len;
and
      extent[n] =
        GFC_DESCRIPTOR_EXTENT(array,n + 1) * string_len;
which is I guess fine and desirable for the extents of the character array,
but not for the extents of the destination array.  Yet the code uses
that extent array for that purpose (and no other purposes).
Here it uses it to set the dimensions for the case where it needs to
allocate (as well as size):
      for (n = 0; n < rank; n++)
        {
          if (n == 0)
            str = 1;
          else
            str = GFC_DESCRIPTOR_STRIDE(retarray,n-1) * extent[n-1];
          GFC_DIMENSION_SET(retarray->dim[n], 0, extent[n] - 1, str);
        }
Here it uses it for bounds checking of the destination:
      if (unlikely (compile_options.bounds_check))
        {
          for (n=0; n < rank; n++)
            {
              index_type ret_extent;

              ret_extent = GFC_DESCRIPTOR_EXTENT(retarray,n);
              if (extent[n] != ret_extent)
                runtime_error ("Incorrect extent in return value of"
                               " MAXLOC intrinsic in dimension %ld:"
                               " is %ld, should be %ld", (long int) n + 1,
                               (long int) ret_extent, (long int) extent[n]);
            }
        }
and here to find out how many retarray elements to actually fill in each
dimension:
  while(1)
    {
      *dest = 0;
      count[0]++;
      dest += dstride[0];
      n = 0;
      while (count[n] == extent[n])
        {
          /* When we get to the end of a dimension, reset it and increment
             the next dimension.  */
          count[n] = 0;
          /* We could precalculate these products, but this is a less
             frequently used path so probably not worth it.  */
          dest -= dstride[n] * extent[n];
Seems maxloc1s.m4 and minloc1s.m4 are the only users of ifunction-s.m4,
so we can change SCALAR_ARRAY_FUNCTION in there without breaking anything
else.

2025-05-13  Jakub Jelinek  <jakub@redhat.com>

	PR fortran/120191
	* m4/ifunction-s.m4 (SCALAR_ARRAY_FUNCTION): Don't multiply
	GFC_DESCRIPTOR_EXTENT(array,) by string_len.
	* generated/maxloc1_4_s1.c: Regenerate.
	* generated/maxloc1_4_s4.c: Regenerate.
	* generated/maxloc1_8_s1.c: Regenerate.
	* generated/maxloc1_8_s4.c: Regenerate.
	* generated/maxloc1_16_s1.c: Regenerate.
	* generated/maxloc1_16_s4.c: Regenerate.
	* generated/minloc1_4_s1.c: Regenerate.
	* generated/minloc1_4_s4.c: Regenerate.
	* generated/minloc1_8_s1.c: Regenerate.
	* generated/minloc1_8_s4.c: Regenerate.
	* generated/minloc1_16_s1.c: Regenerate.
	* generated/minloc1_16_s4.c: Regenerate.

	* gfortran.dg/pr120191_3.f90: New test.

(cherry picked from commit 781cfc4)
As mentioned in the PR, _gfortran_{,m,s}findloc2_s{1,4} iterate too many
times in the back case if nothing is found.
For !back, the loops are for (i = 1; i <= extent; i++) so i is in the
body [1, extent] if nothing is found, but for back it is
for (i = extent; i >= 0; i--) so i is in the body [0, extent] and compares
one element before the start of the array.
Note, findloc1_s{1,4} uses
          for (n = len; n > 0; n--, src -= delta * len_array)
for the back loop and
          for (n = 1; n <= len; n++, src += delta * len_array)
for !back.  This patch fixes that.
The testcase fails under valgrind without the libgfortran changes and
succeeds with those.

2025-05-13  Jakub Jelinek  <jakub@redhat.com>

	PR libfortran/120196
	* m4/ifindloc2.m4 (header1, header2): For back use i > 0 rather than
	i >= 0 as for condition.
	* generated/findloc2_s1.c: Regenerate.
	* generated/findloc2_s4.c: Regenerate.

	* gfortran.dg/pr120196.f90: New test.

(cherry picked from commit 748a7bc)
The UB on the following testcase isn't diagnosed by -fsanitize=address,
because we see that the array has a single element and optimize the
strlen to 0.  I think it is fine to assume e.g. for range purposes the
lower bound for the strlen as long as we don't try to optimize
strlen (str)
where we know that it returns [26, 42] to
26 + strlen (str + 26), but for the upper bound we really want to punt
on optimizing that for -fsanitize=address to read all the bytes of the
string and diagnose if we run to object end etc.

2024-02-06  Jakub Jelinek  <jakub@redhat.com>

	PR sanitizer/110676
	* gimple-fold.cc (gimple_fold_builtin_strlen): For -fsanitize=address
	reset maxlen to sizetype maximum.

	* gcc.dg/asan/pr110676.c: New test.

(cherry picked from commit d3eac7d)
mark_vtable_entries already has

   /* It's OK for the vtable to refer to deprecated virtual functions.  */
   warning_sentinel w(warn_deprecated_decl);

but that doesn't cover __attribute__((unavailable)).  We can use the
following override to cover both.

	PR c++/116606

gcc/cp/ChangeLog:

	* decl2.cc (mark_vtable_entries): Temporarily override deprecated_state to
	UNAVAILABLE_DEPRECATED_SUPPRESS.  Remove a warning_sentinel.

gcc/testsuite/ChangeLog:

	* g++.dg/ext/attr-unavailable-13.C: New test.

(cherry picked from commit d9d34f9)
…for methods with such attributes [PR116636]

On the following testcase, we emit false positive warnings/errors about using
the deprecated or unavailable methods when creating thunks for them, even
when nothing (in the testcase so far) actually used those.

The following patch temporarily disables that diagnostics when creating
the thunks.

2024-09-12  Jakub Jelinek  <jakub@redhat.com>

	PR c++/116636
	* method.cc: Include decl.h.
	(use_thunk): Temporarily change deprecated_state to
	UNAVAILABLE_DEPRECATED_SUPPRESS.

	* g++.dg/warn/deprecated-19.C: New test.

(cherry picked from commit 4026d89)
The following testcases are miscompiled on s390x-linux, because the
doloop_optimize
  /* Ensure that the new sequence doesn't clobber a register that
     is live at the end of the block.  */
  {
    bitmap modified = BITMAP_ALLOC (NULL);

    for (rtx_insn *i = doloop_seq; i != NULL; i = NEXT_INSN (i))
      note_stores (i, record_reg_sets, modified);

    basic_block loop_end = desc->out_edge->src;
    bool fail = bitmap_intersect_p (df_get_live_out (loop_end), modified);
check doesn't work as intended.
The problem is that it uses df, but the df analysis was only done using
  iv_analysis_loop_init (loop);
->
  df_analyze_loop (loop);
which computes df inside on the bbs of the loop.
While loop_end bb is inside of the loop, df_get_live_out computed that
way includes registers set in the loop and used at the start of the next
iteration, but doesn't include registers set in the loop (or before the
loop) and used after the loop.

The following patch fixes that by doing whole function df_analyze first,
changes the loop iteration mode from 0 to LI_ONLY_INNERMOST (on many
targets which use can_use_doloop_if_innermost target hook a so are known
to only handle innermost loops) or LI_FROM_INNERMOST (I think only bfin
actually allows non-innermost loops) and checking not just
df_get_live_out (loop_end) (that is needed for something used by the
next iteration), but also df_get_live_in (desc->out_edge->dest),
i.e. what will be used after the loop.  df of such a bb shouldn't
be affected by the df_analyze_loop and so should be from df_analyze
of the whole function.

2024-12-05  Jakub Jelinek  <jakub@redhat.com>

	PR rtl-optimization/113994
	PR rtl-optimization/116799
	* loop-doloop.cc: Include targhooks.h.
	(doloop_optimize): Also punt on intersection of modified
	with df_get_live_in (desc->out_edge->dest).
	(doloop_optimize_loops): Call df_analyze.  Use
	LI_ONLY_INNERMOST or LI_FROM_INNERMOST instead of 0 as
	second loops_list argument.

	* gcc.c-torture/execute/pr116799.c: New test.
	* g++.dg/torture/pr113994.C: New test.

(cherry picked from commit 0eed816)
The following testcase is miscompiled because of RTL represententation
of bt{l,q} insn followed by e.g. j{c,nc} being misleading to what it
actually does.
Let's look e.g. at
(define_insn_and_split "*jcc_bt<mode>"
  [(set (pc)
        (if_then_else (match_operator 0 "bt_comparison_operator"
                        [(zero_extract:SWI48
                           (match_operand:SWI48 1 "nonimmediate_operand")
                           (const_int 1)
                           (match_operand:QI 2 "nonmemory_operand"))
                         (const_int 0)])
                      (label_ref (match_operand 3))
                      (pc)))
   (clobber (reg:CC FLAGS_REG))]
  "(TARGET_USE_BT || optimize_function_for_size_p (cfun))
   && (CONST_INT_P (operands[2])
       ? (INTVAL (operands[2]) < GET_MODE_BITSIZE (<MODE>mode)
          && INTVAL (operands[2])
               >= (optimize_function_for_size_p (cfun) ? 8 : 32))
       : !memory_operand (operands[1], <MODE>mode))
   && ix86_pre_reload_split ()"
  "#"
  "&& 1"
  [(set (reg:CCC FLAGS_REG)
        (compare:CCC
          (zero_extract:SWI48
            (match_dup 1)
            (const_int 1)
            (match_dup 2))
          (const_int 0)))
   (set (pc)
        (if_then_else (match_op_dup 0 [(reg:CCC FLAGS_REG) (const_int 0)])
                      (label_ref (match_dup 3))
                      (pc)))]
{
  operands[0] = shallow_copy_rtx (operands[0]);
  PUT_CODE (operands[0], reverse_condition (GET_CODE (operands[0])));
})
The define_insn part in RTL describes exactly what it does,
jumps to op3 if bit op2 in op1 is set (for op0 NE) or not set (for op0 EQ).
The problem is with what it splits into.
put_condition_code %C1 for CCCmode comparisons emits c for EQ and LTU,
nc for NE and GEU and ICEs otherwise.
CCCmode is used mainly for carry out of add/adc, borrow out of sub/sbb,
in those cases e.g. for add we have
(set (reg:CCC flags) (compare:CCC (plus:M x y) x))
and use (ltu (reg:CCC flags) (const_int 0)) for carry set and
(geu (reg:CCC flags) (const_int 0)) for carry not set.  These cases
model in RTL what is actually happening, compare in infinite precision
x from the result of finite precision addition in M mode and if it is
less than unsigned (i.e. overflow happened), carry is set.
Another use of CCCmode is in UNSPEC_* patterns, those are used with
(eq (reg:CCC flags) (const_int 0)) for carry set and ne for unset,
given the UNSPEC no big deal, the middle-end doesn't know what means
set or unset.
But for the bt{l,q}; j{c,nc} case the above splits it into
(set (reg:CCC flags) (compare:CCC (zero_extract) (const_int 0)))
for bt and
(set (pc) (if_then_else (eq (reg:CCC flags) (const_int 0)) (label_ref) (pc)))
for the bit set case (so that the jump expands to jc) and ne for
the bit not set case (so that the jump expands to jnc).
Similarly for the different splitters for cmov and set{c,nc} etc.
The problem is that when the middle-end reads this RTL, it feels
the exact opposite to it.  If zero_extract is 1, flags is set
to comparison of 1 and 0 and that would mean using ne ne in the
if_then_else, and vice versa.

So, in order to better describe in RTL what is actually happening,
one possibility would be to swap the behavior of put_condition_code
and use NE + LTU -> c and EQ + GEU -> nc rather than the current
EQ + LTU -> c and NE + GEU -> nc; and adjust everything.  The
following patch uses a more limited approach, instead of representing
bt{l,q}; j{c,nc} case as written above it uses
(set (reg:CCC flags) (compare:CCC (const_int 0) (zero_extract)))
and
(set (pc) (if_then_else (ltu (reg:CCC flags) (const_int 0)) (label_ref) (pc)))
which uses the existing put_condition_code but describes what the
insns actually do in RTL clearly.  If zero_extract is 1,
then flags are LTU, 0U < 1U, if zero_extract is 0, then flags are GEU,
0U >= 0U.  The patch adjusts the *bt<mode> define_insn and all the
splitters to it and its comparisons/conditional moves/setXX.

2025-02-10  Jakub Jelinek  <jakub@redhat.com>

	PR target/118623
	* config/i386/i386.md (*bt<mode>): Represent bt as
	compare:CCC of const0_rtx and zero_extract rather than
	zero_extract and const0_rtx.
	(*jcc_bt<mode>): Likewise.  Use LTU and GEU as flags test
	instead of EQ and NE.
	(*jcc_bt<mode>_1): Likewise.
	(*jcc_bt<mode>_mask): Likewise.
	(Help combine recognize bt followed by cmov splitter): Likewise.
	(*bt<mode>_setcqi): Likewise.
	(*bt<mode>_setncqi): Likewise.
	(*bt<mode>_setnc<mode>): Likewise.

	* gcc.c-torture/execute/pr118623.c: New test.

(cherry picked from commit 9214201)
…lier implicit instantation [PR113976]

Already previously instantiated const variable templates had
cp_apply_type_quals_to_decl called when they were instantiated,
but if they need runtime initialization, their TREE_READONLY flag
has been subsequently cleared.
Explicit variable template instantiation calls grokdeclarator which
calls cp_apply_type_quals_to_decl on them again, setting TREE_READONLY
flag again, but nothing clears it afterwards, so we emit such
instantiations into rodata sections and segfault when the dynamic
initialization attempts to initialize them.

The following patch fixes that by not calling cp_apply_type_quals_to_decl
on already instantiated variable declarations.

2024-02-28  Jakub Jelinek  <jakub@redhat.com>
	    Patrick Palka  <ppalka@redhat.com>

	PR c++/113976
	* decl.cc (grokdeclarator): Don't call cp_apply_type_quals_to_decl
	on DECL_TEMPLATE_INSTANTIATED VAR_DECLs.

	* g++.dg/cpp1y/var-templ87.C: New test.

(cherry picked from commit 29ac924)
The following testcase ICEs since r15-1579 (addition of late combiner),
because *clrmem_short can't be split.
The problem is that the define_insn uses
   (use (match_operand 1 "nonmemory_operand" "n,a,a,a"))
   (use (match_operand 2 "immediate_operand" "X,R,X,X"))
   (clobber (match_scratch:P 3 "=X,X,X,&a"))
and define_split assumed that if operands[1] is const_int_operand,
match_scratch will be always scratch, and it will be reg only if
it was the last alternative where operands[1] is a reg.
The pattern doesn't guarantee it though, of course RA will not try to
uselessly assign a reg there if it is not needed, but during RA
on the testcase below we match the last alternative, but then comes
late combiner and propagates const_int 3 into operands[1].  And that
matches fine, match_scratch matches either scratch or reg and the constraint
in that case is X for the first variant, so still just fine.  But we won't
split that because the splitters only expect scratch.

The following patch fixes it by using match_scratch instead of scratch,
so that it accepts either.

2025-04-17  Jakub Jelinek  <jakub@redhat.com>

	PR target/119834
	* config/s390/s390.md (define_split after *cpymem_short): Use
	(clobber (match_scratch N)) instead of (clobber (scratch)).  Use
	(match_dup 4) and operands[4] instead of (match_dup 3) and operands[3]
	in the last of those.
	(define_split after *clrmem_short): Use (clobber (match_scratch N))
	instead of (clobber (scratch)).
	(define_split after *cmpmem_short): Likewise.

	* g++.target/s390/pr119834.C: New test.

(cherry picked from commit 22fe83d)
This got broken with r13-9727 and fixed with either of
r13-9729 or r13-9728.

2025-05-30  Jakub Jelinek  <jakub@redhat.com>

	PR target/120480
	* gcc.dg/pr120480.c: New test.

(cherry picked from commit c13d5b9)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.