Skip to content

Conversation

@ecatmur
Copy link
Owner

@ecatmur ecatmur commented Jul 17, 2022

No description provided.

Rather than allocate heap space (next to the exception object?), we
use a thread_local with_stacktrace object and move-construct the catch
object immediately on entry into the catch block.

Builds on recent std::stacktrace support, added in https://gcc.gnu.org/pipermail/gcc-patches/2022-January/588617.html
Pass --enable-libstdcxx-backtrace=yes to configure and link libstdc++_libbacktrace.a

We pass the caught type through the exception machinery in a special
WITH_STACKTRACE_TYPE type wrapper.  This is mangled as if it was an
attribute (we can't use an actual attribute-modified type, because (a)
attributes can't variably modify tagged types, and (b) only platform
attributes are allowed to affect type identity).

The function to perform the work of collecting the stacktrace at throw
point is not compiled into libsupc++ but instead grabbed out of
namespace std when doing rtti for the catch block.  This is more
flexible and in keeping with zero-overhead principle.
ecatmur pushed a commit that referenced this pull request Jul 17, 2022
This simple patch implements Richard Biener's suggestion in comment #6
of PR tree-optimization/52171 (from February 2013) that the insn-preds
code generated by genpreds can avoid using strncmp when matching constant
strings of length one.

The effect of this patch is best explained by the diff of insn-preds.cc:
<       if (!strncmp (str + 1, "g", 1))
---
>       if (str[1] == 'g')
3104c3104
<       if (!strncmp (str + 1, "m", 1))
---
>       if (str[1] == 'm')
3106c3106
<       if (!strncmp (str + 1, "c", 1))
---
>       if (str[1] == 'c')
...

The equivalent optimization is performed by GCC (but perhaps not by the
host compiler), but generating simpler/smaller code may encourage further
optimizations (such as use of a switch statement).

2022-05-24  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	* genpreds.cc (write_lookup_constraint_1): Avoid generating a call
	to strncmp for strings of length one.
ecatmur pushed a commit that referenced this pull request Jul 17, 2022
This patch fixes both ICE regressions PR middle-end/105853 and
PR target/105856 caused by my recent patch to expand small const structs
as immediate constants.  That patch updated code generation in three
places: two in expr.cc that call store_constructor directly, and the
third in calls.cc's load_register_parameters that expands its CONSTRUCTOR
via expand_expr, as store_constructor is local/static to expr.cc, and
the "public" API, should usually simply forward the constructor to the
appropriate store_constructor function.

Alas, despite the clean regression testing on multiple targets, the above
ICEs show that expand_expr isn't a suitable proxy for store_constructor,
and things that (I'd assumed) shouldn't affect how/whether a struct is
placed in a register [such as whether the struct is considered packed/
aligned or not] actually interfere with the optimization that is being
attempted.

The (proposed) solution is to export store_constructor (and it's helper
function int_expr_size) from expr.cc, by removing their static qualifier
and prototyping both functions in expr.h, so they can be called directly
from load_register_parameters in calls.cc.  This cures both ICEs, but
almost as importantly improves code generation over GCC 12.

For PR 105853, GCC 12 generates:

compose_nd_na_ipv6_src:
	movzx eax, WORD PTR eth_addr_zero[rip+2]
	movzx edx, WORD PTR eth_addr_zero[rip]
	movzx edi, WORD PTR eth_addr_zero[rip+4]
	sal rax, 16
	or rax, rdx
	sal rdi, 32
	or rdi, rax
	xor eax, eax
	jmp packet_set_nd
eth_addr_zero:	.zero 6

where now (with this fix) GCC 13 generates:
compose_nd_na_ipv6_src:
        xorl    %edi, %edi
        xorl    %eax, %eax
        jmp     packet_set_nd

Likewise, for PR 105856 on ARM, we'd previously generate:
g_329_3:
	movw r3, #:lower16:.LANCHOR0
	movt r3, #:upper16:.LANCHOR0
	ldr r0, [r3]
	b func_19

but with this optimization we now generate:
g_329_3:
        mov     r0, #6
        b       func_19

2022-06-07  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	PR middle-end/105853
	PR target/105856
	* calls.cc (load_register_parameters): Call store_constructor
	and int_expr_size directly instead of expanding via expand_expr.
	* expr.cc (static void store_constructor): Don't prototype here.
	(static HOST_WIDE_INT int_expr_size): Likewise.
	(store_constructor): No longer static.
	(int_expr_size): Likewise, no longer static.
	* expr.h (store_constructor): Prototype here.
	(int_expr_size): Prototype here.

gcc/testsuite/ChangeLog
	PR middle-end/105853
	PR target/105856
	* gcc.dg/pr105853.c: New test case.
	* gcc.dg/pr105856.c: New test case.
ecatmur pushed a commit that referenced this pull request Jul 17, 2022
This patch addresses the issue in comment #6 of PR rtl-optimization/7061
(a four digit PR number) from 2006 where on x86_64 complex number arguments
are unconditionally spilled to the stack.

For the test cases below:
float re(float _Complex a) { return __real__ a; }
float im(float _Complex a) { return __imag__ a; }

GCC with -O2 currently generates:

re:	movq    %xmm0, -8(%rsp)
        movss   -8(%rsp), %xmm0
        ret
im:	movq    %xmm0, -8(%rsp)
        movss   -4(%rsp), %xmm0
        ret

with this patch we now generate:

re:	ret
im:	movq    %xmm0, %rax
        shrq    $32, %rax
        movd    %eax, %xmm0
        ret

[Technically, this shift can be performed on %xmm0 in a single
instruction, but the backend needs to be taught to do that, the
important bit is that the SCmode argument isn't written to the
stack].

The patch itself is to emit_group_store where just before RTL
expansion commits to writing to the stack, we check if the store
group consists of a single scalar integer register that holds
a complex mode value; on x86_64 SCmode arguments are passed in
DImode registers.  If this is the case, we can use a SUBREG to
"view_convert" the integer to the equivalent complex mode.

An interesting corner case that showed up during testing is that
x86_64 also passes HCmode arguments in DImode registers(!), i.e.
using modes of different sizes.  This is easily handled/supported
by first converting to an integer mode of the correct size, and
then generating a complex mode SUBREG of this.  This is similar
in concept to the patch I proposed here:
https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590139.html

2020-06-10  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	PR rtl-optimization/7061
	* expr.cc (emit_group_store): For groups that consist of a single
	scalar integer register that hold a complex mode value, use
	gen_lowpart to generate a SUBREG to "view_convert" to the complex
	mode.  For modes of different sizes, first convert to an integer
	mode of the appropriate size.

gcc/testsuite/ChangeLog
	PR rtl-optimization/7061
	* gcc.target/i386/pr7061-1.c: New test case.
	* gcc.target/i386/pr7061-2.c: New test case.
ecatmur pushed a commit that referenced this pull request May 17, 2023
I noticed that for member class templates of a class template we were
unnecessarily substituting both the template and its type.  Avoiding that
duplication speeds compilation of this silly testcase from ~12s to ~9s on my
laptop.  It's unlikely to make a difference on any real code, but the
simplification is also nice.

We still need to clear CLASSTYPE_USE_TEMPLATE on the partial instantiation
of the template class, but it makes more sense to do that in
tsubst_template_decl anyway.

  #define NC(X)					\
    template <class U> struct X##1;		\
    template <class U> struct X##2;		\
    template <class U> struct X##3;		\
    template <class U> struct X##4;		\
    template <class U> struct X##5;		\
    template <class U> struct X##6;
  #define NC2(X) NC(X##a) NC(X##b) NC(X##c) NC(X##d) NC(X##e) NC(X##f)
  #define NC3(X) NC2(X##A) NC2(X##B) NC2(X##C) NC2(X##D) NC2(X##E)
  template <int I> struct A
  {
    NC3(am)
  };
  template <class...Ts> void sink(Ts...);
  template <int...Is> void g()
  {
    sink(A<Is>()...);
  }
  template <int I> void f()
  {
    g<__integer_pack(I)...>();
  }
  int main()
  {
    f<1000>();
  }

gcc/cp/ChangeLog:

	* pt.cc (instantiate_class_template): Skip the RECORD_TYPE
	of a class template.
	(tsubst_template_decl): Clear CLASSTYPE_USE_TEMPLATE.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants