@sushantchry
Pulling for study purposes; no changes expected.

jakub and others added 30 commits March 20, 2013 09:01
See http://openmp.org/wp/2013/03/openmp-40-rc2/ for the standard
draft.


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@196809 138bc75d-0d04-0410-961f-82ee72b054a4
	Add another argument to c_finish_omp_atomic.

	* parser.c (cp_parser_binary_expression): Handle no_toplevel_fold_p
	even for binary operations other than comparison.
	(cp_parser_omp_atomic): Handle parsing OpenMP 4.0 atomics.
	* pt.c (tsubst_expr) <case OMP_ATOMIC>: Handle atomic exchange.
	* semantics.c (finish_omp_atomic): Use cp_tree_equal to diagnose
	expression mismatches and to find out if c_finish_omp_atomic
	should be called with swapped set to true or false.

	* c-omp.c (c_finish_omp_atomic): Add swapped argument, if true,
	build the operation first with rhs, lhs arguments and use NOP_EXPR
	build_modify_expr.
	* c-common.h (c_finish_omp_atomic): Adjust prototype.

	* c-c++-common/gomp/atomic-15.c: Remove error test that is now
	valid in OpenMP 4.0.

	* testsuite/libgomp.c++/atomic-10.C: New test.
	* testsuite/libgomp.c++/atomic-11.C: New test.
	* testsuite/libgomp.c++/atomic-12.C: New test.
	* testsuite/libgomp.c++/atomic-13.C: New test.


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@196815 138bc75d-0d04-0410-961f-82ee72b054a4
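
The commit above mentions a "swapped" operand order and atomic exchange. As a rough illustration (not taken from the new atomic-1x tests), the two source forms could look like the sketch below. The function names are hypothetical, and without -fopenmp the pragmas are ignored and the code simply runs sequentially.

```c
#include <assert.h>
#include <stddef.h>

/* "Swapped" update form: x = expr binop x, with x on the right-hand side
   of the binary operator.  Hypothetical helper, for illustration only.  */
void atomic_reverse_sub(int *x, int expr)
{
    assert(x != NULL);
    #pragma omp atomic update
    *x = expr - *x;
}

/* Atomic exchange written as a capture block: read the old value,
   then store a new one.  Hypothetical helper, for illustration only.  */
int atomic_exchange_int(int *x, int newval)
{
    int old;
    assert(x != NULL);
    #pragma omp atomic capture
    { old = *x; *x = newval; }
    return old;
}

/* Single-threaded walk through both forms.  */
int demo_atomics(void)
{
    int x = 3;
    atomic_reverse_sub(&x, 10);             /* x becomes 10 - 3 = 7 */
    int old = atomic_exchange_int(&x, 42);  /* old is 7, x becomes 42 */
    return old * 100 + x;
}
```

Calling `demo_atomics()` on one thread exercises both statements; real use would place them inside a parallel region.
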
	with default value, pass it down to c_parser_conditional_expression.
	(c_parser_conditional_expression): Add omp_atomic_lhs argument, pass
	it down to c_parser_binary_expression.  Don't pass PREC_NONE to
	it.  Adjust recursive call.
	(c_parser_binary_expression): Remove prec argument, add omp_atomic_lhs
	argument.  Always start from PREC_NONE, if omp_atomic_lhs is non-NULL
	and one of the arguments of toplevel binop matches it, use build2
	instead of parser_build_binary_op.
	(c_parser_omp_atomic): Handle OpenMP 4.0 atomics.
	(c_parser_omp_for_loop): Adjust c_parser_binary_expression caller.
	* c-tree.h (c_tree_equal): New prototype.
	* c-typeck.c (c_tree_equal): New function.

	* parser.c (cp_parser_omp_atomic): Never restart unless
	structured_block is true.

	* c-c++-common/gomp/atomic-15.c: Adjust for C diagnostics.

	* testsuite/libgomp.c/atomic-14.c: Add parens to make it valid.
	* testsuite/libgomp.c/atomic-15.c: New test.
	* testsuite/libgomp.c/atomic-16.c: New test.


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@196816 138bc75d-0d04-0410-961f-82ee72b054a4
        * env.c (handle_omp_display_env): New function.
        (initialize_env): Use it.



git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@196817 138bc75d-0d04-0410-961f-82ee72b054a4
        * libgomp.texi (Environment Variables): Minor cleanup,
        update section refs to OpenMP 4.0rc2.
        (OMP_DISPLAY_ENV, GOMP_SPINCOUNT): Document these
        environment variables.



git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@196818 138bc75d-0d04-0410-961f-82ee72b054a4
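
OpenMP 4.0 defines the values TRUE, FALSE and VERBOSE for OMP_DISPLAY_ENV. The sketch below shows one way such a value might be classified. It is not the code of handle_omp_display_env; the enum, the helper names and the choice to treat unknown values as "off" are all assumptions made for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical result type, not a libgomp type.  */
enum display_env { DISPLAY_OFF, DISPLAY_ON, DISPLAY_VERBOSE };

/* Case-insensitive comparison of s against an upper-case keyword.  */
static int matches(const char *s, const char *keyword)
{
    assert(keyword != NULL);
    while (*s != '\0' && *keyword != '\0') {
        if (toupper((unsigned char) *s) != (unsigned char) *keyword)
            return 0;
        s++;
        keyword++;
    }
    return *s == '\0' && *keyword == '\0';
}

/* Classify a raw environment value; NULL means the variable is unset.
   Unknown values fall back to DISPLAY_OFF in this sketch.  */
enum display_env parse_display_env(const char *value)
{
    if (value == NULL)
        return DISPLAY_OFF;
    if (matches(value, "TRUE"))
        return DISPLAY_ON;
    if (matches(value, "VERBOSE"))
        return DISPLAY_VERBOSE;
    return DISPLAY_OFF;
}
```

A caller would typically pass `getenv("OMP_DISPLAY_ENV")` (from `<stdlib.h>`) as the argument.
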
	GIMPLE_OMP_FOR kinds.
	* tree.def (OMP_SIMD, OMP_FOR_SIMD, OMP_DISTRIBUTE): New tree codes.
	* gimple.h (enum gf_mask): Add GF_OMP_FOR_KIND_MASK,
	GF_OMP_FOR_KIND_FOR, GF_OMP_FOR_KIND_SIMD, GF_OMP_FOR_KIND_FOR_SIMD
	and GF_OMP_FOR_KIND_DISTRIBUTE.
	(gimple_omp_for_kind, gimple_omp_for_set_kind): New inline functions.
	* gimplify.c (is_gimple_stmt, gimplify_omp_for, gimplify_expr): Handle
	OMP_SIMD, OMP_FOR_SIMD and OMP_DISTRIBUTE.
	* tree.c (omp_clause_num_ops, omp_clause_code_name, walk_tree_1):
	Handle new OpenMP 4.0 clauses.
	* tree-pretty-print.c (dump_omp_clause): Likewise.
	(dump_generic_node): Handle OMP_SIMD, OMP_FOR_SIMD and OMP_DISTRIBUTE.
	* tree.h (enum omp_clause_code): Add OMP_CLAUSE_LINEAR,
	OMP_CLAUSE_ALIGNED, OMP_CLAUSE_DEPEND, OMP_CLAUSE_FROM, OMP_CLAUSE_TO,
	OMP_CLAUSE_UNIFORM, OMP_CLAUSE_MAP, OMP_CLAUSE_DEVICE,
	OMP_CLAUSE_DIST_SCHEDULE, OMP_CLAUSE_INBRANCH, OMP_CLAUSE_NOTINBRANCH,
	OMP_CLAUSE_NUM_TEAMS, OMP_CLAUSE_PROC_BIND, OMP_CLAUSE_SAFELEN,
	OMP_CLAUSE_SIMDLEN, OMP_CLAUSE_FOR, OMP_CLAUSE_PARALLEL,
	OMP_CLAUSE_SECTIONS and OMP_CLAUSE_TASKGROUP.
	(OMP_LOOP_CHECK): Define.
	(OMP_FOR_BODY, OMP_FOR_CLAUSES, OMP_FOR_INIT, OMP_FOR_COND,
	OMP_FOR_INCR, OMP_FOR_PRE_BODY): Use OMP_LOOP_CHECK instead of
	OMP_FOR_CHECK.
	(OMP_CLAUSE_DECL): Extend check range up to OMP_CLAUSE_MAP.
	(OMP_CLAUSE_LINEAR_STEP, OMP_CLAUSE_ALIGNED_ALIGNMENT,
	OMP_CLAUSE_NUM_TEAMS_EXPR, OMP_CLAUSE_DEVICE_ID,
	OMP_CLAUSE_DIST_SCHEDULE_CHUNK_EXPR, OMP_CLAUSE_SAFELEN_EXPR,
	OMP_CLAUSE_SIMDLEN_EXPR): Define.
	(enum omp_clause_depend_kind, enum omp_clause_map_kind,
	enum omp_clause_proc_bind_kind): New enums.
	(OMP_CLAUSE_DEPEND_KIND, OMP_CLAUSE_MAP_KIND,
	OMP_CLAUSE_PROC_BIND_KIND): Define.
	(struct tree_omp_clause): Add subcode.depend_kind, subcode.map_kind
	and subcode.proc_bind_kind.
	(find_omp_clause): New prototype.
	* omp-builtins.def (BUILT_IN_GOMP_CANCEL,
	BUILT_IN_GOMP_CANCELLATION_POINT): New built-ins.
	* tree-flow.h (find_omp_clause): Remove prototype.
c/
	* c-parser.c (c_parser_omp_all_clauses): Change mask argument type
	from unsigned to omp_clause_mask.
	(c_parser_omp_for_loop): Adjust c_finish_omp_for caller.
	(OMP_FOR_CLAUSE_MASK, OMP_SECTIONS_CLAUSE_MASK,
	OMP_PARALLEL_CLAUSE_MASK, OMP_SINGLE_CLAUSE_MASK,
	OMP_TASK_CLAUSE_MASK): Use OMP_CLAUSE_MASK_1 instead of 1.
	(c_parser_omp_parallel): Use omp_clause_mask type instead of unsigned
	for mask, use OMP_CLAUSE_MASK_1 instead of 1 for masks.
cp/
	* cp-tree.h (OMP_FOR_GIMPLIFYING_P): Use OMP_LOOP_CHECK instead of
	OMP_FOR_CHECK.
	(finish_omp_for): Add enum tree_code second argument.
	(finish_omp_cancel, finish_omp_cancellation_point): New prototypes.
	* cp-gimplify.c (cp_gimplify_expr, cp_genericize_r): Handle
	OMP_SIMD, OMP_FOR_SIMD and OMP_DISTRIBUTE.
	* semantics.c (finish_omp_clauses): Handle new OpenMP 4.0 clauses.
	(finish_omp_for): Add code argument, pass it down to make_node
	or c_finish_omp_for.
	(finish_omp_cancel, finish_omp_cancellation_point): New functions.
	* parser.c (cp_parser_omp_clause_name): Add parsing of new
	OpenMP 4.0 clauses.
	(cp_parser_omp_var_list_no_open): Add COLON argument, if non-NULL,
	accept termination by colon instead of closing paren.
	(cp_parser_omp_var_list, cp_parser_omp_clause_reduction): Adjust
	callers.
	(cp_parser_omp_clause_branch, cp_parser_omp_clause_cancelkind,
	cp_parser_omp_clause_num_teams, cp_parser_omp_clause_aligned,
	cp_parser_omp_clause_linear, cp_parser_omp_clause_depend,
	cp_parser_omp_clause_map, cp_parser_omp_clause_device,
	cp_parser_omp_clause_dist_schedule, cp_parser_omp_clause_proc_bind):
	New functions.
	(cp_parser_omp_all_clauses): Change mask argument's type to
	omp_clause_mask from unsigned.  Fix c_name for
	PRAGMA_OMP_CLAUSE_UNTIED.  Handle new OpenMP 4.0 clauses.
	(cp_parser_omp_for_loop): Add code argument.  Pass it down to
	finish_omp_for.
	(OMP_SIMD_CLAUSE_MASK): Define.
	(cp_parser_omp_simd): New function.
	(OMP_FOR_CLAUSE_MASK, OMP_SECTIONS_CLAUSE_MASK,
	OMP_PARALLEL_CLAUSE_MASK, OMP_SINGLE_CLAUSE_MASK,
	OMP_TASK_CLAUSE_MASK): Use OMP_CLAUSE_MASK_1 instead of 1.
	(cp_parser_omp_for): Handle parsing of #pragma omp for simd.
	(cp_parser_omp_parallel): Handle parsing of
	#pragma omp parallel for simd.  Use omp_clause_mask type
	instead of unsigned for mask, use OMP_CLAUSE_MASK_1 instead
	of 1 for masks.
	(OMP_CANCEL_CLAUSE_MASK, OMP_CANCELLATION_POINT_CLAUSE_MASK): Define.
	(cp_parser_omp_cancel, cp_parser_omp_cancellation_point): New
	functions.
	(cp_parser_omp_construct): Handle PRAGMA_OMP_SIMD, PRAGMA_OMP_CANCEL
	and PRAGMA_OMP_CANCELLATION_POINT.
	(cp_parser_pragma): Handle PRAGMA_OMP_SIMD.
	* pt.c (tsubst_expr): Handle OMP_SIMD, OMP_FOR_SIMD and
	OMP_DISTRIBUTE.  Pass down TREE_CODE to finish_omp_for.
fortran/
	* f95-lang.c (ATTR_NULL): Define.
c-family/
	* c-omp.c (c_finish_omp_for): Add code argument, pass it down to
	make_node.
	(c_split_parallel_clauses): Handle OMP_CLAUSE_SAFELEN,
	OMP_CLAUSE_ALIGNED and OMP_CLAUSE_LINEAR.
	* c-pragma.h (enum pragma_kind): Add PRAGMA_OMP_CANCEL,
	PRAGMA_OMP_CANCELLATION_POINT, PRAGMA_OMP_DECLARE_REDUCTION,
	PRAGMA_OMP_DECLARE_SIMD, PRAGMA_OMP_DECLARE_TARGET,
	PRAGMA_OMP_DISTRIBUTE, PRAGMA_OMP_END_DECLARE_TARGET,
	PRAGMA_OMP_FOR_SIMD, PRAGMA_OMP_PARALLEL_FOR_SIMD, PRAGMA_OMP_SIMD,
	PRAGMA_OMP_TARGET, PRAGMA_OMP_TARGET_DATA, PRAGMA_OMP_TARGET_UPDATE,
	PRAGMA_OMP_TASKGROUP and PRAGMA_OMP_TEAMS.
	(enum pragma_omp_clause): Add PRAGMA_OMP_CLAUSE_ALIGNED,
	PRAGMA_OMP_CLAUSE_DEPEND, PRAGMA_OMP_CLAUSE_DEVICE,
	PRAGMA_OMP_CLAUSE_DIST_SCHEDULE, PRAGMA_OMP_CLAUSE_FOR,
	PRAGMA_OMP_CLAUSE_FROM, PRAGMA_OMP_CLAUSE_INBRANCH,
	PRAGMA_OMP_CLAUSE_LINEAR, PRAGMA_OMP_CLAUSE_MAP,
	PRAGMA_OMP_CLAUSE_NOTINBRANCH, PRAGMA_OMP_CLAUSE_NUM_TEAMS,
	PRAGMA_OMP_CLAUSE_PARALLEL, PRAGMA_OMP_CLAUSE_PROC_BIND,
	PRAGMA_OMP_CLAUSE_SAFELEN, PRAGMA_OMP_CLAUSE_SECTIONS,
	PRAGMA_OMP_CLAUSE_SIMDLEN, PRAGMA_OMP_CLAUSE_TASKGROUP,
	PRAGMA_OMP_CLAUSE_TO and PRAGMA_OMP_CLAUSE_UNIFORM.
	* c-pragma.c (omp_pragmas): Add new OpenMP 4.0 constructs.
	* c-common.h (c_finish_omp_for): Add enum tree_code as second
	argument.
	(OMP_CLAUSE_MASK_1): Define.
	(omp_clause_mask): For HWI >= 64 new typedef for
	unsigned HOST_WIDE_INT, otherwise a class with needed ctors and
	operators.


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@197161 138bc75d-0d04-0410-961f-82ee72b054a4
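
The switch from unsigned to omp_clause_mask and from 1 to OMP_CLAUSE_MASK_1 is presumably needed once clause numbers no longer fit in a 32-bit mask, since shifting a plain int 1 by 32 or more is undefined. A minimal sketch of the 64-bit case follows; the type name, macro names and clause numbers are hypothetical stand-ins, not the values used in c-common.h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for omp_clause_mask and OMP_CLAUSE_MASK_1.  */
typedef uint64_t clause_mask;
#define CLAUSE_MASK_1 ((clause_mask) 1)

/* Made-up clause numbers; some deliberately exceed 31.  */
enum clause_id {
    CLAUSE_PRIVATE = 3,
    CLAUSE_SAFELEN = 35,
    CLAUSE_DEPEND  = 36,
    CLAUSE_LINEAR  = 40
};

/* Shifting CLAUSE_MASK_1 keeps the whole computation in 64 bits;
   (1 << CLAUSE_SAFELEN) would shift a 32-bit int out of range.  */
#define SIMD_CLAUSE_MASK                      \
    ((CLAUSE_MASK_1 << CLAUSE_PRIVATE)        \
     | (CLAUSE_MASK_1 << CLAUSE_SAFELEN)      \
     | (CLAUSE_MASK_1 << CLAUSE_LINEAR))

/* Return 1 if the clause is permitted by the mask, 0 otherwise.  */
int clause_allowed(clause_mask mask, enum clause_id id)
{
    assert((unsigned) id < 64);
    return (int) ((mask >> id) & 1);
}
```
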
	OMP_SIMD and OMP_FOR_SIMD loops.


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@197515 138bc75d-0d04-0410-961f-82ee72b054a4
	omp_get_proc_bind, omp_get_proc_bind_, omp_set_default_device,
	omp_set_default_device_, omp_set_default_device_8_,
	omp_get_default_device, omp_get_default_device_,
	omp_get_num_devices, omp_get_num_devices_, omp_get_num_teams,
	omp_get_num_teams_, omp_get_team_num, omp_get_team_num_): Export
	@@OMP_4.0.
	(GOMP_cancel, GOMP_cancellation_point, GOMP_parallel_loop_dynamic,
	GOMP_parallel_loop_guided, GOMP_parallel_loop_runtime,
	GOMP_parallel_loop_static, GOMP_parallel_sections, GOMP_parallel,
	GOMP_taskgroup_start, GOMP_taskgroup_end): Export @@GOMP_4.0.
	* parallel.c (GOMP_parallel_end): Add ialias.
	(GOMP_parallel, GOMP_cancel, GOMP_cancellation_point): New
	functions.
	* omp.h.in (omp_proc_bind_t): New typedef.
	(omp_get_cancellation, omp_get_proc_bind, omp_set_default_device,
	omp_get_default_device, omp_get_num_devices, omp_get_num_teams,
	omp_get_team_num): New prototypes.
	* env.c (omp_get_cancellation, omp_get_proc_bind,
	omp_set_default_device, omp_get_default_device, omp_get_num_devices,
	omp_get_num_teams, omp_get_team_num): New functions.
	* fortran.c (ULP, STR1, STR2, ialias_redirect): Removed.
	(omp_get_cancellation_, omp_get_proc_bind_, omp_set_default_device_,
	omp_set_default_device_8_, omp_get_default_device_,
	omp_get_num_devices_, omp_get_num_teams_, omp_get_team_num_): New
	functions.
	* libgomp.h (ialias_ulp, ialias_str1, ialias_str2, ialias_redirect,
	ialias_call): Define.
	* libgomp_g.h (GOMP_parallel_loop_static, GOMP_parallel_loop_dynamic,
	GOMP_parallel_loop_guided, GOMP_parallel_loop_runtime, GOMP_parallel,
	GOMP_cancel, GOMP_cancellation_point, GOMP_taskgroup_start,
	GOMP_taskgroup_end, GOMP_parallel_sections): New prototypes.
	* task.c (GOMP_taskgroup_start, GOMP_taskgroup_end): New functions.
	* sections.c (GOMP_parallel_sections): New function.
	* loop.c (GOMP_parallel_loop_static, GOMP_parallel_loop_dynamic,
	GOMP_parallel_loop_guided, GOMP_parallel_loop_runtime): New
	functions.
	(GOMP_parallel_end): Add ialias_redirect.
	* omp_lib.f90.in (omp_proc_bind_kind, omp_proc_bind_false,
	omp_proc_bind_true, omp_proc_bind_master, omp_proc_bind_close,
	omp_proc_bind_spread): New params.
	(omp_get_cancellation, omp_get_proc_bind, omp_set_default_device,
	omp_get_default_device, omp_get_num_devices, omp_get_num_teams,
	omp_get_team_num): New interfaces.
	* omp_lib.h.in (omp_proc_bind_kind, omp_proc_bind_false,
	omp_proc_bind_true, omp_proc_bind_master, omp_proc_bind_close,
	omp_proc_bind_spread): New params.
	(omp_get_cancellation, omp_get_proc_bind, omp_set_default_device,
	omp_get_default_device, omp_get_num_devices, omp_get_num_teams,
	omp_get_team_num): New externals.


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@197670 138bc75d-0d04-0410-961f-82ee72b054a4
	(BT_FN_VOID_OMPFN_PTR_UINT, BT_FN_VOID_OMPFN_PTR_UINT_LONG_LONG_LONG,
	BT_FN_VOID_OMPFN_PTR_UINT_LONG_LONG_LONG_LONG): Remove.
	(BT_FN_VOID_OMPFN_PTR_UINT_UINT_UINT,
	BT_FN_VOID_OMPFN_PTR_UINT_LONG_LONG_LONG_UINT,
	BT_FN_VOID_OMPFN_PTR_UINT_LONG_LONG_LONG_LONG_UINT): New.
	* gimplify.c (gimplify_scan_omp_clauses, gimplify_adjust_omp_clauses):
	Handle OMP_CLAUSE_PROC_BIND.
	* omp-builtins.def (BUILT_IN_GOMP_TASKGROUP_START,
	BUILT_IN_GOMP_TASKGROUP_END, BUILT_IN_GOMP_PARALLEL_LOOP_STATIC,
	BUILT_IN_GOMP_PARALLEL_LOOP_DYNAMIC,
	BUILT_IN_GOMP_PARALLEL_LOOP_GUIDED,
	BUILT_IN_GOMP_PARALLEL_LOOP_RUNTIME, BUILT_IN_GOMP_PARALLEL,
	BUILT_IN_GOMP_PARALLEL_SECTIONS): New built-ins.
	(BUILT_IN_GOMP_PARALLEL_LOOP_STATIC_START,
	BUILT_IN_GOMP_PARALLEL_LOOP_DYNAMIC_START,
	BUILT_IN_GOMP_PARALLEL_LOOP_GUIDED_START,
	BUILT_IN_GOMP_PARALLEL_LOOP_RUNTIME_START,
	BUILT_IN_GOMP_PARALLEL_START, BUILT_IN_GOMP_PARALLEL_END,
	BUILT_IN_GOMP_PARALLEL_SECTIONS_START): Remove.
	* omp-low.c (scan_sharing_clauses): Handle OMP_CLAUSE_PROC_BIND.
	(expand_parallel_call): Expand #pragma omp parallel* as
	calls to the new GOMP_parallel_* APIs without _start at the end,
	instead of GOMP_parallel_*_start followed by fn.omp_fn.N call,
	followed by GOMP_parallel_end.  Handle OMP_CLAUSE_PROC_BIND.
	* tree-ssa-alias.c (ref_maybe_used_by_call_p_1,
	call_may_clobber_ref_p_1): Handle BUILT_IN_GOMP_TASKGROUP_END
	instead of BUILT_IN_GOMP_PARALLEL_END.
c-family/
	* c-common.c (DEF_FUNCTION_TYPE_8): Define.
	* c-omp.c (c_split_parallel_clauses): Handle OMP_CLAUSE_PROC_BIND.
cp/
	* cp-tree.h (finish_omp_taskgroup): New prototype.
	* parser.c (cp_parser_omp_clause_proc_bind): Require ) instead of
	colon at the end of the clause.
	(cp_parser_omp_taskgroup): New function.
	(cp_parser_omp_construct, cp_parser_pragma): Handle
	PRAGMA_OMP_TASKGROUP.
	* semantics.c (finish_omp_taskgroup): New function.
fortran/
	* f95-lang.c (DEF_FUNCTION_TYPE_8): Define.
	* types.def (DEF_FUNCTION_TYPE_8): Document.
	(BT_FN_VOID_OMPFN_PTR_UINT, BT_FN_VOID_OMPFN_PTR_UINT_LONG_LONG_LONG,
	BT_FN_VOID_OMPFN_PTR_UINT_LONG_LONG_LONG_LONG): Remove.
	(BT_FN_VOID_OMPFN_PTR_UINT_UINT_UINT,
	BT_FN_VOID_OMPFN_PTR_UINT_LONG_LONG_LONG_UINT,
	BT_FN_VOID_OMPFN_PTR_UINT_LONG_LONG_LONG_LONG_UINT): New.
ada/
	* gcc-interface/utils.c (DEF_FUNCTION_TYPE_8): Define.
lto/
	* lto-lang.c (DEF_FUNCTION_TYPE_8): Define.
testsuite/
	* gcc.dg/gomp/combined-1.c: Look for GOMP_parallel_loop_runtime
	instead of GOMP_parallel_loop_runtime_start.


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@197676 138bc75d-0d04-0410-961f-82ee72b054a4
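
The expand_parallel_call change above replaces a three-step sequence (start call, direct call of the outlined function, end call) with a single runtime call. The sketch below only models that call shape on one thread. The sketch_* functions are hypothetical stand-ins and not the libgomp entry points, and the trailing flags parameter is an assumption based on the extra UINT in the new function types listed above.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*outlined_fn)(void *);

int runtime_calls;  /* counts calls into the stand-in runtime */

/* Stand-in for the old *_start entry point: sets up the team only.  */
static void sketch_parallel_start(outlined_fn fn, void *data, unsigned nthreads)
{
    assert(fn != NULL);
    (void) data;
    (void) nthreads;
    runtime_calls++;
}

/* Stand-in for the old *_end entry point.  */
static void sketch_parallel_end(void)
{
    runtime_calls++;
}

/* Stand-in for the new combined entry point: it also runs the body.  */
static void sketch_parallel(outlined_fn fn, void *data,
                            unsigned nthreads, unsigned flags)
{
    assert(fn != NULL);
    (void) nthreads;
    (void) flags;
    runtime_calls++;
    fn(data);
}

/* Plays the role of the compiler-outlined region body.  */
static void outlined_body(void *data)
{
    *(int *) data += 1;
}

/* Old shape: start, direct call of the body, end.  */
int old_style_expansion(void)
{
    int x = 0;
    runtime_calls = 0;
    sketch_parallel_start(outlined_body, &x, 0);
    outlined_body(&x);
    sketch_parallel_end();
    return x;
}

/* New shape: one call that covers all three steps.  */
int new_style_expansion(void)
{
    int x = 0;
    runtime_calls = 0;
    sketch_parallel(outlined_body, &x, 0, 0);
    return x;
}
```

Both variants run the body once; the new shape needs one runtime call at the call site instead of two.
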
	OMP_CLAUSE_LINEAR_NO_COPYOUT): Define.
	* omp-low.c (extract_omp_for_data): Handle #pragma omp simd.
	(build_outer_var_ref): For #pragma omp simd allow linear etc.
	clauses to bind even to private vars.
	(scan_sharing_clauses): Handle OMP_CLAUSE_LINEAR, OMP_CLAUSE_ALIGNED
	and OMP_CLAUSE_SAFELEN.
	(lower_rec_input_clauses): Handle OMP_CLAUSE_LINEAR.  Don't emit
	a GOMP_barrier call for firstprivate/lastprivate in #pragma omp simd.
	(lower_lastprivate_clauses): Handle also OMP_CLAUSE_LINEAR.
	(expand_omp_simd): New function.
	(expand_omp_for): Handle #pragma omp simd.
	* gimplify.c (enum gimplify_omp_var_data): Add GOVD_LINEAR and
	GOVD_ALIGNED, add GOVD_LINEAR into GOVD_DATA_SHARE_CLASS.
	(enum omp_region_type): Add ORT_SIMD.
	(gimple_add_tmp_var, gimplify_var_or_parm_decl, omp_check_private,
	omp_firstprivatize_variable, omp_notice_variable): Handle ORT_SIMD
	like ORT_WORKSHARE.
	(omp_is_private): Likewise.  Add SIMD argument, tweak diagnostics
	and add extra errors in simd constructs.
	(gimplify_scan_omp_clauses, gimplify_adjust_omp_clauses): Handle
	OMP_CLAUSE_LINEAR, OMP_CLAUSE_ALIGNED and OMP_CLAUSE_SAFELEN.
	(gimplify_adjust_omp_clauses_1): Handle GOVD_LASTPRIVATE and
	GOVD_ALIGNED.
	(gimplify_omp_for): Handle #pragma omp simd.
cp/
	* cp-tree.h (CP_OMP_CLAUSE_INFO): Also allow it on OMP_CLAUSE_LINEAR.
	* parser.c (cp_parser_omp_var_list_no_open): If colon is non-NULL,
	temporarily disable colon_corrects_to_scope_p during the parsing
	of the variable list.
	(cp_parser_omp_clause_safelen, cp_parser_omp_clause_simdlen): New
	functions.
	(cp_parser_omp_all_clauses): Handle OMP_CLAUSE_SAFELEN and
	OMP_CLAUSE_SIMDLEN.
	* semantics.c (finish_omp_clauses): Allow NULL_TREE in
	OMP_CLAUSE_ALIGNED_ALIGNMENT.
testsuite/
	* c-c++-common/gomp/simd1.c: New test.
	* c-c++-common/gomp/simd2.c: New test.


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@198092 138bc75d-0d04-0410-961f-82ee72b054a4
	* gimplify.c (gimplify_adjust_omp_clauses): For linear clauses
	if outer_context is non-NULL, but not ORT_COMBINED_PARALLEL,
	call omp_notice_variable.  Remove aligned clauses that can't
	be handled yet.
	* omp-low.c: Include target.h.
	(scan_sharing_clauses): For aligned clauses with global arrays
	register local replacement.
	(omp_clause_aligned_alignment): New function.
	(lower_rec_input_clauses): For aligned clauses for global
	arrays or automatic pointers emit __builtin_assume_aligned
	before the loop if possible.
	(expand_omp_regimplify_p, expand_omp_build_assign): New functions.
	(expand_omp_simd): Use them.  Handle pointer iterators and broken
	loops.
	(lower_omp_for): Call lower_omp on gimple_omp_body_ptr after
	calling lower_rec_input_clauses, not before it.
cp/
	* semantics.c (finish_omp_clauses): On OMP_CLAUSE_LINEAR clauses
	verify OMP_CLAUSE_DECL has integral or pointer type, and handle
	linear steps for pointer type decls.  Fix up handling of
	OMP_CLAUSE_UNIFORM.
testsuite/
	* c-c++-common/gomp/simd3.c: New test.
	* c-c++-common/gomp/simd4.c: New test.
	* c-c++-common/gomp/simd5.c: New test.


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@198193 138bc75d-0d04-0410-961f-82ee72b054a4
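
The lower_rec_input_clauses entry above says an aligned clause may be turned into a __builtin_assume_aligned call before the loop. As a sketch of what the two spellings look like at source level (function names are hypothetical, and the second variant is a hand-written approximation, not actual compiler output):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Variant 1: state the alignment with the OpenMP 4.0 aligned clause.
   Without -fopenmp the pragma is ignored and the loop runs as written.  */
float sum_with_clause(const float *p, size_t n)
{
    float total = 0.0f;
    assert((uintptr_t) p % 32 == 0);  /* caller must keep the promise */
    #pragma omp simd aligned(p:32) reduction(+:total)
    for (size_t i = 0; i < n; i++)
        total += p[i];
    return total;
}

/* Variant 2: the same promise spelled with the GCC builtin.  */
float sum_with_builtin(const float *p, size_t n)
{
    const float *q = __builtin_assume_aligned(p, 32);
    float total = 0.0f;
    for (size_t i = 0; i < n; i++)
        total += q[i];
    return total;
}

/* Runs both variants over one 32-byte aligned buffer.  */
float demo_aligned_sum(void)
{
    _Alignas(32) float buf[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    return sum_with_clause(buf, 8) + sum_with_builtin(buf, 8);
}
```

Passing a pointer that is not actually 32-byte aligned makes either variant undefined, which is why the sketch asserts the precondition in the first one.
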
c/
	* c-parser.c (c_parser_compound_statement,
	c_parser_statement): Adjust comments for OpenMP 3.0+
	additions.
	(c_parser_pragma): Handle PRAGMA_OMP_CANCEL and
	PRAGMA_OMP_CANCELLATION_POINT.
	(c_parser_omp_clause_name): Handle new OpenMP 4.0 clauses.
	(c_parser_omp_clause_collapse): Fully fold collapse
	expression.
	(c_parser_omp_clause_branch, c_parser_omp_clause_cancelkind,
	c_parser_omp_clause_num_teams, c_parser_omp_clause_aligned,
	c_parser_omp_clause_linear, c_parser_omp_clause_safelen,
	c_parser_omp_clause_simdlen, c_parser_omp_clause_depend,
	c_parser_omp_clause_map, c_parser_omp_clause_device,
	c_parser_omp_clause_dist_schedule, c_parser_omp_clause_proc_bind,
	c_parser_omp_clause_to, c_parser_omp_clause_from,
	c_parser_omp_clause_uniform): New functions.
	(c_parser_omp_all_clauses): Handle new OpenMP 4.0 clauses.
	(c_parser_omp_for_loop): Add CODE argument, pass it through
	to c_finish_omp_for.
	(OMP_SIMD_CLAUSE_MASK): Define.
	(c_parser_omp_simd): New function.
	(c_parser_omp_for): Parse #pragma omp for simd.
	(OMP_PARALLEL_CLAUSE_MASK): Add OMP_CLAUSE_PROC_BIND.
	(c_parser_omp_parallel): Parse #pragma omp parallel for simd.
	(OMP_TASK_CLAUSE_MASK): Add OMP_CLAUSE_DEPEND.
	(c_parser_omp_taskgroup): New function.
	(OMP_CANCEL_CLAUSE_MASK, OMP_CANCELLATION_POINT_CLAUSE_MASK): Define.
	(c_parser_omp_cancel, c_parser_omp_cancellation_point): New functions.
	(c_parser_omp_construct): Handle PRAGMA_OMP_SIMD and
	PRAGMA_OMP_TASKGROUP.
	(c_parser_transaction_cancel): Formatting fix.
	* c-tree.h (c_begin_omp_taskgroup, c_finish_omp_taskgroup,
	c_finish_omp_cancel, c_finish_omp_cancellation_point): New prototypes.
	* c-typeck.c (c_begin_omp_taskgroup, c_finish_omp_taskgroup,
	c_finish_omp_cancel, c_finish_omp_cancellation_point): New functions.
	(c_finish_omp_clauses): Handle new OpenMP 4.0 clauses.
cp/
	* parser.c (cp_parser_omp_clause_name): Add missing break after
	case 'i'.
	(cp_parser_omp_cancellation_point): Diagnose error if
	#pragma omp cancellation isn't followed by point.
	* semantics.c (finish_omp_clauses): Complain also about zero
	in alignment of aligned directive or safelen/simdlen expressions.
	(finish_omp_cancel): Fix up diagnostics wording.
testsuite/
	* c-c++-common/gomp/simd1.c: Enable also for C.
	* c-c++-common/gomp/simd2.c: Likewise.
	* c-c++-common/gomp/simd3.c: Likewise.
	* c-c++-common/gomp/simd4.c: Likewise.  Adjust expected
	diagnostics for C.
	* c-c++-common/gomp/simd5.c: Enable also for C.


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@198264 138bc75d-0d04-0410-961f-82ee72b054a4
	OpenMP constructs nested inside simd region.  Don't treat
	#pragma omp simd as work-sharing region.  Disallow work-sharing
	constructs inside of critical region.  Complain if ordered
	region is nested inside of parallel region without loop
	region in between.
	(scan_omp_1_stmt): Call check_omp_nesting_restrictions even
	for GOMP_{cancel{,lation_point},taskyield,taskwait} calls.

	* gfortran.dg/gomp/appendix-a/a.35.5.f90: Add dg-error.


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@198459 138bc75d-0d04-0410-961f-82ee72b054a4
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@198460 138bc75d-0d04-0410-961f-82ee72b054a4
	dump_gimple_omp_atomic_store): Handle gimple_omp_atomic_seq_cst_p.
	* gimple.h (enum gf_mask): Add GF_OMP_ATOMIC_SEQ_CST.
	(gimple_omp_atomic_set_seq_cst, gimple_omp_atomic_seq_cst_p): New
	inline functions.
	* omp-low.c (expand_omp_atomic_load, expand_omp_atomic_store,
	expand_omp_atomic_fetch_op): If gimple_omp_atomic_seq_cst_p,
	pass MEMMODEL_SEQ_CST instead of MEMMODEL_RELAXED to the builtin.
	* gimplify.c (gimplify_omp_atomic): Handle OMP_ATOMIC_SEQ_CST.
	* tree-pretty-print.c (dump_generic_node): Handle OMP_ATOMIC_SEQ_CST.
	* tree.def (OMP_ATOMIC): Add comment that OMP_ATOMIC* must stay
	consecutive.
	* tree.h (OMP_ATOMIC_SEQ_CST): Define.
c/
	* c-parser.c (c_parser_omp_atomic): Parse seq_cst clause, pass
	true if it is present to c_finish_omp_atomic.
cp/
	* pt.c (tsubst_expr): Pass OMP_ATOMIC_SEQ_CST to finish_omp_atomic.
	* semantics.c (finish_omp_atomic): Add seq_cst argument, pass
	it through to c_finish_omp_atomic or store into OMP_ATOMIC_SEQ_CST.
	* cp-tree.h (finish_omp_atomic): Adjust prototype.
	* parser.c (cp_parser_omp_atomic): Parse seq_cst clause, pass
	true if it is present to finish_omp_atomic.
c-family/
	* c-omp.c (c_finish_omp_atomic): Add seq_cst argument, store it
	into OMP_ATOMIC_SEQ_CST bit.
	* c-common.h (c_finish_omp_atomic): Adjust prototype.
testsuite/
	* testsuite/libgomp.c/atomic-17.c: New test.
	* testsuite/libgomp.c++/atomic-14.C: New test.
	* testsuite/libgomp.c++/atomic-15.C: New test.


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@198461 138bc75d-0d04-0410-961f-82ee72b054a4
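
The omp-low.c entry above describes passing MEMMODEL_SEQ_CST instead of MEMMODEL_RELAXED when seq_cst is present. At source level the difference is roughly the one between the two __atomic calls below; this is an illustrative sketch with hypothetical function names, not the expansion GCC emits.

```c
#include <assert.h>
#include <stddef.h>

/* Roughly what a plain "#pragma omp atomic" update asks for.  */
int fetch_add_relaxed(int *x, int n)
{
    assert(x != NULL);
    return __atomic_fetch_add(x, n, __ATOMIC_RELAXED);
}

/* Roughly what "#pragma omp atomic seq_cst" asks for.  */
int fetch_add_seq_cst(int *x, int n)
{
    assert(x != NULL);
    return __atomic_fetch_add(x, n, __ATOMIC_SEQ_CST);
}

/* The OpenMP spelling; ignored (and run sequentially) without -fopenmp.  */
void omp_add_seq_cst(int *x, int n)
{
    assert(x != NULL);
    #pragma omp atomic seq_cst
    *x += n;
}

/* Single-threaded walk through all three.  */
int demo_seq_cst(void)
{
    int x = 5;
    fetch_add_seq_cst(&x, 2);   /* x becomes 7 */
    omp_add_seq_cst(&x, 3);     /* x becomes 10 */
    fetch_add_relaxed(&x, 1);   /* x becomes 11 */
    return x;
}
```
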
Remove deprecated vectorlength clause features.

Remove deprecated assert and noassert clauses.

Implement vectorlength clause in OpenMP safelen terms.
	(attribute_value_equal): Call it for -fopenmp if
	TREE_VALUE of the attributes are both OMP_CLAUSEs.
	* tree.h (omp_declare_simd_clauses_equal): Declare.
c-family/
	* c-common.c (c_common_attribute_table): Add "omp declare simd"
	attribute.
	(handle_omp_declare_simd_attribute): New function.
	* c-common.h (c_omp_declare_simd_clauses_to_numbers,
	c_omp_declare_simd_clauses_to_decls): Declare.
	* c-omp.c (c_omp_declare_simd_clause_cmp,
	c_omp_declare_simd_clauses_to_numbers,
	c_omp_declare_simd_clauses_to_decls): New functions.
cp/
	* cp-tree.h (cp_decl_specifier_seq): Add omp_declare_simd_clauses
	field.
	(finish_omp_declare_simd): Declare.
	* decl2.c (is_late_template_attribute): Return true for
	"omp declare simd" attribute.
	(cp_check_const_attributes): Don't check TREE_VALUE of arg if
	arg isn't a TREE_LIST.
	* decl.c (grokfndecl): Add omp_declare_simd_clauses argument, call
	finish_omp_declare_simd if non-NULL.
	(grokdeclarator): Pass it declspecs->omp_declare_simd_clauses
	to grokfndecl.
	* pt.c (apply_late_template_attributes): Handle "omp declare simd"
	attribute specially.
	(tsubst_omp_clauses): Add declare_simd argument, don't call
	finish_omp_clauses if it is set.  Handle OpenMP 4.0 clauses.
	(tsubst_expr): Adjust tsubst_omp_clauses callers.
	* semantics.c (finish_omp_clauses): Diagnose inbranch notinbranch.
	(finish_omp_declare_simd): New function.
	* parser.h (struct cp_parser): Add omp_declare_simd_clauses field.
	* parser.c (cp_ensure_no_omp_declare_simd,
	cp_finish_omp_declare_simd): New functions.
	(enum pragma_context): Add pragma_member and pragma_objc_icode.
	(cp_parser_linkage_specification, cp_parser_namespace_definition,
	cp_parser_class_specifier_1): Call cp_ensure_no_omp_declare_simd.
	(cp_parser_init_declarator, cp_parser_member_declaration,
	cp_parser_function_definition_from_specifiers_and_declarator,
	cp_parser_save_member_function_body): Copy
	parser->omp_declare_simd_clauses to
	decl_specifiers->omp_declare_simd_clauses, call
	cp_finish_omp_declare_simd.
	(cp_parser_member_specification_opt): Pass pragma_member instead
	of pragma_external to cp_parser_pragma.
	(cp_parser_objc_interstitial_code): Pass pragma_objc_icode instead
	of pragma_external to cp_parser_pragma.
	(cp_parser_omp_var_list_no_open): If parser->omp_declare_simd_clauses,
	just cp_parser_identifier the argument names.
	(cp_parser_omp_all_clauses): Don't call finish_omp_clauses for
	parser->omp_declare_simd_clauses.
	(OMP_DECLARE_SIMD_CLAUSE_MASK): Define.
	(cp_parser_omp_declare_simd, cp_parser_omp_declare): New functions.
	(cp_parser_pragma): Call cp_ensure_no_omp_declare_simd.  Handle
	PRAGMA_OMP_DECLARE_REDUCTION.  Replace == pragma_external with
	!= pragma_stmt and != pragma_compound.
testsuite/
	* g++.dg/gomp/declare-simd-1.C: New test.
	* g++.dg/gomp/declare-simd-2.C: New test.


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@198739 138bc75d-0d04-0410-961f-82ee72b054a4
	* c-typeck.c (c_finish_omp_clauses): Handle OMP_CLAUSE_LINEAR_STEP
	adjustments for pointer-types here.  Diagnose inbranch notinbranch
	being used together.
	(c_finish_omp_declare_simd): New function.
	* c-parser.c (enum pragma_context): Add pragma_struct and
	pragma_param.
	(c_parser_declaration_or_fndef): Add omp_declare_simd_clauses
	argument.  Call c_finish_omp_declare_simd if needed.
	(c_parser_external_declaration, c_parser_compound_statement_nostart,
	c_parser_label, c_parser_for_statement, c_parser_objc_methodprotolist,
	c_parser_omp_for_loop): Adjust c_parser_declaration_or_fndef callers.
	(c_parser_struct_or_union_specifier): Use pragma_struct instead of
	pragma_external.
	(c_parser_parameter_declaration): Use pragma_param instead of
	pragma_external.
	(c_parser_pragma): Handle PRAGMA_OMP_DECLARE_REDUCTION.
	Replace == pragma_external with != pragma_stmt && != pragma_compound
	test.
	(c_parser_omp_variable_list): Add declare_simd argument.  Don't lookup
	vars if it is true, just store identifiers.
	(c_parser_omp_var_list_parens, c_parser_omp_clause_depend,
	c_parser_omp_clause_map): Adjust callers.
	(c_parser_omp_clause_reduction, c_parser_omp_clause_aligned): Add
	declare_simd argument, pass it through to c_parser_omp_variable_list.
	(c_parser_omp_clause_linear): Likewise.  Don't handle
	OMP_CLAUSE_LINEAR_STEP adjustments for pointer-types here.
	(c_parser_omp_clause_uniform): Call c_parser_omp_variable_list
	instead of c_parser_omp_var_list_parens to pass true as declare_simd.
	(c_parser_omp_all_clauses): Add declare_simd argument, pass it through
	clause parsing routines as needed.  Don't call c_finish_omp_clauses if
	set.
	(c_parser_omp_simd, c_parser_omp_for, c_parser_omp_sections,
	c_parser_omp_parallel, c_parser_omp_single, c_parser_omp_task,
	c_parser_omp_cancel, c_parser_omp_cancellation_point): Adjust callers.
	(OMP_DECLARE_SIMD_CLAUSE_MASK): Define.
	(c_parser_omp_declare_simd, c_parser_omp_declare): New functions.

	* gcc.dg/gomp/declare-simd-1.c: New test.
	* gcc.dg/gomp/declare-simd-2.c: New test.


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@198828 138bc75d-0d04-0410-961f-82ee72b054a4
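
For readers unfamiliar with the directive being parsed here, a small sketch of #pragma omp declare simd in C follows. It is not one of the new declare-simd tests; the function names are hypothetical. Note that inbranch and notinbranch are mutually exclusive, which is the combination the commit above diagnoses.

```c
#include <assert.h>

/* scale is the same for every SIMD lane (uniform), i advances by 1 per
   lane (linear), and the function is never called under a condition
   inside the simd loop (notinbranch).  */
#pragma omp declare simd uniform(scale) linear(i:1) notinbranch
float scaled_index(float scale, int i)
{
    return scale * (float) i;
}

/* Calls the function from a simd loop, then sums the results.  */
float sum_scaled(float scale, int n)
{
    float out[8];
    float total = 0.0f;
    assert(n >= 0 && n <= 8);
    #pragma omp simd
    for (int i = 0; i < n; i++)
        out[i] = scaled_index(scale, i);
    for (int i = 0; i < n; i++)
        total += out[i];
    return total;
}
```

Without -fopenmp both pragmas are ignored and the code behaves as ordinary scalar C.
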
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@198835 138bc75d-0d04-0410-961f-82ee72b054a4
kraj pushed a commit to kraj/gcc that referenced this pull request May 2, 2023
I noticed that for member class templates of a class template we were
unnecessarily substituting both the template and its type.  Avoiding that
duplication speeds compilation of this silly testcase from ~12s to ~9s on my
laptop.  It's unlikely to make a difference on any real code, but the
simplification is also nice.

We still need to clear CLASSTYPE_USE_TEMPLATE on the partial instantiation
of the template class, but it makes more sense to do that in
tsubst_template_decl anyway.

  #define NC(X)					\
    template <class U> struct X##1;		\
    template <class U> struct X##2;		\
    template <class U> struct X##3;		\
    template <class U> struct X##4;		\
    template <class U> struct X##5;		\
    template <class U> struct X##6;
  #define NC2(X) NC(X##a) NC(X##b) NC(X##c) NC(X##d) NC(X##e) NC(X##f)
  #define NC3(X) NC2(X##A) NC2(X##B) NC2(X##C) NC2(X##D) NC2(X##E)
  template <int I> struct A
  {
    NC3(am)
  };
  template <class...Ts> void sink(Ts...);
  template <int...Is> void g()
  {
    sink(A<Is>()...);
  }
  template <int I> void f()
  {
    g<__integer_pack(I)...>();
  }
  int main()
  {
    f<1000>();
  }

gcc/cp/ChangeLog:

	* pt.cc (instantiate_class_template): Skip the RECORD_TYPE
	of a class template.
	(tsubst_template_decl): Clear CLASSTYPE_USE_TEMPLATE.
rurban pushed a commit to rurban/gcc that referenced this pull request Oct 26, 2023
This patch is my proposed solution to PR rtl-optimization/91865.
Normally RTX simplification canonicalizes a ZERO_EXTEND of a ZERO_EXTEND
to a single ZERO_EXTEND, but as shown in this PR it is possible for
combine's make_compound_operation to unintentionally generate a
non-canonical ZERO_EXTEND of a ZERO_EXTEND, which is unlikely to be
matched by the backend.

For the new test case:

const int table[2] = {1, 2};
int foo (char i) { return table[i]; }

compiling with -O2 -mlarge on msp430 we currently see:

Trying 2 -> 7:
    2: r25:HI=zero_extend(R12:QI)
      REG_DEAD R12:QI
    7: r28:PSI=sign_extend(r25:HI)#0
      REG_DEAD r25:HI
Failed to match this instruction:
(set (reg:PSI 28 [ iD.1772 ])
    (zero_extend:PSI (zero_extend:HI (reg:QI 12 R12 [ iD.1772 ]))))

which results in the following code:

foo:	AND     #0xff, R12
        RLAM.A #4, R12 { RRAM.A #4, R12
        RLAM.A  #1, R12
        MOVX.W  table(R12), R12
        RETA

With this patch, we now see:

Trying 2 -> 7:
    2: r25:HI=zero_extend(R12:QI)
      REG_DEAD R12:QI
    7: r28:PSI=sign_extend(r25:HI)#0
      REG_DEAD r25:HI
Successfully matched this instruction:
(set (reg:PSI 28 [ iD.1772 ])
    (zero_extend:PSI (reg:QI 12 R12 [ iD.1772 ])))
allowing combination of insns 2 and 7
original costs 4 + 8 = 12
replacement cost 8

foo:	MOV.B   R12, R12
        RLAM.A  #1, R12
        MOVX.W  table(R12), R12
        RETA

2023-10-26  Roger Sayle  <roger@nextmovesoftware.com>
	    Richard Biener  <rguenther@suse.de>

gcc/ChangeLog
	PR rtl-optimization/91865
	* combine.cc (make_compound_operation): Avoid creating a
	ZERO_EXTEND of a ZERO_EXTEND.

gcc/testsuite/ChangeLog
	PR rtl-optimization/91865
	* gcc.target/msp430/pr91865.c: New test case.
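
The canonicalization rests on a simple fact: zero-extending twice gives the same value as zero-extending once to the final width. A small C sketch of that fact follows, using 8, 16 and 32-bit unsigned types as stand-ins for QI, HI and PSI (PSI is not 32 bits wide on msp430, so the widths are an assumption made for illustration).

```c
#include <assert.h>
#include <stdint.h>

/* Models (zero_extend (zero_extend x)): 8 -> 16 -> 32 bits.  */
uint32_t extend_twice(uint8_t x)
{
    uint16_t mid = (uint16_t) x;
    assert(mid <= 0xff);  /* zero extension never sets the upper bits */
    return (uint32_t) mid;
}

/* Models the canonical single (zero_extend x): 8 -> 32 bits.  */
uint32_t extend_once(uint8_t x)
{
    return (uint32_t) x;
}

/* Returns 1 if both forms agree for every 8-bit input, 0 otherwise.  */
int extensions_agree(void)
{
    for (unsigned v = 0; v <= 0xffu; v++) {
        if (extend_twice((uint8_t) v) != extend_once((uint8_t) v))
            return 0;
    }
    return 1;
}
```
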
XYenChi referenced this pull request in XYenChi/gcc Nov 7, 2023
hubot pushed a commit that referenced this pull request Feb 16, 2024
Here we have

  template<class T>
  auto is_throwable(T t) -> decltype(throw t, true) { ... }

where we didn't properly mark 't' as IMPLICIT_RVALUE_P, which caused
the wrong overload to have been chosen.  Jason figured out it's because
we don't correctly implement [expr.prim.id.unqual]#4.2, which post-P2266
says that an id-expression is move-eligible if

"the id-expression (possibly parenthesized) is the operand of
a throw-expression, and names an implicitly movable entity that belongs
to a scope that does not contain the compound-statement of the innermost
lambda-expression, try-block, or function-try-block (if any) whose
compound-statement or ctor-initializer contains the throw-expression."

I worked out that it's trying to say that given

  struct X {
    X();
    X(const X&);
    X(X&&) = delete;
  };

the following should fail: the scope of the throw is an sk_try, and it's
also x's scope S, and S "does not contain the compound-statement of the
*try-block" so x is move-eligible, so we move, so we fail.

  void f ()
  try {
    X x;
    throw x;  // use of deleted function
  } catch (...) {
  }

Whereas here:

  void g (X x)
  try {
    throw x;
  } catch (...) {
  }

the throw is again in an sk_try, but x's scope is an sk_function_parms
which *does* contain the {} of the *try-block, so x is not move-eligible,
so we don't move, so we use X(const X&), and the code is fine.

The current code also doesn't seem to handle

  void h (X x) {
    void z (decltype(throw x, true));
  }

where there's no enclosing lambda or sk_try so we should move.

I'm not doing anything about lambdas because we shouldn't reach the
code at the end of the function: the DECL_HAS_VALUE_EXPR_P check
shouldn't let us go further.
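
The scope rule quoted above can be observed with ordinary local variables as well. The sketch below is an illustration only, not one of the new tests: Probe and both functions are hypothetical names, and the behaviour noted in the comments is what a compiler implementing the rule is expected to do (compilers that predate the rule's implementation may still move in the second function).

```cpp
#include <cassert>

// Hypothetical probe type that counts how the exception object is built.
struct Probe {
  static int copies;
  static int moves;
  Probe() {}
  Probe(const Probe&) { ++copies; }
  Probe(Probe&&) noexcept { ++moves; }
};
int Probe::copies = 0;
int Probe::moves = 0;

// x is declared inside the try-block, so its scope does not contain the
// try-block's compound-statement: `throw x` is move-eligible and must not
// need the copy constructor.
void throw_local_inside_try() {
  try {
    Probe x;
    throw x;
  } catch (const Probe&) {
  }
}

// x is declared outside the try-block, so its scope contains the
// compound-statement: under the quoted rule x is not move-eligible and the
// exception object is expected to be copy-constructed.
void throw_local_outside_try() {
  Probe x;
  try {
    throw x;
  } catch (const Probe&) {
  }
}

// The exception object is built without a copy in the first case, and by
// exactly one constructor call in the second.
void run_probe_demo() {
  Probe::copies = Probe::moves = 0;
  throw_local_inside_try();
  assert(Probe::copies == 0);

  Probe::copies = Probe::moves = 0;
  throw_local_outside_try();
  assert(Probe::copies + Probe::moves == 1);
}
```

Catching by reference keeps the counters limited to the construction of the exception object itself.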

	PR c++/113789
	PR c++/113853

gcc/cp/ChangeLog:

	* typeck.cc (treat_lvalue_as_rvalue_p): Update code to better
	reflect [expr.prim.id.unqual]#4.2.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp0x/sfinae69.C: Remove dg-bogus.
	* g++.dg/cpp0x/sfinae70.C: New test.
	* g++.dg/cpp0x/sfinae71.C: New test.
	* g++.dg/cpp0x/sfinae72.C: New test.
	* g++.dg/cpp2a/implicit-move4.C: New test.
Liaoshihua pushed a commit to Liaoshihua/gcc that referenced this pull request Mar 19, 2024
I noticed that for member class templates of a class template we were
unnecessarily substituting both the template and its type.  Avoiding that
duplication speeds compilation of this silly testcase from ~12s to ~9s on my
laptop.  It's unlikely to make a difference on any real code, but the
simplification is also nice.

We still need to clear CLASSTYPE_USE_TEMPLATE on the partial instantiation
of the template class, but it makes more sense to do that in
tsubst_template_decl anyway.

  #define NC(X)					\
    template <class U> struct X##1;		\
    template <class U> struct X##2;		\
    template <class U> struct X##3;		\
    template <class U> struct X##4;		\
    template <class U> struct X##5;		\
    template <class U> struct X##6;
  #define NC2(X) NC(X##a) NC(X##b) NC(X##c) NC(X##d) NC(X##e) NC(X##f)
  #define NC3(X) NC2(X##A) NC2(X##B) NC2(X##C) NC2(X##D) NC2(X##E)
  template <int I> struct A
  {
    NC3(am)
  };
  template <class...Ts> void sink(Ts...);
  template <int...Is> void g()
  {
    sink(A<Is>()...);
  }
  template <int I> void f()
  {
    g<__integer_pack(I)...>();
  }
  int main()
  {
    f<1000>();
  }

gcc/cp/ChangeLog:

	* pt.cc (instantiate_class_template): Skip the RECORD_TYPE
	of a class template.
	(tsubst_template_decl): Clear CLASSTYPE_USE_TEMPLATE.
Liaoshihua pushed a commit to Liaoshihua/gcc that referenced this pull request Mar 19, 2024
This patch is my proposed solution to PR rtl-optimization/91865.
Normally RTX simplification canonicalizes a ZERO_EXTEND of a ZERO_EXTEND
to a single ZERO_EXTEND, but as shown in this PR it is possible for
combine's make_compound_operation to unintentionally generate a
non-canonical ZERO_EXTEND of a ZERO_EXTEND, which is unlikely to be
matched by the backend.

For the new test case:

const int table[2] = {1, 2};
int foo (char i) { return table[i]; }

compiling with -O2 -mlarge on msp430 we currently see:

Trying 2 -> 7:
    2: r25:HI=zero_extend(R12:QI)
      REG_DEAD R12:QI
    7: r28:PSI=sign_extend(r25:HI)#0
      REG_DEAD r25:HI
Failed to match this instruction:
(set (reg:PSI 28 [ iD.1772 ])
    (zero_extend:PSI (zero_extend:HI (reg:QI 12 R12 [ iD.1772 ]))))

which results in the following code:

foo:	AND     #0xff, R12
        RLAM.A #4, R12 { RRAM.A #4, R12
        RLAM.A  #1, R12
        MOVX.W  table(R12), R12
        RETA

With this patch, we now see:

Trying 2 -> 7:
    2: r25:HI=zero_extend(R12:QI)
      REG_DEAD R12:QI
    7: r28:PSI=sign_extend(r25:HI)#0
      REG_DEAD r25:HI
Successfully matched this instruction:
(set (reg:PSI 28 [ iD.1772 ])
    (zero_extend:PSI (reg:QI 12 R12 [ iD.1772 ])))
allowing combination of insns 2 and 7
original costs 4 + 8 = 12
replacement cost 8

foo:	MOV.B   R12, R12
        RLAM.A  #1, R12
        MOVX.W  table(R12), R12
        RETA

2023-10-26  Roger Sayle  <roger@nextmovesoftware.com>
	    Richard Biener  <rguenther@suse.de>

gcc/ChangeLog
	PR rtl-optimization/91865
	* combine.cc (make_compound_operation): Avoid creating a
	ZERO_EXTEND of a ZERO_EXTEND.

gcc/testsuite/ChangeLog
	PR rtl-optimization/91865
	* gcc.target/msp430/pr91865.c: New test case.
Liaoshihua pushed a commit to Liaoshihua/gcc that referenced this pull request Mar 19, 2024
Here we have

  template<class T>
  auto is_throwable(T t) -> decltype(throw t, true) { ... }

where we didn't properly mark 't' as IMPLICIT_RVALUE_P, which caused
the wrong overload to have been chosen.  Jason figured out it's because
we don't correctly implement [expr.prim.id.unqual]#4.2, which post-P2266
says that an id-expression is move-eligible if

"the id-expression (possibly parenthesized) is the operand of
a throw-expression, and names an implicitly movable entity that belongs
to a scope that does not contain the compound-statement of the innermost
lambda-expression, try-block, or function-try-block (if any) whose
compound-statement or ctor-initializer contains the throw-expression."

I worked out that it's trying to say that given

  struct X {
    X();
    X(const X&);
    X(X&&) = delete;
  };

the following should fail: the scope of the throw is an sk_try, and it's
also x's scope S, and S "does not contain the compound-statement of the
*try-block" so x is move-eligible, so we move, so we fail.

  void f ()
  try {
    X x;
    throw x;  // use of deleted function
  } catch (...) {
  }

Whereas here:

  void g (X x)
  try {
    throw x;
  } catch (...) {
  }

the throw is again in an sk_try, but x's scope is an sk_function_parms
which *does* contain the {} of the *try-block, so x is not move-eligible,
so we don't move, so we use X(const X&), and the code is fine.

The current code also doesn't seem to handle

  void h (X x) {
    void z (decltype(throw x, true));
  }

where there's no enclosing lambda or sk_try so we should move.

I'm not doing anything about lambdas because we shouldn't reach the
code at the end of the function: the DECL_HAS_VALUE_EXPR_P check
shouldn't let us go further.

	PR c++/113789
	PR c++/113853

gcc/cp/ChangeLog:

	* typeck.cc (treat_lvalue_as_rvalue_p): Update code to better
	reflect [expr.prim.id.unqual]#4.2.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp0x/sfinae69.C: Remove dg-bogus.
	* g++.dg/cpp0x/sfinae70.C: New test.
	* g++.dg/cpp0x/sfinae71.C: New test.
	* g++.dg/cpp0x/sfinae72.C: New test.
	* g++.dg/cpp2a/implicit-move4.C: New test.
Liaoshihua pushed a commit to Liaoshihua/gcc that referenced this pull request Mar 21, 2024
This patch is my proposed solution to PR rtl-optimization/91865.
Normally RTX simplification canonicalizes a ZERO_EXTEND of a ZERO_EXTEND
to a single ZERO_EXTEND, but as shown in this PR it is possible for
combine's make_compound_operation to unintentionally generate a
non-canonical ZERO_EXTEND of a ZERO_EXTEND, which is unlikely to be
matched by the backend.

For the new test case:

const int table[2] = {1, 2};
int foo (char i) { return table[i]; }

compiling with -O2 -mlarge on msp430 we currently see:

Trying 2 -> 7:
    2: r25:HI=zero_extend(R12:QI)
      REG_DEAD R12:QI
    7: r28:PSI=sign_extend(r25:HI)#0
      REG_DEAD r25:HI
Failed to match this instruction:
(set (reg:PSI 28 [ iD.1772 ])
    (zero_extend:PSI (zero_extend:HI (reg:QI 12 R12 [ iD.1772 ]))))

which results in the following code:

foo:	AND     #0xff, R12
        RLAM.A #4, R12 { RRAM.A #4, R12
        RLAM.A  #1, R12
        MOVX.W  table(R12), R12
        RETA

With this patch, we now see:

Trying 2 -> 7:
    2: r25:HI=zero_extend(R12:QI)
      REG_DEAD R12:QI
    7: r28:PSI=sign_extend(r25:HI)#0
      REG_DEAD r25:HI
Successfully matched this instruction:
(set (reg:PSI 28 [ iD.1772 ])
    (zero_extend:PSI (reg:QI 12 R12 [ iD.1772 ])))
allowing combination of insns 2 and 7
original costs 4 + 8 = 12
replacement cost 8

foo:	MOV.B   R12, R12
        RLAM.A  #1, R12
        MOVX.W  table(R12), R12
        RETA

2023-10-26  Roger Sayle  <roger@nextmovesoftware.com>
	    Richard Biener  <rguenther@suse.de>

gcc/ChangeLog
	PR rtl-optimization/91865
	* combine.cc (make_compound_operation): Avoid creating a
	ZERO_EXTEND of a ZERO_EXTEND.

gcc/testsuite/ChangeLog
	PR rtl-optimization/91865
	* gcc.target/msp430/pr91865.c: New test case.
Liaoshihua pushed a commit to Liaoshihua/gcc that referenced this pull request Mar 25, 2024
This patch is my proposed solution to PR rtl-optimization/91865.
Normally RTX simplification canonicalizes a ZERO_EXTEND of a ZERO_EXTEND
to a single ZERO_EXTEND, but as shown in this PR it is possible for
combine's make_compound_operation to unintentionally generate a
non-canonical ZERO_EXTEND of a ZERO_EXTEND, which is unlikely to be
matched by the backend.

For the new test case:

const int table[2] = {1, 2};
int foo (char i) { return table[i]; }

compiling with -O2 -mlarge on msp430 we currently see:

Trying 2 -> 7:
    2: r25:HI=zero_extend(R12:QI)
      REG_DEAD R12:QI
    7: r28:PSI=sign_extend(r25:HI)#0
      REG_DEAD r25:HI
Failed to match this instruction:
(set (reg:PSI 28 [ iD.1772 ])
    (zero_extend:PSI (zero_extend:HI (reg:QI 12 R12 [ iD.1772 ]))))

which results in the following code:

foo:	AND     #0xff, R12
        RLAM.A #4, R12 { RRAM.A #4, R12
        RLAM.A  #1, R12
        MOVX.W  table(R12), R12
        RETA

With this patch, we now see:

Trying 2 -> 7:
    2: r25:HI=zero_extend(R12:QI)
      REG_DEAD R12:QI
    7: r28:PSI=sign_extend(r25:HI)#0
      REG_DEAD r25:HI
Successfully matched this instruction:
(set (reg:PSI 28 [ iD.1772 ])
    (zero_extend:PSI (reg:QI 12 R12 [ iD.1772 ])))
allowing combination of insns 2 and 7
original costs 4 + 8 = 12
replacement cost 8

foo:	MOV.B   R12, R12
        RLAM.A  #1, R12
        MOVX.W  table(R12), R12
        RETA

2023-10-26  Roger Sayle  <roger@nextmovesoftware.com>
	    Richard Biener  <rguenther@suse.de>

gcc/ChangeLog
	PR rtl-optimization/91865
	* combine.cc (make_compound_operation): Avoid creating a
	ZERO_EXTEND of a ZERO_EXTEND.

gcc/testsuite/ChangeLog
	PR rtl-optimization/91865
	* gcc.target/msp430/pr91865.c: New test case.
Liaoshihua pushed a commit to Liaoshihua/gcc that referenced this pull request Mar 25, 2024
This patch is my proposed solution to PR rtl-optimization/91865.
Normally RTX simplification canonicalizes a ZERO_EXTEND of a ZERO_EXTEND
to a single ZERO_EXTEND, but as shown in this PR it is possible for
combine's make_compound_operation to unintentionally generate a
non-canonical ZERO_EXTEND of a ZERO_EXTEND, which is unlikely to be
matched by the backend.

For the new test case:

const int table[2] = {1, 2};
int foo (char i) { return table[i]; }

compiling with -O2 -mlarge on msp430 we currently see:

Trying 2 -> 7:
    2: r25:HI=zero_extend(R12:QI)
      REG_DEAD R12:QI
    7: r28:PSI=sign_extend(r25:HI)#0
      REG_DEAD r25:HI
Failed to match this instruction:
(set (reg:PSI 28 [ iD.1772 ])
    (zero_extend:PSI (zero_extend:HI (reg:QI 12 R12 [ iD.1772 ]))))

which results in the following code:

foo:	AND     #0xff, R12
        RLAM.A #4, R12 { RRAM.A #4, R12
        RLAM.A  #1, R12
        MOVX.W  table(R12), R12
        RETA

With this patch, we now see:

Trying 2 -> 7:
    2: r25:HI=zero_extend(R12:QI)
      REG_DEAD R12:QI
    7: r28:PSI=sign_extend(r25:HI)#0
      REG_DEAD r25:HI
Successfully matched this instruction:
(set (reg:PSI 28 [ iD.1772 ])
    (zero_extend:PSI (reg:QI 12 R12 [ iD.1772 ])))
allowing combination of insns 2 and 7
original costs 4 + 8 = 12
replacement cost 8

foo:	MOV.B   R12, R12
        RLAM.A  #1, R12
        MOVX.W  table(R12), R12
        RETA

2023-10-26  Roger Sayle  <roger@nextmovesoftware.com>
	    Richard Biener  <rguenther@suse.de>

gcc/ChangeLog
	PR rtl-optimization/91865
	* combine.cc (make_compound_operation): Avoid creating a
	ZERO_EXTEND of a ZERO_EXTEND.

gcc/testsuite/ChangeLog
	PR rtl-optimization/91865
	* gcc.target/msp430/pr91865.c: New test case.
NinaRanns referenced this pull request in NinaRanns/gcc May 30, 2024
fixing tests and removing C++20 requirement
hubot pushed a commit that referenced this pull request Jun 13, 2024
Here during overload resolution we have two strictly viable ambiguous
candidates #1 and #2, and two non-strictly viable candidates #3 and #4
which we hold on to ever since r14-6522.  These latter candidates have
an empty second arg conversion since the first arg conversion was deemed
bad, and this trips up joust when called on #3 and #4 which assumes all
arg conversions are there.

We can fix this by making joust robust to empty arg conversions, but in
this situation we shouldn't need to compare #3 and #4 at all given that
we have a strictly viable candidate.  To that end, this patch makes
tourney shortcut considering non-strictly viable candidates upon
encountering ambiguity between two strictly viable candidates (taking
advantage of the fact that the candidates list is sorted according to
viability via splice_viable).

	PR c++/115239

gcc/cp/ChangeLog:

	* call.cc (tourney): Don't consider a non-strictly viable
	candidate as the champ if there was ambiguity between two
	strictly viable candidates.

gcc/testsuite/ChangeLog:

	* g++.dg/overload/error7.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>
hubot pushed a commit that referenced this pull request Jun 17, 2024
Here during overload resolution we have two strictly viable ambiguous
candidates #1 and #2, and two non-strictly viable candidates #3 and #4
which we hold on to ever since r14-6522.  These latter candidates have
an empty second arg conversion since the first arg conversion was deemed
bad, and this trips up joust when called on #3 and #4 which assumes all
arg conversions are there.

We can fix this by making joust robust to empty arg conversions, but in
this situation we shouldn't need to compare #3 and #4 at all given that
we have a strictly viable candidate.  To that end, this patch makes
tourney shortcut considering non-strictly viable candidates upon
encountering ambiguity between two strictly viable candidates (taking
advantage of the fact that the candidates list is sorted according to
viability via splice_viable).

	PR c++/115239

gcc/cp/ChangeLog:

	* call.cc (tourney): Don't consider a non-strictly viable
	candidate as the champ if there was ambiguity between two
	strictly viable candidates.

gcc/testsuite/ChangeLog:

	* g++.dg/overload/error7.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>
(cherry picked from commit 7fed7e9)
hubot pushed a commit that referenced this pull request Jul 19, 2024
These tests used to generate:

        bl      swap
        ldr     r2, [sp, #4]
        mov     r0, r2  @ __fp16

but g:9d20529d94b23275885f380d155fe8671ab5353a means that we can
load directly into r0:

        bl      swap
        ldrh    r0, [sp, #4]    @ __fp16

This patch updates the tests to "defend" this change.

While there, the scans include:

mov\tr1, r[03]}

But if the spill of r2 occurs first, there's no real reason why
r2 couldn't be used as the temporary, instead of r3.

The patch tries to update the scans while preserving the spirit
of the originals.

gcc/testsuite/
	* gcc.target/arm/fp16-aapcs-2.c: Expect the return value to be
	loaded directly from the stack.  Test that the swap generates
	two moves out of r0/r1 and two moves in.
	* gcc.target/arm/fp16-aapcs-4.c: Likewise.
hubot pushed a commit that referenced this pull request Sep 7, 2024
…o_debug_section [PR116614]

cat abc.C
  #define A(n) struct T##n {} t##n;
  #define B(n) A(n##0) A(n##1) A(n##2) A(n##3) A(n##4) A(n##5) A(n##6) A(n##7) A(n##8) A(n##9)
  #define C(n) B(n##0) B(n##1) B(n##2) B(n##3) B(n##4) B(n##5) B(n##6) B(n##7) B(n##8) B(n##9)
  #define D(n) C(n##0) C(n##1) C(n##2) C(n##3) C(n##4) C(n##5) C(n##6) C(n##7) C(n##8) C(n##9)
  #define E(n) D(n##0) D(n##1) D(n##2) D(n##3) D(n##4) D(n##5) D(n##6) D(n##7) D(n##8) D(n##9)
  E(1) E(2) E(3)
  int main () { return 0; }
./xg++ -B ./ -o abc{.o,.C} -flto -flto-partition=1to1 -O2 -g -fdebug-types-section -c
./xgcc -B ./ -o abc{,.o} -flto -flto-partition=1to1 -O2
(not included in testsuite as it takes a while to compile) FAILs with
lto-wrapper: fatal error: Too many copied sections: Operation not supported
compilation terminated.
/usr/bin/ld: error: lto-wrapper failed
collect2: error: ld returned 1 exit status

The following patch fixes that.  Most of the 64K+ section support for
reading and writing was already there years ago (and especially reading used
quite often already) and a further bug fixed in it in the PR104617 fix.

Yet, the fix isn't solely about removing the
  if (new_i - 1 >= SHN_LORESERVE)
    {
      *err = ENOTSUP;
      return "Too many copied sections";
    }
5 lines, the missing part was that the function only handled reading of
the .symtab_shndx section but not copying/updating of it.
If the result has less than 64K-epsilon sections, that actually wasn't
needed, but e.g. with -fdebug-types-section one can exceed that pretty
easily (reported to us on WebKitGtk build on ppc64le).
Updating the section is slightly more complicated, because it basically
needs to be done in lock step with updating the .symtab section, if one
doesn't need to use SHN_XINDEX in there, the section should (or should be
updated to) contain SHN_UNDEF entry, otherwise needs to have whatever would
be otherwise stored but couldn't fit.  But repeating due to that all the
symtab decisions what to discard and how to rewrite it would be ugly.

So, the patch instead emits the .symtab_shndx section (or sections) last
and prepares the content during the .symtab processing and in a second
pass when going just through .symtab_shndx sections just uses the saved
content.
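
The lock-step relationship between a .symtab entry and its .symtab_shndx word can be sketched as follows. This is a minimal illustration of the ELF extended-index convention, not the code in simple-object-elf.c: EncodedIndex and both functions are hypothetical names, while the three constants carry the values the ELF gABI assigns to SHN_UNDEF, SHN_LORESERVE and SHN_XINDEX.

```cpp
#include <cassert>
#include <cstdint>

// Values from the ELF gABI.
constexpr std::uint32_t kShnUndef = 0;           // SHN_UNDEF
constexpr std::uint32_t kShnLoReserve = 0xFF00;  // SHN_LORESERVE
constexpr std::uint16_t kShnXindex = 0xFFFF;     // SHN_XINDEX

// Hypothetical pair: what goes into the symbol's 16-bit st_shndx field and
// what goes into the parallel 32-bit word of .symtab_shndx.
struct EncodedIndex {
  std::uint16_t st_shndx;
  std::uint32_t shndx_word;
};

// Encode a real section index (not a reserved value such as SHN_ABS).
// Indices that do not fit are escaped with SHN_XINDEX and stored in full in
// the .symtab_shndx word; otherwise that word holds SHN_UNDEF.
constexpr EncodedIndex encode_section_index(std::uint32_t real_index) {
  return real_index >= kShnLoReserve
             ? EncodedIndex{kShnXindex, real_index}
             : EncodedIndex{static_cast<std::uint16_t>(real_index), kShnUndef};
}

// Recover the real section index from the two fields.
constexpr std::uint32_t decode_section_index(EncodedIndex e) {
  return e.st_shndx == kShnXindex ? e.shndx_word : e.st_shndx;
}

// Round-trip a small index, the first escaped index, and a large index.
void check_round_trip() {
  const std::uint32_t samples[] = {1u, 0xFEFFu, 0xFF00u, 70000u};
  for (std::uint32_t s : samples)
    assert(decode_section_index(encode_section_index(s)) == s);
}
```

Because the two fields are only meaningful together, any rewrite of the symbol table has to produce both at once, which is the reason the patch prepares the .symtab_shndx content while processing .symtab.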

2024-09-07  Jakub Jelinek  <jakub@redhat.com>

	PR lto/116614
	* simple-object-elf.c (SHN_COMMON): Align comment with neighbouring
	comments.
	(SHN_HIRESERVE): Use uppercase hex digits instead of lowercase for
	consistency.
	(simple_object_elf_find_sections): Formatting fixes.
	(simple_object_elf_fetch_attributes): Likewise.
	(simple_object_elf_attributes_merge): Likewise.
	(simple_object_elf_start_write): Likewise.
	(simple_object_elf_write_ehdr): Likewise.
	(simple_object_elf_write_shdr): Likewise.
	(simple_object_elf_write_to_file): Likewise.
	(simple_object_elf_copy_lto_debug_section): Likewise.  Don't fail for
	new_i - 1 >= SHN_LORESERVE, instead arrange in that case to copy
	over .symtab_shndx sections, though emit those last and compute their
	section content when processing associated .symtab sections.  Handle
	simple_object_internal_read failure even in the .symtab_shndx reading
	case.
mikpe added a commit to mikpe/gcc that referenced this pull request Sep 8, 2024
hubot pushed a commit that referenced this pull request Sep 12, 2024
…o_debug_section [PR116614]

cat abc.C
  #define A(n) struct T##n {} t##n;
  #define B(n) A(n##0) A(n##1) A(n##2) A(n##3) A(n##4) A(n##5) A(n##6) A(n##7) A(n##8) A(n##9)
  #define C(n) B(n##0) B(n##1) B(n##2) B(n##3) B(n##4) B(n##5) B(n##6) B(n##7) B(n##8) B(n##9)
  #define D(n) C(n##0) C(n##1) C(n##2) C(n##3) C(n##4) C(n##5) C(n##6) C(n##7) C(n##8) C(n##9)
  #define E(n) D(n##0) D(n##1) D(n##2) D(n##3) D(n##4) D(n##5) D(n##6) D(n##7) D(n##8) D(n##9)
  E(1) E(2) E(3)
  int main () { return 0; }
./xg++ -B ./ -o abc{.o,.C} -flto -flto-partition=1to1 -O2 -g -fdebug-types-section -c
./xgcc -B ./ -o abc{,.o} -flto -flto-partition=1to1 -O2
(not included in testsuite as it takes a while to compile) FAILs with
lto-wrapper: fatal error: Too many copied sections: Operation not supported
compilation terminated.
/usr/bin/ld: error: lto-wrapper failed
collect2: error: ld returned 1 exit status

The following patch fixes that.  Most of the 64K+ section support for
reading and writing was already there years ago (and especially reading used
quite often already) and a further bug fixed in it in the PR104617 fix.

Yet, the fix isn't solely about removing the
  if (new_i - 1 >= SHN_LORESERVE)
    {
      *err = ENOTSUP;
      return "Too many copied sections";
    }
5 lines, the missing part was that the function only handled reading of
the .symtab_shndx section but not copying/updating of it.
If the result has less than 64K-epsilon sections, that actually wasn't
needed, but e.g. with -fdebug-types-section one can exceed that pretty
easily (reported to us on WebKitGtk build on ppc64le).
Updating the section is slightly more complicated, because it basically
needs to be done in lock step with updating the .symtab section, if one
doesn't need to use SHN_XINDEX in there, the section should (or should be
updated to) contain SHN_UNDEF entry, otherwise needs to have whatever would
be otherwise stored but couldn't fit.  But repeating due to that all the
symtab decisions what to discard and how to rewrite it would be ugly.

So, the patch instead emits the .symtab_shndx section (or sections) last
and prepares the content during the .symtab processing and in a second
pass when going just through .symtab_shndx sections just uses the saved
content.

2024-09-07  Jakub Jelinek  <jakub@redhat.com>

	PR lto/116614
	* simple-object-elf.c (SHN_COMMON): Align comment with neighbouring
	comments.
	(SHN_HIRESERVE): Use uppercase hex digits instead of lowercase for
	consistency.
	(simple_object_elf_find_sections): Formatting fixes.
	(simple_object_elf_fetch_attributes): Likewise.
	(simple_object_elf_attributes_merge): Likewise.
	(simple_object_elf_start_write): Likewise.
	(simple_object_elf_write_ehdr): Likewise.
	(simple_object_elf_write_shdr): Likewise.
	(simple_object_elf_write_to_file): Likewise.
	(simple_object_elf_copy_lto_debug_section): Likewise.  Don't fail for
	new_i - 1 >= SHN_LORESERVE, instead arrange in that case to copy
	over .symtab_shndx sections, though emit those last and compute their
	section content when processing associated .symtab sections.  Handle
	simple_object_internal_read failure even in the .symtab_shndx reading
	case.

(cherry picked from commit bb8dd09)
hubot pushed a commit that referenced this pull request Sep 13, 2024
…o_debug_section [PR116614]

cat abc.C
  #define A(n) struct T##n {} t##n;
  #define B(n) A(n##0) A(n##1) A(n##2) A(n##3) A(n##4) A(n##5) A(n##6) A(n##7) A(n##8) A(n##9)
  #define C(n) B(n##0) B(n##1) B(n##2) B(n##3) B(n##4) B(n##5) B(n##6) B(n##7) B(n##8) B(n##9)
  #define D(n) C(n##0) C(n##1) C(n##2) C(n##3) C(n##4) C(n##5) C(n##6) C(n##7) C(n##8) C(n##9)
  #define E(n) D(n##0) D(n##1) D(n##2) D(n##3) D(n##4) D(n##5) D(n##6) D(n##7) D(n##8) D(n##9)
  E(1) E(2) E(3)
  int main () { return 0; }
./xg++ -B ./ -o abc{.o,.C} -flto -flto-partition=1to1 -O2 -g -fdebug-types-section -c
./xgcc -B ./ -o abc{,.o} -flto -flto-partition=1to1 -O2
(not included in testsuite as it takes a while to compile) FAILs with
lto-wrapper: fatal error: Too many copied sections: Operation not supported
compilation terminated.
/usr/bin/ld: error: lto-wrapper failed
collect2: error: ld returned 1 exit status

The following patch fixes that.  Most of the 64K+ section support for
reading and writing was already there years ago (and especially reading used
quite often already) and a further bug fixed in it in the PR104617 fix.

Yet, the fix isn't solely about removing the
  if (new_i - 1 >= SHN_LORESERVE)
    {
      *err = ENOTSUP;
      return "Too many copied sections";
    }
5 lines, the missing part was that the function only handled reading of
the .symtab_shndx section but not copying/updating of it.
If the result has less than 64K-epsilon sections, that actually wasn't
needed, but e.g. with -fdebug-types-section one can exceed that pretty
easily (reported to us on WebKitGtk build on ppc64le).
Updating the section is slightly more complicated, because it basically
needs to be done in lock step with updating the .symtab section, if one
doesn't need to use SHN_XINDEX in there, the section should (or should be
updated to) contain SHN_UNDEF entry, otherwise needs to have whatever would
be otherwise stored but couldn't fit.  But repeating due to that all the
symtab decisions what to discard and how to rewrite it would be ugly.

So, the patch instead emits the .symtab_shndx section (or sections) last
and prepares the content during the .symtab processing and in a second
pass when going just through .symtab_shndx sections just uses the saved
content.

2024-09-07  Jakub Jelinek  <jakub@redhat.com>

	PR lto/116614
	* simple-object-elf.c (SHN_COMMON): Align comment with neighbouring
	comments.
	(SHN_HIRESERVE): Use uppercase hex digits instead of lowercase for
	consistency.
	(simple_object_elf_find_sections): Formatting fixes.
	(simple_object_elf_fetch_attributes): Likewise.
	(simple_object_elf_attributes_merge): Likewise.
	(simple_object_elf_start_write): Likewise.
	(simple_object_elf_write_ehdr): Likewise.
	(simple_object_elf_write_shdr): Likewise.
	(simple_object_elf_write_to_file): Likewise.
	(simple_object_elf_copy_lto_debug_section): Likewise.  Don't fail for
	new_i - 1 >= SHN_LORESERVE, instead arrange in that case to copy
	over .symtab_shndx sections, though emit those last and compute their
	section content when processing associated .symtab sections.  Handle
	simple_object_internal_read failure even in the .symtab_shndx reading
	case.

(cherry picked from commit bb8dd09)
hubot pushed a commit that referenced this pull request Oct 9, 2024
Whenever C1 and C2 are integer constants, X is of a wrapping type, and
cmp is a relational operator, the expression X +- C1 cmp C2 can be
simplified in the following cases:

(a) If cmp is <= and C2 -+ C1 == +INF(1), we can transform the initial
comparison in the following way:
   X +- C1 <= C2
   -INF <= X +- C1 <= C2 (add left hand side which holds for any X, C1)
   -INF -+ C1 <= X <= C2 -+ C1 (add -+C1 to all 3 expressions)
   -INF -+ C1 <= X <= +INF (due to (1))
   -INF -+ C1 <= X (eliminate the right hand side since it holds for any X)

(b) By analogy, if cmp is >= and C2 -+ C1 == -INF(1), use the following
sequence of transformations:

   X +- C1 >= C2
   +INF >= X +- C1 >= C2 (add left hand side which holds for any X, C1)
   +INF -+ C1 >= X >= C2 -+ C1 (add -+C1 to all 3 expressions)
   +INF -+ C1 >= X >= -INF (due to (1))
   +INF -+ C1 >= X (eliminate the right hand side since it holds for any X)

(c) The > and < cases are negations of (a) and (b), respectively.
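
The cases above can be checked numerically for a 32-bit unsigned X, where -INF is 0 and +INF is 0xFFFFFFFF. This is a minimal sketch with hypothetical function names and constants chosen for illustration (C1 = 3, with C2 = 2 for <= and C2 = 3 for <); it is not the match.pd pattern itself.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kC1 = 3;
constexpr std::uint32_t kMinusInf = 0;           // -INF for a 32-bit unsigned type
constexpr std::uint32_t kPlusInf = 0xFFFFFFFFu;  // +INF for a 32-bit unsigned type

// Case (a): X + C1 <= C2 with C2 = 2, so C2 - C1 wraps to +INF.
constexpr bool original_le(std::uint32_t x) {
  return static_cast<std::uint32_t>(x + kC1) <= 2u;
}
// Simplified form: -INF - C1 <= X, with the subtraction wrapping.
constexpr bool simplified_le(std::uint32_t x) {
  return x >= static_cast<std::uint32_t>(kMinusInf - kC1);
}

// Case (c) via (b): X + C1 < C2 with C2 = 3, so C2 - C1 == -INF.
constexpr bool original_lt(std::uint32_t x) {
  return static_cast<std::uint32_t>(x + kC1) < 3u;
}
// Negation of (b): not (+INF - C1 >= X), i.e. X > +INF - C1.
constexpr bool simplified_lt(std::uint32_t x) {
  return x > static_cast<std::uint32_t>(kPlusInf - kC1);
}

// Compare both forms on values near each end of the range, where the
// wrap-around happens.
void check_equivalence() {
  for (std::uint64_t i = 0; i <= 16; ++i) {
    const std::uint32_t lo = static_cast<std::uint32_t>(i);
    const std::uint32_t hi = static_cast<std::uint32_t>(0xFFFFFFFFull - i);
    assert(original_le(lo) == simplified_le(lo));
    assert(original_le(hi) == simplified_le(hi));
    assert(original_lt(lo) == simplified_lt(lo));
    assert(original_lt(hi) == simplified_lt(hi));
  }
}
```

In both pairs the simplified form compares X against a constant directly, so the addition disappears.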

This transformation occasionally saves add / sub instructions;
for instance, the expression

3 + (uint32_t)f() < 2

compiles to

cmn     w0, #4
cset    w0, ls

instead of

add     w0, w0, 3
cmp     w0, 2
cset    w0, ls

on aarch64.

Testcases that go together with this patch have been split into two
separate files, one containing testcases for unsigned variables and the
other for wrapping signed ones (and thus compiled with -fwrapv).
Additionally, one aarch64 test has been adjusted since the patch has
caused the generated code to change from

cmn     w0, #2
csinc   w0, w1, wzr, cc   (x < -2)

to

cmn     w0, #3
csinc   w0, w1, wzr, cs   (x <= -3)

This patch has been bootstrapped and regtested on aarch64, x86_64, and
i386, and additionally regtested on riscv32.

gcc/ChangeLog:

	PR tree-optimization/116024
	* match.pd: New transformation around integer comparison.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/pr116024-2.c: New test.
	* gcc.dg/tree-ssa/pr116024-2-fwrapv.c: Ditto.
	* gcc.target/aarch64/gtu_to_ltu_cmp_1.c: Adjust.
hubot pushed a commit that referenced this pull request Nov 8, 2024
Update test case for armv8.1-m.main that supports conditional
arithmetic.

armv7-m:
        push    {r4, lr}
        ldr     r4, .L6
        ldr     r4, [r4]
        lsls    r4, r4, #29
        it      mi
        addmi   r2, r2, #1
        bl      bar
        movs    r0, #0
        pop     {r4, pc}

armv8.1-m.main:
        push    {r3, r4, r5, lr}
        ldr     r4, .L5
        ldr     r5, [r4]
        tst     r5, #4
        csinc   r2, r2, r2, eq
        bl      bar
        movs    r0, #0
        pop     {r3, r4, r5, pc}
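
Both sequences increment r2 when bit 2 of the loaded word is set: the armv7-m code shifts that bit into the sign position and tests it with the mi condition, while the armv8.1-m.main code masks it with tst and lets csinc choose. The sketch below only models that arithmetic with hypothetical function names; it says nothing about how the compiler selects either sequence.

```cpp
#include <cassert>
#include <cstdint>

// armv7-m style: "lsls r4, r4, #29" moves bit 2 into bit 31, and the "mi"
// condition is true when bit 31 of the result is set.
constexpr bool bit2_via_shift(std::uint32_t v) {
  return ((v << 29) & 0x80000000u) != 0;
}

// armv8.1-m.main style: "tst r5, #4" masks bit 2; "csinc r2, r2, r2, eq"
// keeps r2 when the masked value is zero and increments it otherwise.
constexpr bool bit2_via_mask(std::uint32_t v) {
  return (v & 4u) != 0;
}

// Model of the conditional increment applied to r2 in both listings.
constexpr std::uint32_t conditional_increment(std::uint32_t r2, bool bit_set) {
  return bit_set ? r2 + 1u : r2;
}

// Check that both tests agree on a few representative words.
void check_bit_tests_agree() {
  const std::uint32_t samples[] = {0u, 3u, 4u, 5u, 0xFFFFFFFBu, 0xFFFFFFFFu};
  for (std::uint32_t v : samples) {
    assert(bit2_via_shift(v) == bit2_via_mask(v));
    assert(conditional_increment(10u, bit2_via_shift(v)) ==
           conditional_increment(10u, bit2_via_mask(v)));
  }
}
```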

gcc/testsuite/ChangeLog:

	* gcc.target/arm/epilog-1.c: Use check-function-bodies.

Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
hubot pushed a commit that referenced this pull request Nov 8, 2024
Update test case for armv8.1-m.main that supports conditional
arithmetic.

armv7-m:
        push    {r4, lr}
        ldr     r4, .L6
        ldr     r4, [r4]
        lsls    r4, r4, #29
        it      mi
        addmi   r2, r2, #1
        bl      bar
        movs    r0, #0
        pop     {r4, pc}

armv8.1-m.main:
        push    {r3, r4, r5, lr}
        ldr     r4, .L5
        ldr     r5, [r4]
        tst     r5, #4
        csinc   r2, r2, r2, eq
        bl      bar
        movs    r0, #0
        pop     {r3, r4, r5, pc}

gcc/testsuite/ChangeLog:

	* gcc.target/arm/epilog-1.c: Use check-function-bodies.

Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
(cherry picked from commit ec86e87)
hubot pushed a commit that referenced this pull request Nov 26, 2024
In r14.2.0-376-g724446556e5, I accidentally introduced a regression in
the expected assembler as the csinc instruction was not used for
armv8.1-m.main.

The generated assembler for armv8.1-m.main is:
        push    {r3, r4, r5, lr}
        ldr     r4, .L5
        ldr     r5, [r4]
        adds    r4, r2, #1
        tst     r5, #4
        it      ne
        movne   r2, r4
        bl      bar
        movs    r0, #0
        pop     {r3, r4, r5, pc}

gcc/testsuite/ChangeLog:

	* gcc.target/arm/epilog-1.c: Corrected armv8.1-m.main asm.

Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
hubot pushed a commit that referenced this pull request Feb 5, 2025
When generating thumb2 code,
	LDM SP!, {PC}
is a two-byte instruction, whereas
	LDR PC, [SP], #4
needs 4 bytes.  When optimizing for size, or when there's no obvious
performance benefit, prefer the former.
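
The selection rule can be sketched as follows.  This is only an
illustration of the decision described above, not the code in arm.cc; the
enum, the function names and the `ldr_is_faster` parameter (standing in for
whatever tuning check the backend uses) are hypothetical.

```cpp
// The two candidate return instructions discussed above.
enum class ReturnInsn { PopPc, LdrPcPostInc };

// Thumb-2 encoding sizes in bytes, as stated in the commit message.
constexpr unsigned encoded_size(ReturnInsn insn) {
    return insn == ReturnInsn::PopPc ? 2u : 4u;
}

// Prefer the short form when optimizing for size, or when the longer
// form has no performance advantage on the target being tuned for.
constexpr ReturnInsn choose_return(bool optimize_size, bool ldr_is_faster) {
    return (optimize_size || !ldr_is_faster) ? ReturnInsn::PopPc
                                             : ReturnInsn::LdrPcPostInc;
}
```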

gcc/ChangeLog:

	PR target/118089
	* config/arm/arm.cc (thumb2_expand_return): Use LDM SP!, {PC}
	when optimizing for size, or when there's no performance benefit over
	LDR PC, [SP], #4.
	(arm_expand_epilogue): Likewise.
hubot pushed a commit that referenced this pull request Feb 7, 2025
My earlier change for making the compiler prefer

	POP	{PC}

over

	LDR	PC, [SP], #4

had a slightly unexpected consequence in that we now also call
arm_emit_multi_reg_pop to handle single register pops when the
register is not PC.  This exposed a latent bug in this function where
the dwarf unwinding notes on the single-register POP were not being
set correctly.

gcc/
	PR target/118089
	* config/arm/arm.cc (arm_emit_multi_reg_pop): Add a CFA adjust
	note to single-register POP instructions.
hubot pushed a commit that referenced this pull request Jun 13, 2025
…o_debug_section [PR116614]

cat abc.C
  #define A(n) struct T##n {} t##n;
  #define B(n) A(n##0) A(n##1) A(n##2) A(n##3) A(n##4) A(n##5) A(n##6) A(n##7) A(n##8) A(n##9)
  #define C(n) B(n##0) B(n##1) B(n##2) B(n##3) B(n##4) B(n##5) B(n##6) B(n##7) B(n##8) B(n##9)
  #define D(n) C(n##0) C(n##1) C(n##2) C(n##3) C(n##4) C(n##5) C(n##6) C(n##7) C(n##8) C(n##9)
  #define E(n) D(n##0) D(n##1) D(n##2) D(n##3) D(n##4) D(n##5) D(n##6) D(n##7) D(n##8) D(n##9)
  E(1) E(2) E(3)
  int main () { return 0; }
./xg++ -B ./ -o abc{.o,.C} -flto -flto-partition=1to1 -O2 -g -fdebug-types-section -c
./xgcc -B ./ -o abc{,.o} -flto -flto-partition=1to1 -O2
(not included in testsuite as it takes a while to compile) FAILs with
lto-wrapper: fatal error: Too many copied sections: Operation not supported
compilation terminated.
/usr/bin/ld: error: lto-wrapper failed
collect2: error: ld returned 1 exit status

The following patch fixes that.  Most of the support for reading and
writing 64K+ sections has been there for years (the reading side in
particular is already used quite often), and a further bug in it was fixed
as part of the PR104617 fix.

Yet, the fix isn't solely about removing the
  if (new_i - 1 >= SHN_LORESERVE)
    {
      *err = ENOTSUP;
      return "Too many copied sections";
    }
5 lines; the missing part was that the function only handled reading the
.symtab_shndx section, not copying or updating it.
If the result has fewer than 64K-epsilon sections, that wasn't actually
needed, but e.g. with -fdebug-types-section one can exceed that limit pretty
easily (reported to us on a WebKitGtk build on ppc64le).
Updating the section is slightly more complicated, because it basically
needs to be done in lock step with updating the .symtab section: if a symbol
doesn't need SHN_XINDEX there, the corresponding .symtab_shndx entry should
be (or should be updated to) SHN_UNDEF; otherwise it needs to hold the value
that would otherwise have been stored in .symtab but couldn't fit.  But
repeating all the symtab decisions about what to discard and how to rewrite
it just for that would be ugly.

So, the patch instead emits the .symtab_shndx section (or sections) last.
It prepares their content during the .symtab processing and then, in a
second pass over just the .symtab_shndx sections, uses the saved content.
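
The lock-step relationship between a .symtab entry and its .symtab_shndx
slot can be sketched as below.  This illustrates the general ELF
extended-index rule only, not the code in simple-object-elf.c; the struct
and function names are hypothetical, and the constants are spelled with a
`k` prefix to avoid clashing with the macros in <elf.h>.

```cpp
#include <cstdint>

constexpr std::uint32_t kShnUndef = 0;          // SHN_UNDEF
constexpr std::uint32_t kShnLoReserve = 0xFF00; // SHN_LORESERVE
constexpr std::uint16_t kShnXIndex = 0xFFFF;    // SHN_XINDEX

// What ends up in the two parallel tables for one symbol.
struct SymbolIndex {
    std::uint16_t st_shndx;   // field in the .symtab entry
    std::uint32_t shndx_slot; // matching slot in .symtab_shndx
};

// Encode a real section number.  Indices that fit are stored directly and
// the side table holds SHN_UNDEF; larger ones use the SHN_XINDEX escape.
// Reserved values such as SHN_ABS are not section numbers and are not
// handled by this sketch.
constexpr SymbolIndex encode_section_index(std::uint32_t section) {
    return section >= kShnLoReserve
        ? SymbolIndex{kShnXIndex, section}
        : SymbolIndex{static_cast<std::uint16_t>(section), kShnUndef};
}

// Recover the section number from the pair.
constexpr std::uint32_t decode_section_index(SymbolIndex s) {
    return s.st_shndx == kShnXIndex ? s.shndx_slot : s.st_shndx;
}
```

Because the slot value depends on the final section number chosen while
rewriting .symtab, computing both at the same time avoids repeating the
symbol-table decisions in a second place.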

2024-09-07  Jakub Jelinek  <jakub@redhat.com>

	PR lto/116614
	* simple-object-elf.c (SHN_COMMON): Align comment with neighbouring
	comments.
	(SHN_HIRESERVE): Use uppercase hex digits instead of lowercase for
	consistency.
	(simple_object_elf_find_sections): Formatting fixes.
	(simple_object_elf_fetch_attributes): Likewise.
	(simple_object_elf_attributes_merge): Likewise.
	(simple_object_elf_start_write): Likewise.
	(simple_object_elf_write_ehdr): Likewise.
	(simple_object_elf_write_shdr): Likewise.
	(simple_object_elf_write_to_file): Likewise.
	(simple_object_elf_copy_lto_debug_section): Likewise.  Don't fail for
	new_i - 1 >= SHN_LORESERVE, instead arrange in that case to copy
	over .symtab_shndx sections, though emit those last and compute their
	section content when processing associated .symtab sections.  Handle
	simple_object_internal_read failure even in the .symtab_shndx reading
	case.

(cherry picked from commit bb8dd09)
yfeldblum added a commit to yfeldblum/gcc that referenced this pull request Jul 18, 2025
When an exception is thrown and caught, destruction of the exception checks whether the exception was allocated in the `emergency_pool`, which is a global variable.

This global variable has a runtime constructor, which means access to it is valid only once the constructor has run during the module init phase.

But throwing and catching an exception is permitted at any time, not just during the lifetime of `main`. And this must be true whether libsupc++ is linked dynamically or statically.

LLVM Address Sanitizer aborts with `initialization-order-fiasco` when, in a binary which links libsupc++ statically, an exception is thrown and caught in some global constructor which happens to run prior to the global constructor of `emergency_pool`.

```
ERROR: AddressSanitizer: initialization-order-fiasco ...
READ of size 8 at ... thread T0
SCARINESS: 14 (8-byte-read-initialization-order-fiasco)
    #0 ... in (anonymous namespace)::pool::in_pool(void*) gcc-11.x/libstdc++-v3/libsupc++/eh_alloc.cc:258
    #1 ... in __cxa_free_exception gcc-11.x/libstdc++-v3/libsupc++/eh_alloc.cc:302
    #2 ... in __gxx_exception_cleanup(_Unwind_Reason_Code, _Unwind_Exception*) gcc-11.x/libstdc++-v3/libsupc++/eh_throw.cc:51
    #3 ... in __cxa_end_catch gcc-11.x/libstdc++-v3/libsupc++/eh_catch.cc:125
    ...
    ... in __cxx_global_var_init ...
    ...
    ... in call_init.part.0 glibc-2.40/elf/dl-init.c:74:3
    ... in call_init glibc-2.40/elf/dl-init.c:120:14
    ... in _dl_init glibc-2.40/elf/dl-init.c:121:5
    ... in _dl_start_user glibc-2.40/elf/../sysdeps/aarch64/dl-start.S:46
... is located 56 bytes inside of global variable '(anonymous namespace)::emergency_pool' defined in 'gcc-11.x/libstdc++-v3/libsupc++/eh_alloc.cc' (...) of size 72
  registered at:
    #0 ... in __asan_register_globals.part.0 llvm-project/compiler-rt/lib/asan/asan_globals.cpp:393:3
    #1 ... in __asan_register_globals llvm-project/compiler-rt/lib/asan/asan_globals.cpp:392:3
    #2 ... in __asan_register_elf_globals llvm-project/compiler-rt/lib/asan/asan_globals.cpp:376:26
    #3 ... in call_init.part.0 glibc-2.40/elf/dl-init.c:74:3
    #4 ... in call_init glibc-2.40/elf/dl-init.c:120:14
    #5 ... in _dl_init glibc-2.40/elf/dl-init.c:121:5
    #6 ... in _dl_start_user glibc-2.40/elf/../sysdeps/aarch64/dl-start.S:46
```
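
One common way to make a global safe to touch before dynamic initializers
have run is to give it a constexpr constructor, so that it is
constant-initialized, and to defer any runtime setup.  The sketch below
shows that general technique only; it is not claimed to be the change made
in this commit, and `Pool`, `g_pool`, `EarlyUser` and `g_early` are
hypothetical names, not the libsupc++ ones.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a pool object; not the libsupc++ type.
class Pool {
public:
    // A constexpr constructor lets a namespace-scope instance be
    // constant-initialized, i.e. fully set up before any dynamic
    // initializer in the program runs.
    constexpr Pool() : arena_(nullptr), size_(0) {}

    // Safe to call at any time: with no arena yet, nothing is in the pool.
    bool in_pool(const void* p) const {
        const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
        const std::uintptr_t base = reinterpret_cast<std::uintptr_t>(arena_);
        return arena_ != nullptr && addr >= base && addr - base < size_;
    }

private:
    const char* arena_;  // would be allocated lazily in a fuller version
    std::size_t size_;
};

Pool g_pool;  // constant-initialized, no runtime constructor call

// A global whose dynamic initializer queries the pool, standing in for
// "an exception thrown and caught in some global constructor".
struct EarlyUser {
    bool result;
    EarlyUser() : result(g_pool.in_pool(this)) {}
};
EarlyUser g_early;
```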
hubot pushed a commit that referenced this pull request Oct 15, 2025
The vadcq and vsbcq patterns had two problems:
- the adc / sbc part of the pattern did not mention the use of vfpcc
- the carry calculation part should use a different unspec code

In addition, the get_fpscr_nzcvqc and set_fpscr_nzcvqc patterns were
over-cautious in using unspec_volatile when unspec is really what they
need.  Making them unspec allows redundant accesses to FPSCR_nzcvqc to be
removed.

With unspec_volatile, we used to generate:
test_2:
	@ args = 0, pretend = 0, frame = 8
	@ frame_needed = 0, uses_anonymous_args = 0
	vmov.i32	q0, #0x1  @ v4si
	push	{lr}
	sub	sp, sp, #12
	vmrs	r3, FPSCR_nzcvqc    ;; [1]
	bic	r3, r3, #536870912
	vmsr	FPSCR_nzcvqc, r3
	vadc.i32	q3, q0, q0
	vmrs	r3, FPSCR_nzcvqc     ;; [2]
	vmrs	r3, FPSCR_nzcvqc
	orr	r3, r3, #536870912
	vmsr	FPSCR_nzcvqc, r3
	vadc.i32	q0, q0, q0
	vmrs	r3, FPSCR_nzcvqc
	ldr	r0, .L8
	ubfx	r3, r3, #29, #1
	str	r3, [sp, #4]
	bl	print_uint32x4_t
	add	sp, sp, #12
	@ sp needed
	pop	{pc}
.L9:
	.align	2
.L8:
	.word	.LC1

with unspec, we generate:
test_2:
	@ args = 0, pretend = 0, frame = 8
	@ frame_needed = 0, uses_anonymous_args = 0
	vmrs	r3, FPSCR_nzcvqc     ;; [1]
	bic	r3, r3, #536870912   ;; [3]
	vmov.i32	q0, #0x1  @ v4si
	vmsr	FPSCR_nzcvqc, r3
	vadc.i32	q3, q0, q0
	vmrs	r3, FPSCR_nzcvqc
	orr	r3, r3, #536870912
	vmsr	FPSCR_nzcvqc, r3
	vadc.i32	q0, q0, q0
	vmrs	r3, FPSCR_nzcvqc
	push	{lr}
	ubfx	r3, r3, #29, #1
	sub	sp, sp, #12
	ldr	r0, .L8
	str	r3, [sp, #4]
	bl	print_uint32x4_t
	add	sp, sp, #12
	@ sp needed
	pop	{pc}
.L9:
	.align	2
.L8:
	.word	.LC1

That is, unspec in get_fpscr_nzcvqc makes it possible to:
- move [1] earlier
- delete redundant [2]

and unspec in set_fpscr_nzcvqc makes it possible to move push {lr} and
stack manipulation later.
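
For readers decoding the constants in the listings above: 536870912 is
1 << 29, the position of the carry (C) flag in FPSCR.  The sketch below
models the bic / orr / ubfx steps on a plain 32-bit value; the helper names
are hypothetical and nothing here touches a real FPSCR.

```cpp
#include <cstdint>

constexpr std::uint32_t kCarryBit = 29;
constexpr std::uint32_t kCarryMask = std::uint32_t{1} << kCarryBit;

static_assert(kCarryMask == 536870912u, "immediate used by bic/orr above");

// bic r3, r3, #536870912: clear the carry flag.
constexpr std::uint32_t clear_carry(std::uint32_t fpscr) {
    return fpscr & ~kCarryMask;
}

// orr r3, r3, #536870912: set the carry flag.
constexpr std::uint32_t set_carry(std::uint32_t fpscr) {
    return fpscr | kCarryMask;
}

// ubfx r3, r3, #29, #1: extract the carry flag as 0 or 1.
constexpr std::uint32_t get_carry(std::uint32_t fpscr) {
    return (fpscr >> kCarryBit) & 1u;
}
```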

gcc/ChangeLog:

	PR target/122189
	* config/arm/iterators.md (VxCIQ_carry, VxCIQ_M_carry, VxCQ_carry)
	(VxCQ_M_carry): New iterators.
	* config/arm/mve.md (get_fpscr_nzcvqc, set_fpscr_nzcvqc): Use
	unspec instead of unspec_volatile.
	(vadciq, vadciq_m, vadcq, vadcq_m): Use vfpcc in operation.  Use a
	different unspec code for carry calculation.
	* config/arm/unspecs.md (VADCQ_U_carry, VADCQ_M_U_carry)
	(VADCQ_S_carry, VADCQ_M_S_carry, VSBCIQ_U_carry, VSBCIQ_S_carry)
	(VSBCIQ_M_U_carry, VSBCIQ_M_S_carry, VSBCQ_U_carry, VSBCQ_S_carry)
	(VSBCQ_M_U_carry, VSBCQ_M_S_carry, VADCIQ_U_carry)
	(VADCIQ_M_U_carry, VADCIQ_S_carry, VADCIQ_M_S_carry): New unspec
	codes.

gcc/testsuite/ChangeLog:

	PR target/122189
	* gcc.target/arm/mve/intrinsics/vadcq-check-carry.c: New test.
	* gcc.target/arm/mve/intrinsics/vadcq_m_s32.c: Adjust instructions
	order.
	* gcc.target/arm/mve/intrinsics/vadcq_m_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsbcq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsbcq_m_u32.c: Likewise.
hubot pushed a commit that referenced this pull request Oct 22, 2025
The vectorizer has learned how to do boolean reductions of masks to a C bool
for the operations OR, XOR and AND.

This implements the new optabs for Adv.SIMD.  Adv.SIMD today can already
vectorize such loops but does so through SHIFT-AND-INSERT to perform the
reductions step-wise and in order.  As an example, an OR reduction today does:

        movi    v3.4s, 0
        ext     v5.16b, v30.16b, v3.16b, #8
        orr     v5.16b, v5.16b, v30.16b
        ext     v29.16b, v5.16b, v3.16b, #4
        orr     v29.16b, v29.16b, v5.16b
        ext     v4.16b, v29.16b, v3.16b, #2
        orr     v4.16b, v4.16b, v29.16b
        ext     v3.16b, v4.16b, v3.16b, #1
        orr     v3.16b, v3.16b, v4.16b
        fmov    w1, s3
        and     w1, w1, 1

For reducing to a boolean, however, we don't need the step-wise reduction
and can just look at the bit patterns.  For OR, for example, we now generate:

        umaxp	v3.4s, v3.4s, v3.4s
        fmov	x1, d3
        cmp	x1, 0
        cset	w0, ne

For the remaining codegen see test vect-reduc-bool-9.c.
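
The equivalence behind the shorter OR sequence can be sketched in scalar
C++.  This is a model under the assumption that every mask lane is either
0x00 or 0xFF; the type and function names are hypothetical and the code
only mirrors the shape of the two instruction sequences.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

using Mask = std::array<std::uint8_t, 16>;  // each lane assumed 0x00 or 0xFF

// Test helper: a mask with exactly one lane set.
Mask single_lane(std::size_t i) {
    assert(i < 16);
    Mask m{};
    m[i] = 0xFF;
    return m;
}

// Step-wise form: OR every lane together, then keep bit 0
// (the role of the ext/orr chain followed by "and w1, w1, 1").
bool or_reduce_stepwise(const Mask& m) {
    std::uint8_t acc = 0;
    for (std::uint8_t lane : m) acc |= lane;
    return (acc & 1u) != 0;
}

// Bit-pattern form: view the register as four 32-bit lanes, take the
// pairwise maximum (umaxp) and test the resulting 64 bits against zero
// (fmov / cmp / cset ne).  Any set bit makes some maximum non-zero.
bool or_reduce_pairwise_max(const Mask& m) {
    std::array<std::uint32_t, 4> w;
    std::memcpy(w.data(), m.data(), sizeof w);
    const std::uint32_t lo = std::max(w[0], w[1]);
    const std::uint32_t hi = std::max(w[2], w[3]);
    return (lo | hi) != 0;
}
```

The second form needs no lane-by-lane ordering because the result only
depends on whether any bit in the register is set.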

gcc/ChangeLog:

	* config/aarch64/aarch64-simd.md (reduc_sbool_and_scal_<mode>,
	reduc_sbool_ior_scal_<mode>, reduc_sbool_xor_scal_<mode>): New.
	* config/aarch64/iterators.md (VALLI): New.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/vect-reduc-bool-1.c: New test.
	* gcc.target/aarch64/vect-reduc-bool-2.c: New test.
	* gcc.target/aarch64/vect-reduc-bool-3.c: New test.
	* gcc.target/aarch64/vect-reduc-bool-4.c: New test.
	* gcc.target/aarch64/vect-reduc-bool-5.c: New test.
	* gcc.target/aarch64/vect-reduc-bool-6.c: New test.
	* gcc.target/aarch64/vect-reduc-bool-7.c: New test.
	* gcc.target/aarch64/vect-reduc-bool-8.c: New test.
	* gcc.target/aarch64/vect-reduc-bool-9.c: New test.
hubot pushed a commit that referenced this pull request Nov 12, 2025
The vadcq and vsbcq patterns had two problems:
- the adc / sbc part of the pattern did not mention the use of vfpcc
- the carry calculation part should use a different unspec code

In addition, the get_fpscr_nzcvqc and set_fpscr_nzcvqc patterns were
over-cautious in using unspec_volatile when unspec is really what they
need.  Making them unspec allows redundant accesses to FPSCR_nzcvqc to be
removed.

With unspec_volatile, we used to generate:
test_2:
	@ args = 0, pretend = 0, frame = 8
	@ frame_needed = 0, uses_anonymous_args = 0
	vmov.i32	q0, #0x1  @ v4si
	push	{lr}
	sub	sp, sp, #12
	vmrs	r3, FPSCR_nzcvqc    ;; [1]
	bic	r3, r3, #536870912
	vmsr	FPSCR_nzcvqc, r3
	vadc.i32	q3, q0, q0
	vmrs	r3, FPSCR_nzcvqc     ;; [2]
	vmrs	r3, FPSCR_nzcvqc
	orr	r3, r3, #536870912
	vmsr	FPSCR_nzcvqc, r3
	vadc.i32	q0, q0, q0
	vmrs	r3, FPSCR_nzcvqc
	ldr	r0, .L8
	ubfx	r3, r3, #29, #1
	str	r3, [sp, #4]
	bl	print_uint32x4_t
	add	sp, sp, #12
	@ sp needed
	pop	{pc}
.L9:
	.align	2
.L8:
	.word	.LC1

with unspec, we generate:
test_2:
	@ args = 0, pretend = 0, frame = 8
	@ frame_needed = 0, uses_anonymous_args = 0
	vmrs	r3, FPSCR_nzcvqc     ;; [1]
	bic	r3, r3, #536870912   ;; [3]
	vmov.i32	q0, #0x1  @ v4si
	vmsr	FPSCR_nzcvqc, r3
	vadc.i32	q3, q0, q0
	vmrs	r3, FPSCR_nzcvqc
	orr	r3, r3, #536870912
	vmsr	FPSCR_nzcvqc, r3
	vadc.i32	q0, q0, q0
	vmrs	r3, FPSCR_nzcvqc
	push	{lr}
	ubfx	r3, r3, #29, #1
	sub	sp, sp, #12
	ldr	r0, .L8
	str	r3, [sp, #4]
	bl	print_uint32x4_t
	add	sp, sp, #12
	@ sp needed
	pop	{pc}
.L9:
	.align	2
.L8:
	.word	.LC1

That is, unspec in get_fpscr_nzcvqc makes it possible to:
- move [1] earlier
- delete redundant [2]

and unspec in set_fpscr_nzcvqc makes it possible to move push {lr} and
stack manipulation later.

gcc/ChangeLog:

	PR target/122189
	* config/arm/iterators.md (VxCIQ_carry, VxCIQ_M_carry, VxCQ_carry)
	(VxCQ_M_carry): New iterators.
	* config/arm/mve.md (get_fpscr_nzcvqc, set_fpscr_nzcvqc): Use
	unspec instead of unspec_volatile.
	(vadciq, vadciq_m, vadcq, vadcq_m): Use vfpcc in operation.  Use a
	different unspec code for carry calculation.
	* config/arm/unspecs.md (VADCQ_U_carry, VADCQ_M_U_carry)
	(VADCQ_S_carry, VADCQ_M_S_carry, VSBCIQ_U_carry, VSBCIQ_S_carry)
	(VSBCIQ_M_U_carry, VSBCIQ_M_S_carry, VSBCQ_U_carry, VSBCQ_S_carry)
	(VSBCQ_M_U_carry, VSBCQ_M_S_carry, VADCIQ_U_carry)
	(VADCIQ_M_U_carry, VADCIQ_S_carry, VADCIQ_M_S_carry): New unspec
	codes.

gcc/testsuite/ChangeLog:

	PR target/122189
	* gcc.target/arm/mve/intrinsics/vadcq-check-carry.c: New test.
	* gcc.target/arm/mve/intrinsics/vadcq_m_s32.c: Adjust instructions
	order.
	* gcc.target/arm/mve/intrinsics/vadcq_m_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsbcq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsbcq_m_u32.c: Likewise.

	(cherry picked from commits
	0272058 and
	697ccad)