Aldyh/cilk in gomp #4

Open: wants to merge 79 commits into base: master

sushantchry
Pulling for study purposes; no changes expected.

jakub and others added 30 commits March 20, 2013 09:01
See http://openmp.org/wp/2013/03/openmp-40-rc2/ for the standard
draft.


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@196809 138bc75d-0d04-0410-961f-82ee72b054a4
	Add another argument to c_finish_omp_atomic.

	* parser.c (cp_parser_binary_expression): Handle no_toplevel_fold_p
	even for binary operations other than comparison.
	(cp_parser_omp_atomic): Handle parsing OpenMP 4.0 atomics.
	* pt.c (tsubst_expr) <case OMP_ATOMIC>: Handle atomic exchange.
	* semantics.c (finish_omp_atomic): Use cp_tree_equal to diagnose
	expression mismatches and to find out if c_finish_omp_atomic
	should be called with swapped set to true or false.

	* c-omp.c (c_finish_omp_atomic): Add swapped argument, if true,
	build the operation first with rhs, lhs arguments and use NOP_EXPR
	build_modify_expr.
	* c-common.h (c_finish_omp_atomic): Adjust prototype.

	* c-c++-common/gomp/atomic-15.c: Remove error test that is now
	valid in OpenMP 4.0.

	* testsuite/libgomp.c++/atomic-10.C: New test.
	* testsuite/libgomp.c++/atomic-11.C: New test.
	* testsuite/libgomp.c++/atomic-12.C: New test.
	* testsuite/libgomp.c++/atomic-13.C: New test.


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@196815 138bc75d-0d04-0410-961f-82ee72b054a4
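
For readers unfamiliar with the forms this commit starts accepting, here is a minimal sketch of OpenMP 4.0 atomics in C: an update with the operands swapped (`x = expr binop x`), a capture, and an atomic exchange. The function names are made up for illustration and are not taken from the testsuite files listed above. Compiled without `-fopenmp` the pragmas are ignored and the functions behave as plain serial code.

```c
/* Swapped-operand update: the atomic variable appears on the right
   of the binary operator (x = expr binop x).  */
int swapped_sub(int x, int expr)
{
    #pragma omp atomic update
    x = expr - x;
    return x;
}

/* Capture: read the old value and update the variable as one atomic step. */
int capture_then_add(int *x, int inc)
{
    int old;
    #pragma omp atomic capture
    { old = *x; *x += inc; }
    return old;
}

/* Exchange: read the old value and store an unrelated new one. */
int exchange(int *x, int newval)
{
    int old;
    #pragma omp atomic capture
    { old = *x; *x = newval; }
    return old;
}
```

The swapped form is presumably why `c_finish_omp_atomic` gained its `swapped` argument: the operation has to be built with the operands in `rhs, lhs` order.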
	with default value, pass it down to c_parser_conditional_expression.
	(c_parser_conditional_expression): Add omp_atomic_lhs argument, pass
	it down to c_parser_binary_expression.  Don't pass PREC_NONE to
	it.  Adjust recursive call.
	(c_parser_binary_expression): Remove prec argument, add omp_atomic_lhs
	argument.  Always start from PREC_NONE, if omp_atomic_lhs is non-NULL
	and one of the arguments of toplevel binop matches it, use build2
	instead of parser_build_binary_op.
	(c_parser_omp_atomic): Handle OpenMP 4.0 atomics.
	(c_parser_omp_for_loop): Adjust c_parser_binary_expression caller.
	* c-tree.h (c_tree_equal): New prototype.
	* c-typeck.c (c_tree_equal): New function.

	* parser.c (cp_parser_omp_atomic): Never restart unless
	structured_block is true.

	* c-c++-common/gomp/atomic-15.c: Adjust for C diagnostics.

	* testsuite/libgomp.c/atomic-14.c: Add parens to make it valid.
	* testsuite/libgomp.c/atomic-15.c: New test.
	* testsuite/libgomp.c/atomic-16.c: New test.


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@196816 138bc75d-0d04-0410-961f-82ee72b054a4
        * env.c (handle_omp_display_env): New function.
        (initialize_env): Use it.



git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@196817 138bc75d-0d04-0410-961f-82ee72b054a4
        * libgomp.texi (Environment Variables): Minor cleanup,
        update section refs to OpenMP 4.0rc2.
        (OMP_DISPLAY_ENV, GOMP_SPINCOUNT): Document these
        environment variables.



git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@196818 138bc75d-0d04-0410-961f-82ee72b054a4
	GIMPLE_OMP_FOR kinds.
	* tree.def (OMP_SIMD, OMP_FOR_SIMD, OMP_DISTRIBUTE): New tree codes.
	* gimple.h (enum gf_mask): Add GF_OMP_FOR_KIND_MASK,
	GF_OMP_FOR_KIND_FOR, GF_OMP_FOR_KIND_SIMD, GF_OMP_FOR_KIND_FOR_SIMD
	and GF_OMP_FOR_KIND_DISTRIBUTE.
	(gimple_omp_for_kind, gimple_omp_for_set_kind): New inline functions.
	* gimplify.c (is_gimple_stmt, gimplify_omp_for, gimplify_expr): Handle
	OMP_SIMD, OMP_FOR_SIMD and OMP_DISTRIBUTE.
	* tree.c (omp_clause_num_ops, omp_clause_code_name, walk_tree_1):
	Handle new OpenMP 4.0 clauses.
	* tree-pretty-print.c (dump_omp_clause): Likewise.
	(dump_generic_node): Handle OMP_SIMD, OMP_FOR_SIMD and OMP_DISTRIBUTE.
	* tree.h (enum omp_clause_code): Add OMP_CLAUSE_LINEAR,
	OMP_CLAUSE_ALIGNED, OMP_CLAUSE_DEPEND, OMP_CLAUSE_FROM, OMP_CLAUSE_TO,
	OMP_CLAUSE_UNIFORM, OMP_CLAUSE_MAP, OMP_CLAUSE_DEVICE,
	OMP_CLAUSE_DIST_SCHEDULE, OMP_CLAUSE_INBRANCH, OMP_CLAUSE_NOTINBRANCH,
	OMP_CLAUSE_NUM_TEAMS, OMP_CLAUSE_PROC_BIND, OMP_CLAUSE_SAFELEN,
	OMP_CLAUSE_SIMDLEN, OMP_CLAUSE_FOR, OMP_CLAUSE_PARALLEL,
	OMP_CLAUSE_SECTIONS and OMP_CLAUSE_TASKGROUP.
	(OMP_LOOP_CHECK): Define.
	(OMP_FOR_BODY, OMP_FOR_CLAUSES, OMP_FOR_INIT, OMP_FOR_COND,
	OMP_FOR_INCR, OMP_FOR_PRE_BODY): Use OMP_LOOP_CHECK instead of
	OMP_FOR_CHECK.
	(OMP_CLAUSE_DECL): Extend check range up to OMP_CLAUSE_MAP.
	(OMP_CLAUSE_LINEAR_STEP, OMP_CLAUSE_ALIGNED_ALIGNMENT,
	OMP_CLAUSE_NUM_TEAMS_EXPR, OMP_CLAUSE_DEVICE_ID,
	OMP_CLAUSE_DIST_SCHEDULE_CHUNK_EXPR, OMP_CLAUSE_SAFELEN_EXPR,
	OMP_CLAUSE_SIMDLEN_EXPR): Define.
	(enum omp_clause_depend_kind, enum omp_clause_map_kind,
	enum omp_clause_proc_bind_kind): New enums.
	(OMP_CLAUSE_DEPEND_KIND, OMP_CLAUSE_MAP_KIND,
	OMP_CLAUSE_PROC_BIND_KIND): Define.
	(struct tree_omp_clause): Add subcode.depend_kind, subcode.map_kind
	and subcode.proc_bind_kind.
	(find_omp_clause): New prototype.
	* omp-builtins.def (BUILT_IN_GOMP_CANCEL,
	BUILT_IN_GOMP_CANCELLATION_POINT): New built-ins.
	* tree-flow.h (find_omp_clause): Remove prototype.
c/
	* c-parser.c (c_parser_omp_all_clauses): Change mask argument type
	from unsigned to omp_clause_mask.
	(c_parser_omp_for_loop): Adjust c_finish_omp_for caller.
	(OMP_FOR_CLAUSE_MASK, OMP_SECTIONS_CLAUSE_MASK,
	OMP_PARALLEL_CLAUSE_MASK, OMP_SINGLE_CLAUSE_MASK,
	OMP_TASK_CLAUSE_MASK): Use OMP_CLAUSE_MASK_1 instead of 1.
	(c_parser_omp_parallel): Use omp_clause_mask type instead of unsigned
	for mask, use OMP_CLAUSE_MASK_1 instead of 1 for masks.
cp/
	* cp-tree.h (OMP_FOR_GIMPLIFYING_P): Use OMP_LOOP_CHECK instead of
	OMP_FOR_CHECK.
	(finish_omp_for): Add enum tree_code second argument.
	(finish_omp_cancel, finish_omp_cancellation_point): New prototypes.
	* cp-gimplify.c (cp_gimplify_expr, cp_genericize_r): Handle
	OMP_SIMD, OMP_FOR_SIMD and OMP_DISTRIBUTE.
	* semantics.c (finish_omp_clauses): Handle new OpenMP 4.0 clauses.
	(finish_omp_for): Add code argument, pass it down to make_node
	or c_finish_omp_for.
	(finish_omp_cancel, finish_omp_cancellation_point): New functions.
	* parser.c (cp_parser_omp_clause_name): Add parsing of new
	OpenMP 4.0 clauses.
	(cp_parser_omp_var_list_no_open): Add COLON argument, if non-NULL,
	accept termination by colon instead of closing paren.
	(cp_parser_omp_var_list, cp_parser_omp_clause_reduction): Adjust
	callers.
	(cp_parser_omp_clause_branch, cp_parser_omp_clause_cancelkind,
	cp_parser_omp_clause_num_teams, cp_parser_omp_clause_aligned,
	cp_parser_omp_clause_linear, cp_parser_omp_clause_depend,
	cp_parser_omp_clause_map, cp_parser_omp_clause_device,
	cp_parser_omp_clause_dist_schedule, cp_parser_omp_clause_proc_bind):
	New functions.
	(cp_parser_omp_all_clauses): Change mask argument's type to
	omp_clause_mask from unsigned.  Fix c_name for
	PRAGMA_OMP_CLAUSE_UNTIED.  Handle new OpenMP 4.0 clauses.
	(cp_parser_omp_for_loop): Add code argument.  Pass it down to
	finish_omp_for.
	(OMP_SIMD_CLAUSE_MASK): Define.
	(cp_parser_omp_simd): New function.
	(OMP_FOR_CLAUSE_MASK, OMP_SECTIONS_CLAUSE_MASK,
	OMP_PARALLEL_CLAUSE_MASK, OMP_SINGLE_CLAUSE_MASK,
	OMP_TASK_CLAUSE_MASK): Use OMP_CLAUSE_MASK_1 instead of 1.
	(cp_parser_omp_for): Handle parsing of #pragma omp for simd.
	(cp_parser_omp_parallel): Handle parsing of
	#pragma omp parallel for simd.  Use omp_clause_mask type
	instead of unsigned for mask, use OMP_CLAUSE_MASK_1 instead
	of 1 for masks.
	(OMP_CANCEL_CLAUSE_MASK, OMP_CANCELLATION_POINT_CLAUSE_MASK): Define.
	(cp_parser_omp_cancel, cp_parser_omp_cancellation_point): New
	functions.
	(cp_parser_omp_construct): Handle PRAGMA_OMP_SIMD, PRAGMA_OMP_CANCEL
	and PRAGMA_OMP_CANCELLATION_POINT.
	(cp_parser_pragma): Handle PRAGMA_OMP_SIMD.
	* pt.c (tsubst_expr): Handle OMP_SIMD, OMP_FOR_SIMD and
	OMP_DISTRIBUTE.  Pass down TREE_CODE to finish_omp_for.
fortran/
	* f95-lang.c (ATTR_NULL): Define.
c-family/
	* c-omp.c (c_finish_omp_for): Add code argument, pass it down to
	make_code.
	(c_split_parallel_clauses): Handle OMP_CLAUSE_SAFELEN,
	OMP_CLAUSE_ALIGNED and OMP_CLAUSE_LINEAR.
	* c-pragma.h (enum pragma_kind): Add PRAGMA_OMP_CANCEL,
	PRAGMA_OMP_CANCELLATION_POINT, PRAGMA_OMP_DECLARE_REDUCTION,
	PRAGMA_OMP_DECLARE_SIMD, PRAGMA_OMP_DECLARE_TARGET,
	PRAGMA_OMP_DISTRIBUTE, PRAGMA_OMP_END_DECLARE_TARGET,
	PRAGMA_OMP_FOR_SIMD, PRAGMA_OMP_PARALLEL_FOR_SIMD, PRAGMA_OMP_SIMD,
	PRAGMA_OMP_TARGET, PRAGMA_OMP_TARGET_DATA, PRAGMA_OMP_TARGET_UPDATE,
	PRAGMA_OMP_TASKGROUP and PRAGMA_OMP_TEAMS.
	(enum pragma_omp_clause): Add PRAGMA_OMP_CLAUSE_ALIGNED,
	PRAGMA_OMP_CLAUSE_DEPEND, PRAGMA_OMP_CLAUSE_DEVICE,
	PRAGMA_OMP_CLAUSE_DIST_SCHEDULE, PRAGMA_OMP_CLAUSE_FOR,
	PRAGMA_OMP_CLAUSE_FROM, PRAGMA_OMP_CLAUSE_INBRANCH,
	PRAGMA_OMP_CLAUSE_LINEAR, PRAGMA_OMP_CLAUSE_MAP,
	PRAGMA_OMP_CLAUSE_NOTINBRANCH, PRAGMA_OMP_CLAUSE_NUM_TEAMS,
	PRAGMA_OMP_CLAUSE_PARALLEL, PRAGMA_OMP_CLAUSE_PROC_BIND,
	PRAGMA_OMP_CLAUSE_SAFELEN, PRAGMA_OMP_CLAUSE_SECTIONS,
	PRAGMA_OMP_CLAUSE_SIMDLEN, PRAGMA_OMP_CLAUSE_TASKGROUP,
	PRAGMA_OMP_CLAUSE_TO and PRAGMA_OMP_CLAUSE_UNIFORM.
	* c-pragma.c (omp_pragmas): Add new OpenMP 4.0 constructs.
	* c-common.h (c_finish_omp_for): Add enum tree_code as second
	argument.
	(OMP_CLAUSE_MASK_1): Define.
	(omp_clause_mask): For HWI >= 64 new typedef for
	unsigned HOST_WIDE_INT, otherwise a class with needed ctors and
	operators.


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@197161 138bc75d-0d04-0410-961f-82ee72b054a4
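
The switch from `1` to `OMP_CLAUSE_MASK_1` in the clause masks can be illustrated with a small sketch. With the new OpenMP 4.0 clauses, the clause numbers may no longer fit in the bits of a plain `int`, so `1 << n` would overflow, and the mask needs a 64-bit type. Everything below is a stand-in: `clause_mask`, `CLAUSE_MASK_1` and the clause numbers are hypothetical and only mimic the 64-bit typedef variant of `omp_clause_mask`.

```c
#include <stdint.h>

typedef uint64_t clause_mask;            /* stand-in for omp_clause_mask */
#define CLAUSE_MASK_1 ((clause_mask) 1)  /* stand-in for OMP_CLAUSE_MASK_1 */

/* Hypothetical clause numbers; the real enum values differ. */
enum clause_id {
    CLAUSE_PRIVATE = 3,
    CLAUSE_REDUCTION = 7,
    CLAUSE_LINEAR = 33,
    CLAUSE_SAFELEN = 40
};

/* A per-construct mask is the OR of one bit per permitted clause.
   Shifting CLAUSE_MASK_1 keeps the arithmetic in 64 bits; shifting a
   plain int 1 by 33 or 40 would be undefined behaviour.  */
#define SIMD_CLAUSES                      \
    ((CLAUSE_MASK_1 << CLAUSE_PRIVATE)    \
     | (CLAUSE_MASK_1 << CLAUSE_LINEAR)   \
     | (CLAUSE_MASK_1 << CLAUSE_SAFELEN))

/* Returns 1 if the clause bit is set in the mask, 0 otherwise. */
int clause_allowed(clause_mask mask, enum clause_id id)
{
    return (int) ((mask >> id) & 1);
}
```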
	OMP_SIMD and OMP_FOR_SIMD loops.


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@197515 138bc75d-0d04-0410-961f-82ee72b054a4
	omp_get_proc_bind, omp_get_proc_bind_, omp_set_default_device,
	omp_set_default_device_, omp_set_default_device_8_,
	omp_get_default_device, omp_get_default_device_,
	omp_get_num_devices, omp_get_num_devices_, omp_get_num_teams,
	omp_get_num_teams_, omp_get_team_num, omp_get_team_num_): Export
	@@OMP_4.0.
	(GOMP_cancel, GOMP_cancellation_point, GOMP_parallel_loop_dynamic,
	GOMP_parallel_loop_guided, GOMP_parallel_loop_runtime,
	GOMP_parallel_loop_static, GOMP_parallel_sections, GOMP_parallel,
	GOMP_taskgroup_start, GOMP_taskgroup_end): Export @@GOMP_4.0.
	* parallel.c (GOMP_parallel_end): Add ialias.
	(GOMP_parallel, GOMP_cancel, GOMP_cancellation_point): New
	functions.
	* omp.h.in (omp_proc_bind_t): New typedef.
	(omp_get_cancellation, omp_get_proc_bind, omp_set_default_device,
	omp_get_default_device, omp_get_num_devices, omp_get_num_teams,
	omp_get_team_num): New prototypes.
	* env.c (omp_get_cancellation, omp_get_proc_bind,
	omp_set_default_device, omp_get_default_device, omp_get_num_devices,
	omp_get_num_teams, omp_get_team_num): New functions.
	* fortran.c (ULP, STR1, STR2, ialias_redirect): Removed.
	(omp_get_cancellation_, omp_get_proc_bind_, omp_set_default_device_,
	omp_set_default_device_8_, omp_get_default_device_,
	omp_get_num_devices_, omp_get_num_teams_, omp_get_team_num_): New
	functions.
	* libgomp.h (ialias_ulp, ialias_str1, ialias_str2, ialias_redirect,
	ialias_call): Define.
	* libgomp_g.h (GOMP_parallel_loop_static, GOMP_parallel_loop_dynamic,
	GOMP_parallel_loop_guided, GOMP_parallel_loop_runtime, GOMP_parallel,
	GOMP_cancel, GOMP_cancellation_point, GOMP_taskgroup_start,
	GOMP_taskgroup_end, GOMP_parallel_sections): New prototypes.
	* task.c (GOMP_taskgroup_start, GOMP_taskgroup_end): New functions.
	* sections.c (GOMP_parallel_sections): New function.
	* loop.c (GOMP_parallel_loop_static, GOMP_parallel_loop_dynamic,
	GOMP_parallel_loop_guided, GOMP_parallel_loop_runtime): New
	functions.
	(GOMP_parallel_end): Add ialias_redirect.
	* omp_lib.f90.in (omp_proc_bind_kind, omp_proc_bind_false,
	omp_proc_bind_true, omp_proc_bind_master, omp_proc_bind_close,
	omp_proc_bind_spread): New params.
	(omp_get_cancellation, omp_get_proc_bind, omp_set_default_device,
	omp_get_default_device, omp_get_num_devices, omp_get_num_teams,
	omp_get_team_num): New interfaces.
	* omp_lib.h.in (omp_proc_bind_kind, omp_proc_bind_false,
	omp_proc_bind_true, omp_proc_bind_master, omp_proc_bind_close,
	omp_proc_bind_spread): New params.
	(omp_get_cancellation, omp_get_proc_bind, omp_set_default_device,
	omp_get_default_device, omp_get_num_devices, omp_get_num_teams,
	omp_get_team_num): New externals.


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@197670 138bc75d-0d04-0410-961f-82ee72b054a4
	(BT_FN_VOID_OMPFN_PTR_UINT, BT_FN_VOID_OMPFN_PTR_UINT_LONG_LONG_LONG,
	BT_FN_VOID_OMPFN_PTR_UINT_LONG_LONG_LONG_LONG): Remove.
	(BT_FN_VOID_OMPFN_PTR_UINT_UINT_UINT,
	BT_FN_VOID_OMPFN_PTR_UINT_LONG_LONG_LONG_UINT,
	BT_FN_VOID_OMPFN_PTR_UINT_LONG_LONG_LONG_LONG_UINT): New.
	* gimplify.c (gimplify_scan_omp_clauses, gimplify_adjust_omp_clauses):
	Handle OMP_CLAUSE_PROC_BIND.
	* omp-builtins.def (BUILT_IN_GOMP_TASKGROUP_START,
	BUILT_IN_GOMP_TASKGROUP_END, BUILT_IN_GOMP_PARALLEL_LOOP_STATIC,
	BUILT_IN_GOMP_PARALLEL_LOOP_DYNAMIC,
	BUILT_IN_GOMP_PARALLEL_LOOP_GUIDED,
	BUILT_IN_GOMP_PARALLEL_LOOP_RUNTIME, BUILT_IN_GOMP_PARALLEL,
	BUILT_IN_GOMP_PARALLEL_SECTIONS): New built-ins.
	(BUILT_IN_GOMP_PARALLEL_LOOP_STATIC_START,
	BUILT_IN_GOMP_PARALLEL_LOOP_DYNAMIC_START,
	BUILT_IN_GOMP_PARALLEL_LOOP_GUIDED_START,
	BUILT_IN_GOMP_PARALLEL_LOOP_RUNTIME_START,
	BUILT_IN_GOMP_PARALLEL_START, BUILT_IN_GOMP_PARALLEL_END,
	BUILT_IN_GOMP_PARALLEL_SECTIONS_START): Remove.
	* omp-low.c (scan_sharing_clauses): Handle OMP_CLAUSE_PROC_BIND.
	(expand_parallel_call): Expand #pragma omp parallel* as
	calls to the new GOMP_parallel_* APIs without _start at the end,
	instead of GOMP_parallel_*_start followed by fn.omp_fn.N call,
	followed by GOMP_parallel_end.  Handle OMP_CLAUSE_PROC_BIND.
	* tree-ssa-alias.c (ref_maybe_used_by_call_p_1,
	call_may_clobber_ref_p_1): Handle BUILT_IN_GOMP_TASKGROUP_END
	instead of BUILT_IN_GOMP_PARALLEL_END.
c-family/
	* c-common.c (DEF_FUNCTION_TYPE_8): Define.
	* c-omp.c (c_split_parallel_clauses): Handle OMP_CLAUSE_PROC_BIND.
cp/
	* cp-tree.h (finish_omp_taskgroup): New prototype.
	* parser.c (cp_parser_omp_clause_proc_bind): Require ) instead of
	colon at the end of the clause.
	(cp_parser_omp_taskgroup): New function.
	(cp_parser_omp_construct, cp_parser_pragma): Handle
	PRAGMA_OMP_TASKGROUP.
	* semantics.c (finish_omp_taskgroup): New function.
fortran/
	* f95-lang.c (DEF_FUNCTION_TYPE_8): Define.
	* types.def (DEF_FUNCTION_TYPE_8): Document.
	(BT_FN_VOID_OMPFN_PTR_UINT, BT_FN_VOID_OMPFN_PTR_UINT_LONG_LONG_LONG,
	BT_FN_VOID_OMPFN_PTR_UINT_LONG_LONG_LONG_LONG): Remove.
	(BT_FN_VOID_OMPFN_PTR_UINT_UINT_UINT,
	BT_FN_VOID_OMPFN_PTR_UINT_LONG_LONG_LONG_UINT,
	BT_FN_VOID_OMPFN_PTR_UINT_LONG_LONG_LONG_LONG_UINT): New.
ada/
	* gcc-interface/utils.c (DEF_FUNCTION_TYPE_8): Define.
lto/
	* lto-lang.c (DEF_FUNCTION_TYPE_8): Define.
testsuite/
	* gcc.dg/gomp/combined-1.c: Look for GOMP_parallel_loop_runtime
	instead of GOMP_parallel_loop_runtime_start.


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@197676 138bc75d-0d04-0410-961f-82ee72b054a4
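
The change to `expand_parallel_call` described above can be pictured with a toy model. The `mock_*` functions below are hypothetical single-threaded stand-ins, not the libgomp entry points, and their parameter lists are an assumption made for illustration. They only show the difference in call shape: three steps in the encountering thread before, one runtime call that invokes the outlined body afterwards.

```c
typedef void (*region_fn)(void *);

/* Counts how often the mock "runtime" is entered. */
static int runtime_calls;

static void mock_parallel_start(region_fn fn, void *data, unsigned nthreads)
{
    (void) fn; (void) data; (void) nthreads;  /* a real runtime would start workers */
    runtime_calls++;
}

static void mock_parallel_end(void)
{
    runtime_calls++;
}

static void mock_parallel(region_fn fn, void *data, unsigned nthreads,
                          unsigned flags)
{
    (void) nthreads; (void) flags;
    runtime_calls++;
    fn(data);  /* the runtime, not the caller, runs the outlined body */
}

/* Stand-in for the outlined region body (fn.omp_fn.N in the ChangeLog). */
static void outlined_body(void *data)
{
    ++*(int *) data;
}

/* Old shape: start, call the body directly, end. */
int old_expansion(int *calls)
{
    int hits = 0;
    runtime_calls = 0;
    mock_parallel_start(outlined_body, &hits, 1);
    outlined_body(&hits);
    mock_parallel_end();
    *calls = runtime_calls;
    return hits;
}

/* New shape: a single call without the _start suffix. */
int new_expansion(int *calls)
{
    int hits = 0;
    runtime_calls = 0;
    mock_parallel(outlined_body, &hits, 1, 0);
    *calls = runtime_calls;
    return hits;
}
```

In this model the extra trailing `flags` argument is where a `proc_bind` value could travel, which would match the additional `UINT` in the new built-in function types; that reading is an inference from the ChangeLog, not something it states.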
	OMP_CLAUSE_LINEAR_NO_COPYOUT): Define.
	* omp-low.c (extract_omp_for_data): Handle #pragma omp simd.
	(build_outer_var_ref): For #pragma omp simd allow linear etc.
	clauses to bind even to private vars.
	(scan_sharing_clauses): Handle OMP_CLAUSE_LINEAR, OMP_CLAUSE_ALIGNED
	and OMP_CLAUSE_SAFELEN.
	(lower_rec_input_clauses): Handle OMP_CLAUSE_LINEAR.  Don't emit
	a GOMP_barrier call for firstprivate/lastprivate in #pragma omp simd.
	(lower_lastprivate_clauses): Handle also OMP_CLAUSE_LINEAR.
	(expand_omp_simd): New function.
	(expand_omp_for): Handle #pragma omp simd.
	* gimplify.c (enum gimplify_omp_var_data): Add GOVD_LINEAR and
	GOVD_ALIGNED, add GOVD_LINEAR into GOVD_DATA_SHARE_CLASS.
	(enum omp_region_type): Add ORT_SIMD.
	(gimple_add_tmp_var, gimplify_var_or_parm_decl, omp_check_private,
	omp_firstprivatize_variable, omp_notice_variable): Handle ORT_SIMD
	like ORT_WORKSHARE.
	(omp_is_private): Likewise.  Add SIMD argument, tweak diagnostics
	and add extra errors in simd constructs.
	(gimplify_scan_omp_clauses, gimplify_adjust_omp_clauses): Handle
	OMP_CLAUSE_LINEAR, OMP_CLAUSE_ALIGNED and OMP_CLAUSE_SAFELEN.
	(gimplify_adjust_omp_clauses_1): Handle GOVD_LASTPRIVATE and
	GOVD_ALIGNED.
	(gimplify_omp_for): Handle #pragma omp simd.
cp/
	* cp-tree.h (CP_OMP_CLAUSE_INFO): Also allow it on OMP_CLAUSE_LINEAR.
	* parser.c (cp_parser_omp_var_list_no_open): If colon is non-NULL,
	temporarily disable colon_corrects_to_scope_p during the parsing
	of the variable list.
	(cp_parser_omp_clause_safelen, cp_parser_omp_clause_simdlen): New
	functions.
	(cp_parser_omp_all_clauses): Handle OMP_CLAUSE_SAFELEN and
	OMP_CLAUSE_SIMDLEN.
	* semantics.c (finish_omp_clauses): Allow NULL_TREE in
	OMP_CLAUSE_ALIGNED_ALIGNMENT.
testsuite/
	* c-c++-common/gomp/simd1.c: New test.
	* c-c++-common/gomp/simd2.c: New test.


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@198092 138bc75d-0d04-0410-961f-82ee72b054a4
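
As a rough illustration of the constructs this commit teaches the middle end to handle, here is a small `#pragma omp simd` loop using the `linear` and `safelen` clauses. The function is hypothetical and is not one of the new tests. The caller is assumed to pass non-overlapping `dst` and `src`.

```c
#include <stddef.h>

/* Copy every second element of src into dst, scaled by k.
   j advances by exactly 2 per iteration, which is what linear(j:2)
   promises to the vectorizer; safelen(8) says that up to 8 consecutive
   iterations may be executed together.  */
void scale_strided(float *dst, const float *src, size_t n, float k)
{
    size_t j = 0;
    #pragma omp simd linear(j:2) safelen(8)
    for (size_t i = 0; i < n; i++) {
        dst[i] = k * src[j];
        j += 2;
    }
}
```

Without `-fopenmp` (or an option that enables only the simd pragmas) the directive is ignored and the loop runs as ordinary scalar code with the same result.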
	* gimplify.c (gimplify_adjust_omp_clauses): For linear clauses
	if outer_context is non-NULL, but not ORT_COMBINED_PARALLEL,
	call omp_notice_variable.  Remove aligned clauses that can't
	be handled yet.
	* omp-low.c: Include target.h.
	(scan_sharing_clauses): For aligned clauses with global arrays
	register local replacement.
	(omp_clause_aligned_alignment): New function.
	(lower_rec_input_clauses): For aligned clauses for global
	arrays or automatic pointers emit __builtin_assume_aligned
	before the loop if possible.
	(expand_omp_regimplify_p, expand_omp_build_assign): New functions.
	(expand_omp_simd): Use them.  Handle pointer iterators and broken
	loops.
	(lower_omp_for): Call lower_omp on gimple_omp_body_ptr after
	calling lower_rec_input_clauses, not before it.
cp/
	* semantics.c (finish_omp_clauses): On OMP_CLAUSE_LINEAR clauses
	verify OMP_CLAUSE_DECL has integral or pointer type, and handle
	linear steps for pointer type decls.  Fix up handling of
	OMP_CLAUSE_UNIFORM.
testsuite/
	* c-c++-common/gomp/simd3.c: New test.
	* c-c++-common/gomp/simd4.c: New test.
	* c-c++-common/gomp/simd5.c: New test.


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@198193 138bc75d-0d04-0410-961f-82ee72b054a4
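
The `aligned` handling described above can be approximated by hand. The sketch below shows a loop with an `aligned` clause next to a version that spells out the alignment promise with `__builtin_assume_aligned`, which is roughly the effect the ChangeLog describes for `lower_rec_input_clauses`. Both functions are hypothetical, the 32-byte alignment is an arbitrary choice, and callers must really pass 32-byte-aligned pointers.

```c
#include <stddef.h>

/* The aligned clause promises that a and b are 32-byte aligned. */
void add_aligned(float *a, const float *b, size_t n)
{
    #pragma omp simd aligned(a, b : 32)
    for (size_t i = 0; i < n; i++)
        a[i] += b[i];
}

/* Hand-written approximation: state the same promise before the loop
   through the GCC builtin, then run the loop on the returned pointers.  */
void add_assume_aligned(float *a, const float *b, size_t n)
{
    float *pa = __builtin_assume_aligned(a, 32);
    const float *pb = __builtin_assume_aligned(b, 32);
    for (size_t i = 0; i < n; i++)
        pa[i] += pb[i];
}
```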
c/
	* c-parser.c (c_parser_compound_statement,
	c_parser_statement): Adjust comments for OpenMP 3.0+
	additions.
	(c_parser_pragma): Handle PRAGMA_OMP_CANCEL and
	PRAGMA_OMP_CANCELLATION_POINT.
	(c_parser_omp_clause_name): Handle new OpenMP 4.0 clauses.
	(c_parser_omp_clause_collapse): Fully fold collapse
	expression.
	(c_parser_omp_clause_branch, c_parser_omp_clause_cancelkind,
	c_parser_omp_clause_num_teams, c_parser_omp_clause_aligned,
	c_parser_omp_clause_linear, c_parser_omp_clause_safelen,
	c_parser_omp_clause_simdlen, c_parser_omp_clause_depend,
	c_parser_omp_clause_map, c_parser_omp_clause_device,
	c_parser_omp_clause_dist_schedule, c_parser_omp_clause_proc_bind,
	c_parser_omp_clause_to, c_parser_omp_clause_from,
	c_parser_omp_clause_uniform): New functions.
	(c_parser_omp_all_clauses): Handle new OpenMP 4.0 clauses.
	(c_parser_omp_for_loop): Add CODE argument, pass it through
	to c_finish_omp_for.
	(OMP_SIMD_CLAUSE_MASK): Define.
	(c_parser_omp_simd): New function.
	(c_parser_omp_for): Parse #pragma omp for simd.
	(OMP_PARALLEL_CLAUSE_MASK): Add OMP_CLAUSE_PROC_BIND.
	(c_parser_omp_parallel): Parse #pragma omp parallel for simd.
	(OMP_TASK_CLAUSE_MASK): Add OMP_CLAUSE_DEPEND.
	(c_parser_omp_taskgroup): New function.
	(OMP_CANCEL_CLAUSE_MASK, OMP_CANCELLATION_POINT_CLAUSE_MASK): Define.
	(c_parser_omp_cancel, c_parser_omp_cancellation_point): New functions.
	(c_parser_omp_construct): Handle PRAGMA_OMP_SIMD and
	PRAGMA_OMP_TASKGROUP.
	(c_parser_transaction_cancel): Formatting fix.
	* c-tree.h (c_begin_omp_taskgroup, c_finish_omp_taskgroup,
	c_finish_omp_cancel, c_finish_omp_cancellation_point): New prototypes.
	* c-typeck.c (c_begin_omp_taskgroup, c_finish_omp_taskgroup,
	c_finish_omp_cancel, c_finish_omp_cancellation_point): New functions.
	(c_finish_omp_clauses): Handle new OpenMP 4.0 clauses.
cp/
	* parser.c (cp_parser_omp_clause_name): Add missing break after
	case 'i'.
	(cp_parser_omp_cancellation_point): Diagnose error if
	#pragma omp cancellation isn't followed by point.
	* semantics.c (finish_omp_clauses): Complain also about zero
	in alignment of aligned directive or safelen/simdlen expressions.
	(finish_omp_cancel): Fix up diagnostics wording.
testsuite/
	* c-c++-common/gomp/simd1.c: Enable also for C.
	* c-c++-common/gomp/simd2.c: Likewise.
	* c-c++-common/gomp/simd3.c: Likewise.
	* c-c++-common/gomp/simd4.c: Likewise.  Adjust expected
	diagnostics for C.
	* c-c++-common/gomp/simd5.c: Enable also for C.


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@198264 138bc75d-0d04-0410-961f-82ee72b054a4
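
A minimal sketch of the cancellation directives the C parser now accepts, assuming a simple linear search; the function is hypothetical. The result does not depend on whether cancellation actually takes effect (it is a request that the runtime may ignore, for example when cancellation is disabled), so the code gives the same answer serially, with OpenMP, and with the pragmas ignored.

```c
/* Returns 1 if key occurs in v[0..n-1], 0 otherwise. */
int contains(const int *v, int n, int key)
{
    int found = 0;
    #pragma omp parallel for shared(found)
    for (int i = 0; i < n; i++) {
        if (v[i] == key) {
            #pragma omp atomic write
            found = 1;
            /* Ask the other threads to stop working on this loop. */
            #pragma omp cancel for
        }
        /* Other threads notice a pending cancellation here. */
        #pragma omp cancellation point for
    }
    return found;
}
```

Both directives are stand-alone, so they have to appear inside a compound statement, as they do here.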
	OpenMP constructs nested inside simd region.  Don't treat
	#pragma omp simd as work-sharing region.  Disallow work-sharing
	constructs inside of critical region.  Complain if ordered
	region is nested inside of parallel region without loop
	region in between.
	(scan_omp_1_stmt): Call check_omp_nesting_restrictions even
	for GOMP_{cancel{,lation_point},taskyield,taskwait} calls.

	* gfortran.dg/gomp/appendix-a/a.35.5.f90: Add dg-error.


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@198459 138bc75d-0d04-0410-961f-82ee72b054a4
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@198460 138bc75d-0d04-0410-961f-82ee72b054a4
	dump_gimple_omp_atomic_store): Handle gimple_omp_atomic_seq_cst_p.
	* gimple.h (enum gf_mask): Add GF_OMP_ATOMIC_SEQ_CST.
	(gimple_omp_atomic_set_seq_cst, gimple_omp_atomic_seq_cst_p): New
	inline functions.
	* omp-low.c (expand_omp_atomic_load, expand_omp_atomic_store,
	expand_omp_atomic_fetch_op): If gimple_omp_atomic_seq_cst_p,
	pass MEMMODEL_SEQ_CST instead of MEMMODEL_RELAXED to the builtin.
	* gimplify.c (gimplify_omp_atomic): Handle OMP_ATOMIC_SEQ_CST.
	* tree-pretty-print.c (dump_generic_node): Handle OMP_ATOMIC_SEQ_CST.
	* tree.def (OMP_ATOMIC): Add comment that OMP_ATOMIC* must stay
	consecutive.
	* tree.h (OMP_ATOMIC_SEQ_CST): Define.
c/
	* c-parser.c (c_parser_omp_atomic): Parse seq_cst clause, pass
	true if it is present to c_finish_omp_atomic.
cp/
	* pt.c (tsubst_expr): Pass OMP_ATOMIC_SEQ_CST to finish_omp_atomic.
	* semantics.c (finish_omp_atomic): Add seq_cst argument, pass
	it through to c_finish_omp_atomic or store into OMP_ATOMIC_SEQ_CST.
	* cp-tree.h (finish_omp_atomic): Adjust prototype.
	* parser.c (cp_parser_omp_atomic): Parse seq_cst clause, pass
	true if it is present to finish_omp_atomic.
c-family/
	* c-omp.c (c_finish_omp_atomic): Add seq_cst argument, store it
	into OMP_ATOMIC_SEQ_CST bit.
	* c-common.h (c_finish_omp_atomic): Adjust prototype.
testsuite/
	* testsuite/libgomp.c/atomic-17.c: New test.
	* testsuite/libgomp.c++/atomic-14.C: New test.
	* testsuite/libgomp.c++/atomic-15.C: New test.


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@198461 138bc75d-0d04-0410-961f-82ee72b054a4
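
The effect of the `seq_cst` clause can be sketched as follows. The first two functions use the OpenMP 4.0 syntax; the third shows a comparable hand-written load through the GCC `__atomic` builtins with a sequentially consistent memory model, which is the kind of call the expansion is described as emitting in place of a relaxed one. All function names are hypothetical.

```c
/* Store 1 to *flag with sequentially consistent ordering. */
void publish(int *flag)
{
    #pragma omp atomic write seq_cst
    *flag = 1;
}

/* Load *flag with sequentially consistent ordering. */
int read_flag(const int *flag)
{
    int v;
    #pragma omp atomic read seq_cst
    v = *flag;
    return v;
}

/* Comparable load written directly against the GCC builtin. */
int read_flag_builtin(const int *flag)
{
    return __atomic_load_n(flag, __ATOMIC_SEQ_CST);
}
```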
Remove deprecated vectorlength clause features.

Remove deprecated assert and noassert clauses.

Implement vectorlength clause in OpenMP safelen terms.
	(attribute_value_equal): Call it for -fopenmp if
	TREE_VALUE of the attributes are both OMP_CLAUSEs.
	* tree.h (omp_declare_simd_clauses_equal): Declare.
c-family/
	* c-common.c (c_common_attribute_table): Add "omp declare simd"
	attribute.
	(handle_omp_declare_simd_attribute): New function.
	* c-common.h (c_omp_declare_simd_clauses_to_numbers,
	c_omp_declare_simd_clauses_to_decls): Declare.
	* c-omp.c (c_omp_declare_simd_clause_cmp,
	c_omp_declare_simd_clauses_to_numbers,
	c_omp_declare_simd_clauses_to_decls): New functions.
cp/
	* cp-tree.h (cp_decl_specifier_seq): Add omp_declare_simd_clauses
	field.
	(finish_omp_declare_simd): Declare.
	* decl2.c (is_late_template_attribute): Return true for
	"omp declare simd" attribute.
	(cp_check_const_attributes): Don't check TREE_VALUE of arg if
	arg isn't a TREE_LIST.
	* decl.c (grokfndecl): Add omp_declare_simd_clauses argument, call
	finish_omp_declare_simd if non-NULL.
	(grokdeclarator): Pass it declspecs->omp_declare_simd_clauses
	to grokfndecl.
	* pt.c (apply_late_template_attributes): Handle "omp declare simd"
	attribute specially.
	(tsubst_omp_clauses): Add declare_simd argument, don't call
	finish_omp_clauses if it is set.  Handle OpenMP 4.0 clauses.
	(tsubst_expr): Adjust tsubst_omp_clauses callers.
	* semantics.c (finish_omp_clauses): Diagnose inbranch notinbranch.
	(finish_omp_declare_simd): New function.
	* parser.h (struct cp_parser): Add omp_declare_simd_clauses field.
	* parser.c (cp_ensure_no_omp_declare_simd,
	cp_finish_omp_declare_simd): New functions.
	(enum pragma_context): Add pragma_member and pragma_objc_icode.
	(cp_parser_linkage_specification, cp_parser_namespace_definition,
	cp_parser_class_specifier_1): Call cp_ensure_no_omp_declare_simd.
	(cp_parser_init_declarator, cp_parser_member_declaration,
	cp_parser_function_definition_from_specifiers_and_declarator,
	cp_parser_save_member_function_body): Copy
	parser->omp_declare_simd_clauses to
	decl_specifiers->omp_declare_simd_clauses, call
	cp_finish_omp_declare_simd.
	(cp_parser_member_specification_opt): Pass pragma_member instead
	of pragma_external to cp_parser_pragma.
	(cp_parser_objc_interstitial_code): Pass pragma_objc_icode instead
	of pragma_external to cp_parser_pragma.
	(cp_parser_omp_var_list_no_open): If parser->omp_declare_simd_clauses,
	just cp_parser_identifier the argument names.
	(cp_parser_omp_all_clauses): Don't call finish_omp_clauses for
	parser->omp_declare_simd_clauses.
	(OMP_DECLARE_SIMD_CLAUSE_MASK): Define.
	(cp_parser_omp_declare_simd, cp_parser_omp_declare): New functions.
	(cp_parser_pragma): Call cp_ensure_no_omp_declare_simd.  Handle
	PRAGMA_OMP_DECLARE_REDUCTION.  Replace == pragma_external with
	!= pragma_stmt and != pragma_compound.
testsuite/
	* g++.dg/gomp/declare-simd-1.C: New test.
	* g++.dg/gomp/declare-simd-2.C: New test.


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@198739 138bc75d-0d04-0410-961f-82ee72b054a4
	* c-typeck.c (c_finish_omp_clauses): Handle OMP_CLAUSE_LINEAR_STEP
	adjustments for pointer-types here.  Diagnose inbranch notinbranch
	being used together.
	(c_finish_omp_declare_simd): New function.
	* c-parser.c (enum pragma_context): Add pragma_struct and
	pragma_param.
	(c_parser_declaration_or_fndef): Add omp_declare_simd_clauses
	argument.  Call c_finish_omp_declare_simd if needed.
	(c_parser_external_declaration, c_parser_compound_statement_nostart,
	c_parser_label, c_parser_for_statement, c_parser_objc_methodprotolist,
	c_parser_omp_for_loop): Adjust c_parser_declaration_or_fndef callers.
	(c_parser_struct_or_union_specifier): Use pragma_struct instead of
	pragma_external.
	(c_parser_parameter_declaration): Use pragma_param instead of
	pragma_external.
	(c_parser_pragma): Handle PRAGMA_OMP_DECLARE_REDUCTION.
	Replace == pragma_external with != pragma_stmt && != pragma_compound
	test.
	(c_parser_omp_variable_list): Add declare_simd argument.  Don't lookup
	vars if it is true, just store identifiers.
	(c_parser_omp_var_list_parens, c_parser_omp_clause_depend,
	c_parser_omp_clause_map): Adjust callers.
	(c_parser_omp_clause_reduction, c_parser_omp_clause_aligned): Add
	declare_simd argument, pass it through to c_parser_omp_variable_list.
	(c_parser_omp_clause_linear): Likewise.  Don't handle
	OMP_CLAUSE_LINEAR_STEP adjustments for pointer-types here.
	(c_parser_omp_clause_uniform): Call c_parser_omp_variable_list
	instead of c_parser_omp_var_list_parens to pass true as declare_simd.
	(c_parser_omp_all_clauses): Add declare_simd argument, pass it through
	clause parsing routines as needed.  Don't call c_finish_omp_clauses if
	set.
	(c_parser_omp_simd, c_parser_omp_for, c_parser_omp_sections,
	c_parser_omp_parallel, c_parser_omp_single, c_parser_omp_task,
	c_parser_omp_cancel, c_parser_omp_cancellation_point): Adjust callers.
	(OMP_DECLARE_SIMD_CLAUSE_MASK): Define.
	(c_parser_omp_declare_simd, c_parser_omp_declare): New functions.

	* gcc.dg/gomp/declare-simd-1.c: New test.
	* gcc.dg/gomp/declare-simd-2.c: New test.


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@198828 138bc75d-0d04-0410-961f-82ee72b054a4
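
For orientation, this is roughly what a `#pragma omp declare simd` declaration looks like in C, using the `uniform`, `linear` and `notinbranch` clauses mentioned in these entries. The functions are hypothetical and are not copied from the new `declare-simd` tests.

```c
/* base and k are the same for every SIMD lane (uniform); i grows by 1
   from lane to lane (linear); notinbranch states that the function is
   never called from inside a conditional in the SIMD loop.  */
#pragma omp declare simd uniform(base, k) linear(i:1) notinbranch
float load_scaled(const float *base, int i, float k)
{
    return base[i] * k;
}

/* A simd loop that calls the function once per iteration. */
void apply(float *out, const float *in, int n, float k)
{
    #pragma omp simd
    for (int i = 0; i < n; i++)
        out[i] = load_scaled(in, i, k);
}
```

Note that the clause arguments name function parameters that are not yet in scope when the pragma is parsed, which fits the ChangeLog remark that the variable list is stored as identifiers rather than looked up. Specifying `inbranch` and `notinbranch` together is what the new diagnostic rejects.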
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@198835 138bc75d-0d04-0410-961f-82ee72b054a4
kraj pushed a commit to kraj/gcc that referenced this pull request May 10, 2022
…04617]

On
 #define A(n) int foo1##n(void) { return 1##n; }
 #define B(n) A(n##0) A(n##1) A(n##2) A(n##3) A(n##4) A(n##5) A(n##6) A(n##7) A(n##8) A(n##9)
 #define C(n) B(n##0) B(n##1) B(n##2) B(n##3) B(n##4) B(n##5) B(n##6) B(n##7) B(n##8) B(n##9)
 #define D(n) C(n##0) C(n##1) C(n##2) C(n##3) C(n##4) C(n##5) C(n##6) C(n##7) C(n##8) C(n##9)
 #define E(n) D(n##0) D(n##1) D(n##2) D(n##3) D(n##4) D(n##5) D(n##6) D(n##7) D(n##8) D(n##9)
 E(0) E(1) E(2) D(30) D(31) C(320) C(321) C(322) C(323) C(324) C(325)
 B(3260) B(3261) B(3262) B(3263) A(32640) A(32641) A(32642)
testcase with
./xgcc -B ./ -c -g -fpic -ffat-lto-objects -flto  -O0 -o foo1.o foo1.c -ffunction-sections
./xgcc -B ./ -shared -g -fpic -flto -O0 -o foo1.so foo1.o
/tmp/ccTW8mBm.debug.temp.o: file not recognized: file format not recognized
(testcase too slow to be included into testsuite).
The problem is clearly reported by readelf:
readelf: foo1.o.debug.temp.o: Warning: Section 2 has an out of range sh_link value of 65321
readelf: foo1.o.debug.temp.o: Warning: Section 5 has an out of range sh_link value of 65321
readelf: foo1.o.debug.temp.o: Warning: Section 10 has an out of range sh_link value of 65323
readelf: foo1.o.debug.temp.o: Warning: [ 2]: Link field (65321) should index a symtab section.
readelf: foo1.o.debug.temp.o: Warning: [ 5]: Link field (65321) should index a symtab section.
readelf: foo1.o.debug.temp.o: Warning: [10]: Link field (65323) should index a string section.
because simple_object_elf_copy_lto_debug_sections doesn't adjust the sh_info
and sh_link fields in ElfNN_Shdr if they fall within the SHN_{LO,HI}RESERVE
range (inclusive).  Not adjusting those is incorrect, though: the
SHN_{LO,HI}RESERVE range is only relevant to 16-bit fields.  These are mainly
st_shndx in ElfNN_Sym, where a section number >= SHN_LORESERVE requires
SHN_XINDEX to be used instead, with the .symtab_shndx section containing the
real section index, and the e_shnum and e_shstrndx fields in ElfNN_Ehdr,
where a value >= SHN_LORESERVE has to be stored in Shdr[0].sh_{size,link}
instead.  But sh_{link,info} are 32-bit fields which can contain any section
index.

Note, as simple-object-elf.c mentions, binutils from 2.12 to 2.18 (so before
2011) used to mishandle the > 63.75K sections case and assumed there is a
hole in between the sections.  What
simple_object_elf_copy_lto_debug_sections does wouldn't help in that case
for the debug temp object creation, though; we'd need to detect the case in
that routine as well and take it into account in the remapping etc.  I think
it is not worth it given that this was over 10 years ago; if somebody needs
63.75K or more sections, they should use more recent binutils.

2022-02-22  Jakub Jelinek  <jakub@redhat.com>

	PR lto/104617
	* simple-object-elf.c (simple_object_elf_match): Fix up URL
	in comment.
	(simple_object_elf_copy_lto_debug_sections): Remap sh_info and
	sh_link even if they are in the SHN_LORESERVE .. SHN_HIRESERVE
	range (inclusive).

(cherry picked from commit 2f59f06)
kraj pushed a commit to kraj/gcc that referenced this pull request May 11, 2022
…04617]

On
 #define A(n) int foo1##n(void) { return 1##n; }
 #define B(n) A(n##0) A(n##1) A(n##2) A(n##3) A(n##4) A(n##5) A(n##6) A(n##7) A(n##8) A(n##9)
 #define C(n) B(n##0) B(n##1) B(n##2) B(n##3) B(n##4) B(n##5) B(n##6) B(n##7) B(n##8) B(n##9)
 #define D(n) C(n##0) C(n##1) C(n##2) C(n##3) C(n##4) C(n##5) C(n##6) C(n##7) C(n##8) C(n##9)
 #define E(n) D(n##0) D(n##1) D(n##2) D(n##3) D(n##4) D(n##5) D(n##6) D(n##7) D(n##8) D(n##9)
 E(0) E(1) E(2) D(30) D(31) C(320) C(321) C(322) C(323) C(324) C(325)
 B(3260) B(3261) B(3262) B(3263) A(32640) A(32641) A(32642)
testcase with
./xgcc -B ./ -c -g -fpic -ffat-lto-objects -flto  -O0 -o foo1.o foo1.c -ffunction-sections
./xgcc -B ./ -shared -g -fpic -flto -O0 -o foo1.so foo1.o
/tmp/ccTW8mBm.debug.temp.o: file not recognized: file format not recognized
(testcase too slow to be included into testsuite).
The problem is clearly reported by readelf:
readelf: foo1.o.debug.temp.o: Warning: Section 2 has an out of range sh_link value of 65321
readelf: foo1.o.debug.temp.o: Warning: Section 5 has an out of range sh_link value of 65321
readelf: foo1.o.debug.temp.o: Warning: Section 10 has an out of range sh_link value of 65323
readelf: foo1.o.debug.temp.o: Warning: [ 2]: Link field (65321) should index a symtab section.
readelf: foo1.o.debug.temp.o: Warning: [ 5]: Link field (65321) should index a symtab section.
readelf: foo1.o.debug.temp.o: Warning: [10]: Link field (65323) should index a string section.
because simple_object_elf_copy_lto_debug_sections doesn't adjust sh_info and
sh_link fields in ElfNN_Shdr if they are in between SHN_{LO,HI}RESERVE
inclusive.  Not adjusting those is incorrect though, SHN_{LO,HI}RESERVE
range is only relevant to the 16-bit fields, mainly st_shndx in ElfNN_Sym
where if one needs >= SHN_LORESERVE section number, SHN_XINDEX should be
used instead and .symtab_shndx section should contain the real section
index, and in ElfNN_Ehdr e_shnum and e_shstrndx fields, where if >=
SHN_LORESERVE value is needed it should put those into
Shdr[0].sh_{size,link}.  But, sh_{link,info} are 32-bit fields which can
contain any section index.
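
The distinction drawn above can be illustrated with a minimal sketch. It is not the code in simple-object-elf.c; the helper names `remap_sh_field` and `encode_st_shndx` and the mapping table are hypothetical and chosen only for illustration. The point is that a 32-bit sh_link/sh_info value is remapped with no SHN_LORESERVE test at all, while only the 16-bit st_shndx field needs the SHN_XINDEX escape.

```c
#include <assert.h>
#include <stdint.h>

#define SHN_LORESERVE 0xFF00u /* start of the reserved 16-bit index range */
#define SHN_XINDEX    0xFFFFu /* escape: the real index is stored elsewhere */

/* Hypothetical helper: remap a 32-bit sh_link or sh_info value through a
   table that maps old section indices to new ones.  There is deliberately
   no SHN_LORESERVE..SHN_HIRESERVE check, because these fields are 32 bits
   wide and can hold any section index.  */
static uint32_t
remap_sh_field (uint32_t old_index, const uint32_t *map, uint32_t nsections)
{
  if (old_index == 0 || old_index >= nsections)
    return old_index; /* SHN_UNDEF or out of range: leave untouched */
  return map[old_index];
}

/* Hypothetical helper: encode a real section index into the 16-bit
   st_shndx field, where the reserved range does matter.  */
static uint16_t
encode_st_shndx (uint32_t index)
{
  return index >= SHN_LORESERVE ? (uint16_t) SHN_XINDEX : (uint16_t) index;
}
```

With a table of 65400 entries, `remap_sh_field (65321, map, 65400)` simply returns `map[65321]`, even though 65321 lies inside the reserved range, whereas `encode_st_shndx (65321)` yields SHN_XINDEX.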

Note, as simple-object-elf.c mentions, binutils from 2.12 to 2.18 (so before
2011) used to mishandle the > 63.75K sections case and assumed there is a
hole in between the sections, but what
simple_object_elf_copy_lto_debug_sections does wouldn't help in that case
for the debug temp object creation, we'd need to detect the case also in
that routine and take it into account in the remapping etc.  I think
it is not worth it given that it is over 10 years, if somebody needs
63.75K or more sections, better use more recent binutils.

2022-02-22  Jakub Jelinek  <jakub@redhat.com>

	PR lto/104617
	* simple-object-elf.c (simple_object_elf_match): Fix up URL
	in comment.
	(simple_object_elf_copy_lto_debug_sections): Remap sh_info and
	sh_link even if they are in the SHN_LORESERVE .. SHN_HIRESERVE
	range (inclusive).

(cherry picked from commit 2f59f06)
xionghul pushed a commit to xionghul/gcc that referenced this pull request Dec 23, 2022
With many thanks to H.J. for doing all the hard work, this patch resolves
two P1 regressions; PR target/106933 and PR target/106959.

Although superficially similar, the i386 backend's two scalar-to-vector
(STV) passes perform their transformations in importantly different ways.
The original pass converting SImode and DImode operations to V4SImode
or V2DImode operations is "soft", allowing values to be maintained in
both integer and vector hard registers.  The newer pass converting TImode
operations to V1TImode is "hard" (all or nothing) that converts all uses
of a pseudo to vector form.  To implement this it invokes powerful ju-ju
calling SET_MODE on a reg_rtx, which due to RTL sharing, often updates
this pseudo's mode everywhere in the RTL chain.  Hence, TImode STV can only
be performed when all uses of a pseudo are convertible to V1TImode form.
To ensure this the STV passes currently use data-flow analysis to inspect
all DEFs and USEs in a chain.  This works fine for chains that are in
the usual single assignment form, but the occurrence of uninitialized
variables, or multiple assignments that split a pseudo's usage into
several independent chains (lifetimes) can lead to situations where
some but not all of a pseudo's occurrences need to be updated.  This is
safe for the SImode/DImode pass, but leads to the above bugs during
the TImode pass.

My one minor tweak to HJ's patch from comment #4 of bugzilla PR106959
is to only perform the new single_def_chain_p check for TImode STV; it
turns out that STV of SImode/DImode min/max operates safely on multiple-def
chains, and prohibiting this leads to testsuite regressions.  We don't
(yet) support V1TImode min/max, so this idiom isn't an issue during the
TImode STV pass.

For the record, the two alternate possible fixes are (i) make the TImode
STV pass "soft", by eliminating use of SET_MODE, instead using replace_rtx
with a new pseudo, or (ii) merging "chains" so that multiple DFA
chains/lifetimes are considered a single STV chain.

2022-12-23  H.J. Lu  <hjl.tools@gmail.com>
	    Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	PR target/106933
	PR target/106959
	* config/i386/i386-features.cc (single_def_chain_p): New predicate
	function to check that a pseudo's use-def chain is in SSA form.
	(timode_scalar_to_vector_candidate_p): Check that TImode regs that
	are SET_DEST or SET_SRC of an insn match/are single_def_chain_p.

gcc/testsuite/ChangeLog
	PR target/106933
	PR target/106959
	* gcc.target/i386/pr106933-1.c: New test case.
	* gcc.target/i386/pr106933-2.c: Likewise.
	* gcc.target/i386/pr106959-1.c: Likewise.
	* gcc.target/i386/pr106959-2.c: Likewise.
	* gcc.target/i386/pr106959-3.c: Likewise.
xionghul pushed a commit to xionghul/gcc that referenced this pull request Jan 28, 2023
The aarch64 ISA specification allows a left shift amount to be applied
after extension in the range of 0 to 4 (encoded in the imm3 field).

This is true for at least the following instructions:

 * ADD (extended register)
 * ADDS (extended register)
 * SUB (extended register)

The result of this patch can be seen, when compiling the following code:

uint64_t myadd(uint64_t a, uint64_t b)
{
    return a+(((uint8_t)b)<<4);
}

Without the patch the following sequence will be generated:

0000000000000000 <myadd>:
   0:	d37c1c21 	ubfiz	x1, x1, #4, #8
   4:	8b000020 	add	x0, x1, x0
   8:	d65f03c0 	ret

With the patch the ubfiz will be merged into the add instruction:

0000000000000000 <myadd>:
   0:	8b211000 	add	x0, x0, w1, uxtb #4
   4:	d65f03c0 	ret
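
As a rough illustration of the shift range described above, here is a small sketch. `myadd` is the function from this commit message; `uxt_size_sketch` is a hypothetical stand-in written for this example and is not GCC's aarch64_uxt_size. It assumes the check takes a shift amount and a mask and reports the width of the zero-extension the mask encodes, accepting shifts 0 through 4 inclusive.

```c
#include <assert.h>
#include <stdint.h>

/* The function from the commit message.  */
static uint64_t
myadd (uint64_t a, uint64_t b)
{
  return a + (((uint8_t) b) << 4);
}

/* Hypothetical sketch, not GCC's code: return the extension width
   (8, 16 or 32) if MASK is an all-ones field of that width shifted
   left by SHIFT, and SHIFT is within the permitted 0..4 range.
   Return 0 otherwise.  */
static int
uxt_size_sketch (int shift, uint64_t mask)
{
  if (shift < 0 || shift > 4) /* 4 is permitted; stopping at 3 would be the off-by-one */
    return 0;
  for (int size = 8; size <= 32; size *= 2)
    {
      uint64_t bits = (UINT64_C (1) << size) - 1;
      if (mask == bits << shift)
        return size;
    }
  return 0;
}
```

For the `myadd` case the mask is 0xFF0 with a shift of 4, so the sketch reports an 8-bit extension, which corresponds to the `uxtb #4` operand in the improved sequence.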

gcc/ChangeLog:

	* config/aarch64/aarch64.cc (aarch64_uxt_size): Fix an
	off-by-one in checking the permissible shift amount.
vathpela pushed a commit to vathpela/gcc that referenced this pull request Apr 29, 2023
This patch adds support for xstormy16's swap nibbles instruction (swpn).
For the test case:

short foo(short x) {
  return (x&0xff00) | ((x<<4)&0xf0) | ((x>>4)&0x0f);
}

GCC with -O2 currently generates the nine instruction sequence:
foo:    mov r7,r2
        asr r2,#4
        and r2,#15
        mov.w r6,#-256
        and r6,r7
        or r2,r6
        shl r7,#4
        and r7,#255
        or r2,r7
        ret

with this patch, we now generate:
foo:	swpn r2
	ret

To achieve this using combine's four instruction "combinations" requires
a little wizardry.  Firstly, define_insn_and_split are introduced to
treat logical shifts followed by bitwise-AND as macro instructions that
are split after reload.  This is sufficient to recognize a QImode
nibble swap, which can be implemented by swpn followed by either a
zero-extension or a sign-extension from QImode to HImode.  Then finally,
in the correct context, a QImode swap-nibbles pattern can be combined to
preserve the high-byte of a HImode word, matching the xstormy16's swpn
semantics.  The naming of the new code iterators is taken from i386.md.
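
To make the semantics concrete, here is a small C model of the operation as described above: swap the two nibbles of the low byte while preserving the high byte. The names `foo_unsigned` and `swpn_model` are made up for this sketch, and unsigned types are used (an assumption, to sidestep shifts of negative values) rather than the `short` of the test case.

```c
#include <assert.h>
#include <stdint.h>

/* The expression from the test case, on an unsigned 16-bit value.  */
static uint16_t
foo_unsigned (uint16_t x)
{
  return (uint16_t) ((x & 0xff00u) | ((x << 4) & 0xf0u) | ((x >> 4) & 0x0fu));
}

/* Model of the described swpn behaviour: swap the nibbles of the low
   byte and leave the high byte unchanged.  */
static uint16_t
swpn_model (uint16_t x)
{
  uint16_t lo = x & 0xffu;
  uint16_t swapped = (uint16_t) (((lo << 4) | (lo >> 4)) & 0xffu);
  return (uint16_t) ((x & 0xff00u) | swapped);
}
```

For example, both functions map 0x1234 to 0x1243, and they agree on every 16-bit input, which is why the whole expression can collapse to a single instruction.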

2023-04-29  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	* config/stormy16/stormy16.md (any_lshift): New code iterator.
	(any_or_plus): Likewise.
	(any_rotate): Likewise.
	(*<any_lshift>_and_internal): New define_insn_and_split to
	recognize a logical shift followed by an AND, and split it
	again after reload.
	(*swpn): New define_insn matching xstormy16's swpn.
	(*swpn_zext): New define_insn recognizing swpn followed by
	zero_extendqihi2, i.e. with the high byte set to zero.
	(*swpn_sext): Likewise, for swpn followed by cbw.
	(*swpn_sext_2): Likewise, for an alternate RTL form.
	(*swpn_zext_ior): A pre-reload splitter so that an swpn+zext+ior
	sequence is split in the correct place to recognize the *swpn_zext
	followed by any_or_plus (ior, xor or plus) instruction.

gcc/testsuite/ChangeLog
	* gcc.target/xstormy16/swpn-1.c: New QImode test case.
	* gcc.target/xstormy16/swpn-2.c: New zero_extend test case.
	* gcc.target/xstormy16/swpn-3.c: New sign_extend test case.
	* gcc.target/xstormy16/swpn-4.c: New HImode test case.
kraj pushed a commit to kraj/gcc that referenced this pull request May 2, 2023
I noticed that for member class templates of a class template we were
unnecessarily substituting both the template and its type.  Avoiding that
duplication speeds compilation of this silly testcase from ~12s to ~9s on my
laptop.  It's unlikely to make a difference on any real code, but the
simplification is also nice.

We still need to clear CLASSTYPE_USE_TEMPLATE on the partial instantiation
of the template class, but it makes more sense to do that in
tsubst_template_decl anyway.

  #define NC(X)					\
    template <class U> struct X##1;		\
    template <class U> struct X##2;		\
    template <class U> struct X##3;		\
    template <class U> struct X##4;		\
    template <class U> struct X##5;		\
    template <class U> struct X##6;
  #define NC2(X) NC(X##a) NC(X##b) NC(X##c) NC(X##d) NC(X##e) NC(X##f)
  #define NC3(X) NC2(X##A) NC2(X##B) NC2(X##C) NC2(X##D) NC2(X##E)
  template <int I> struct A
  {
    NC3(am)
  };
  template <class...Ts> void sink(Ts...);
  template <int...Is> void g()
  {
    sink(A<Is>()...);
  }
  template <int I> void f()
  {
    g<__integer_pack(I)...>();
  }
  int main()
  {
    f<1000>();
  }

gcc/cp/ChangeLog:

	* pt.cc (instantiate_class_template): Skip the RECORD_TYPE
	of a class template.
	(tsubst_template_decl): Clear CLASSTYPE_USE_TEMPLATE.
rurban pushed a commit to rurban/gcc that referenced this pull request Oct 26, 2023
This patch is my proposed solution to PR rtl-optimization/91865.
Normally RTX simplification canonicalizes a ZERO_EXTEND of a ZERO_EXTEND
to a single ZERO_EXTEND, but as shown in this PR it is possible for
combine's make_compound_operation to unintentionally generate a
non-canonical ZERO_EXTEND of a ZERO_EXTEND, which is unlikely to be
matched by the backend.
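
The canonicalization mentioned above rests on a simple identity: zero-extending twice gives the same value as zero-extending once from the narrowest type. A minimal C sketch follows, with the caveat that `uint32_t` merely stands in for msp430's PSImode (an assumption made for portability) and the function names are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Models (zero_extend:PSI (zero_extend:HI (reg:QI))).  */
static uint32_t
zext_twice (uint8_t v)
{
  return (uint32_t) (uint16_t) v;
}

/* Models the canonical single (zero_extend:PSI (reg:QI)).  */
static uint32_t
zext_once (uint8_t v)
{
  return (uint32_t) v;
}
```

The two functions return identical results for all 256 inputs, so the nested form carries no extra information and only makes pattern matching harder.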

For the new test case:

const int table[2] = {1, 2};
int foo (char i) { return table[i]; }

compiling with -O2 -mlarge on msp430 we currently see:

Trying 2 -> 7:
    2: r25:HI=zero_extend(R12:QI)
      REG_DEAD R12:QI
    7: r28:PSI=sign_extend(r25:HI)#0
      REG_DEAD r25:HI
Failed to match this instruction:
(set (reg:PSI 28 [ iD.1772 ])
    (zero_extend:PSI (zero_extend:HI (reg:QI 12 R12 [ iD.1772 ]))))

which results in the following code:

foo:	AND     #0xff, R12
        RLAM.A #4, R12 { RRAM.A #4, R12
        RLAM.A  #1, R12
        MOVX.W  table(R12), R12
        RETA

With this patch, we now see:

Trying 2 -> 7:
    2: r25:HI=zero_extend(R12:QI)
      REG_DEAD R12:QI
    7: r28:PSI=sign_extend(r25:HI)#0
      REG_DEAD r25:HI
Successfully matched this instruction:
(set (reg:PSI 28 [ iD.1772 ])
    (zero_extend:PSI (reg:QI 12 R12 [ iD.1772 ])))
allowing combination of insns 2 and 7
original costs 4 + 8 = 12
replacement cost 8

foo:	MOV.B   R12, R12
        RLAM.A  #1, R12
        MOVX.W  table(R12), R12
        RETA

2023-10-26  Roger Sayle  <roger@nextmovesoftware.com>
	    Richard Biener  <rguenther@suse.de>

gcc/ChangeLog
	PR rtl-optimization/91865
	* combine.cc (make_compound_operation): Avoid creating a
	ZERO_EXTEND of a ZERO_EXTEND.

gcc/testsuite/ChangeLog
	PR rtl-optimization/91865
	* gcc.target/msp430/pr91865.c: New test case.
XYenChi referenced this pull request in XYenChi/gcc Nov 7, 2023
hubot pushed a commit that referenced this pull request Feb 16, 2024
Here we have

  template<class T>
  auto is_throwable(T t) -> decltype(throw t, true) { ... }

where we didn't properly mark 't' as IMPLICIT_RVALUE_P, which caused
the wrong overload to have been chosen.  Jason figured out it's because
we don't correctly implement [expr.prim.id.unqual]#4.2, which post-P2266
says that an id-expression is move-eligible if

"the id-expression (possibly parenthesized) is the operand of
a throw-expression, and names an implicitly movable entity that belongs
to a scope that does not contain the compound-statement of the innermost
lambda-expression, try-block, or function-try-block (if any) whose
compound-statement or ctor-initializer contains the throw-expression."

I worked out that it's trying to say that given

  struct X {
    X();
    X(const X&);
    X(X&&) = delete;
  };

the following should fail: the scope of the throw is an sk_try, and it's
also x's scope S, and S "does not contain the compound-statement of the
*try-block" so x is move-eligible, so we move, so we fail.

  void f ()
  try {
    X x;
    throw x;  // use of deleted function
  } catch (...) {
  }

Whereas here:

  void g (X x)
  try {
    throw x;
  } catch (...) {
  }

the throw is again in an sk_try, but x's scope is an sk_function_parms
which *does* contain the {} of the *try-block, so x is not move-eligible,
so we don't move, so we use X(const X&), and the code is fine.

The current code also doesn't seem to handle

  void h (X x) {
    void z (decltype(throw x, true));
  }

where there's no enclosing lambda or sk_try so we should move.

I'm not doing anything about lambdas because we shouldn't reach the
code at the end of the function: the DECL_HAS_VALUE_EXPR_P check
shouldn't let us go further.

	PR c++/113789
	PR c++/113853

gcc/cp/ChangeLog:

	* typeck.cc (treat_lvalue_as_rvalue_p): Update code to better
	reflect [expr.prim.id.unqual]#4.2.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp0x/sfinae69.C: Remove dg-bogus.
	* g++.dg/cpp0x/sfinae70.C: New test.
	* g++.dg/cpp0x/sfinae71.C: New test.
	* g++.dg/cpp0x/sfinae72.C: New test.
	* g++.dg/cpp2a/implicit-move4.C: New test.
Liaoshihua pushed a commit to Liaoshihua/gcc that referenced this pull request Mar 19, 2024
Liaoshihua pushed a commit to Liaoshihua/gcc that referenced this pull request Mar 19, 2024
Liaoshihua pushed a commit to Liaoshihua/gcc that referenced this pull request Mar 19, 2024
Liaoshihua pushed a commit to Liaoshihua/gcc that referenced this pull request Mar 21, 2024
Liaoshihua pushed a commit to Liaoshihua/gcc that referenced this pull request Mar 25, 2024
Liaoshihua pushed a commit to Liaoshihua/gcc that referenced this pull request Mar 25, 2024
NinaRanns referenced this pull request in NinaRanns/gcc May 30, 2024
fixing tests and removing C++20 requirement
hubot pushed a commit that referenced this pull request Jun 13, 2024
Here during overload resolution we have two strictly viable ambiguous
candidates #1 and #2, and two non-strictly viable candidates #3 and #4
which we hold on to ever since r14-6522.  These latter candidates have
an empty second arg conversion since the first arg conversion was deemed
bad, and this trips up joust when called on #3 and #4 which assumes all
arg conversions are there.

We can fix this by making joust robust to empty arg conversions, but in
this situation we shouldn't need to compare #3 and #4 at all given that
we have a strictly viable candidate.  To that end, this patch makes
tourney shortcut considering non-strictly viable candidates upon
encountering ambiguity between two strictly viable candidates (taking
advantage of the fact that the candidates list is sorted according to
viability via splice_viable).

	PR c++/115239

gcc/cp/ChangeLog:

	* call.cc (tourney): Don't consider a non-strictly viable
	candidate as the champ if there was ambiguity between two
	strictly viable candidates.

gcc/testsuite/ChangeLog:

	* g++.dg/overload/error7.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>
hubot pushed a commit that referenced this pull request Jun 17, 2024
(cherry picked from commit 7fed7e9)
hubot pushed a commit that referenced this pull request Jul 19, 2024
These tests used to generate:

        bl      swap
        ldr     r2, [sp, #4]
        mov     r0, r2  @ __fp16

but g:9d20529d94b23275885f380d155fe8671ab5353a means that we can
load directly into r0:

        bl      swap
        ldrh    r0, [sp, #4]    @ __fp16

This patch updates the tests to "defend" this change.

While there, the scans include:

mov\tr1, r[03]}

But if the spill of r2 occurs first, there's no real reason why
r2 couldn't be used as the temporary, instead of r3.

The patch tries to update the scans while preserving the spirit
of the originals.

gcc/testsuite/
	* gcc.target/arm/fp16-aapcs-2.c: Expect the return value to be
	loaded directly from the stack.  Test that the swap generates
	two moves out of r0/r1 and two moves in.
	* gcc.target/arm/fp16-aapcs-4.c: Likewise.
hubot pushed a commit that referenced this pull request Sep 7, 2024
…o_debug_section [PR116614]

cat abc.C
  #define A(n) struct T##n {} t##n;
  #define B(n) A(n##0) A(n##1) A(n##2) A(n##3) A(n##4) A(n##5) A(n##6) A(n##7) A(n##8) A(n##9)
  #define C(n) B(n##0) B(n##1) B(n##2) B(n##3) B(n##4) B(n##5) B(n##6) B(n##7) B(n##8) B(n##9)
  #define D(n) C(n##0) C(n##1) C(n##2) C(n##3) C(n##4) C(n##5) C(n##6) C(n##7) C(n##8) C(n##9)
  #define E(n) D(n##0) D(n##1) D(n##2) D(n##3) D(n##4) D(n##5) D(n##6) D(n##7) D(n##8) D(n##9)
  E(1) E(2) E(3)
  int main () { return 0; }
./xg++ -B ./ -o abc{.o,.C} -flto -flto-partition=1to1 -O2 -g -fdebug-types-section -c
./xgcc -B ./ -o abc{,.o} -flto -flto-partition=1to1 -O2
(not included in testsuite as it takes a while to compile) FAILs with
lto-wrapper: fatal error: Too many copied sections: Operation not supported
compilation terminated.
/usr/bin/ld: error: lto-wrapper failed
collect2: error: ld returned 1 exit status

The following patch fixes that.  Most of the 64K+ section support for
reading and writing has been there for years (the reading side in particular
is already used quite often), and a further bug in it was fixed as part of
the PR104617 fix.

Yet, the fix isn't solely about removing the
  if (new_i - 1 >= SHN_LORESERVE)
    {
      *err = ENOTSUP;
      return "Too many copied sections";
    }
5 lines; the missing part was that the function only handled reading of
the .symtab_shndx section, not copying or updating it.
If the result has fewer than 64K-epsilon sections, that wasn't actually
needed, but e.g. with -fdebug-types-section one can exceed that limit pretty
easily (reported to us on a WebKitGtk build on ppc64le).
Updating the section is slightly more complicated because it basically
needs to be done in lock step with updating the .symtab section: if a symbol
doesn't need SHN_XINDEX there, the corresponding .symtab_shndx entry should
be (or should be updated to) SHN_UNDEF; otherwise it needs to hold the value
that would otherwise have been stored in .symtab but couldn't fit.  But
repeating all the symtab decisions about what to discard and how to rewrite
it just for that would be ugly.

So, the patch instead emits the .symtab_shndx section (or sections) last
and prepares the content during the .symtab processing and in a second
pass when going just through .symtab_shndx sections just uses the saved
content.
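
The lock-step relationship between .symtab and .symtab_shndx described
above can be sketched in C as follows.  This is only an illustration of the
ELF gABI encoding rule, not the code from simple-object-elf.c; the struct
and function names are hypothetical, and the helper assumes it is given real
section header indices (special values such as SHN_ABS or SHN_COMMON are
stored directly in st_shndx and are out of scope here).

```c
#include <assert.h>
#include <stdint.h>

/* Constants as defined by the ELF gABI.  */
#define SHN_UNDEF     0x0000u
#define SHN_LORESERVE 0xFF00u
#define SHN_XINDEX    0xFFFFu

/* Hypothetical pair describing where one symbol's section index lives.  */
struct sym_section
{
  uint16_t st_shndx;    /* Field stored in the .symtab entry.  */
  uint32_t shndx_word;  /* Parallel word stored in .symtab_shndx.  */
};

/* Encode a real section header index.  Indices that do not fit below
   SHN_LORESERVE go into the .symtab_shndx word and st_shndx becomes the
   SHN_XINDEX escape; otherwise the .symtab_shndx word is SHN_UNDEF.  */
static struct sym_section
encode_section_index (uint32_t index)
{
  struct sym_section out;
  if (index >= SHN_LORESERVE)
    {
      out.st_shndx = SHN_XINDEX;
      out.shndx_word = index;
    }
  else
    {
      out.st_shndx = (uint16_t) index;
      out.shndx_word = SHN_UNDEF;
    }
  return out;
}

/* Recover the section index a reader would use for the symbol.  */
static uint32_t
decode_section_index (struct sym_section s)
{
  return s.st_shndx == SHN_XINDEX ? s.shndx_word : s.st_shndx;
}

/* Sanity check: encoding followed by decoding yields the input.  */
static void
check_round_trip (uint32_t index)
{
  assert (decode_section_index (encode_section_index (index)) == index);
}
```

Because both values are derived from the same decision, computing the
.symtab_shndx words while rewriting .symtab and emitting them afterwards
avoids making that decision twice.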

2024-09-07  Jakub Jelinek  <jakub@redhat.com>

	PR lto/116614
	* simple-object-elf.c (SHN_COMMON): Align comment with neighbouring
	comments.
	(SHN_HIRESERVE): Use uppercase hex digits instead of lowercase for
	consistency.
	(simple_object_elf_find_sections): Formatting fixes.
	(simple_object_elf_fetch_attributes): Likewise.
	(simple_object_elf_attributes_merge): Likewise.
	(simple_object_elf_start_write): Likewise.
	(simple_object_elf_write_ehdr): Likewise.
	(simple_object_elf_write_shdr): Likewise.
	(simple_object_elf_write_to_file): Likewise.
	(simple_object_elf_copy_lto_debug_section): Likewise.  Don't fail for
	new_i - 1 >= SHN_LORESERVE, instead arrange in that case to copy
	over .symtab_shndx sections, though emit those last and compute their
	section content when processing associated .symtab sections.  Handle
	simple_object_internal_read failure even in the .symtab_shndx reading
	case.
mikpe added a commit to mikpe/gcc that referenced this pull request Sep 8, 2024
hubot pushed a commit that referenced this pull request Sep 12, 2024
…o_debug_section [PR116614]

(cherry picked from commit bb8dd09)
hubot pushed a commit that referenced this pull request Sep 13, 2024
…o_debug_section [PR116614]

(cherry picked from commit bb8dd09)
hubot pushed a commit that referenced this pull request Oct 9, 2024
Whenever C1 and C2 are integer constants, X is of a wrapping type, and
cmp is a relational operator, the expression X +- C1 cmp C2 can be
simplified in the following cases:

(a) If cmp is <= and C2 -+ C1 == +INF(1), we can transform the initial
comparison in the following way:
   X +- C1 <= C2
   -INF <= X +- C1 <= C2 (add left hand side which holds for any X, C1)
   -INF -+ C1 <= X <= C2 -+ C1 (add -+C1 to all 3 expressions)
   -INF -+ C1 <= X <= +INF (due to (1))
   -INF -+ C1 <= X (eliminate the right hand side since it holds for any X)

(b) By analogy, if cmp is >= and C2 -+ C1 == -INF(1), use the following
sequence of transformations:

   X +- C1 >= C2
   +INF >= X +- C1 >= C2 (add left hand side which holds for any X, C1)
   +INF -+ C1 >= X >= C2 -+ C1 (add -+C1 to all 3 expressions)
   +INF -+ C1 >= X >= -INF (due to (1))
   +INF -+ C1 >= X (eliminate the right hand side since it holds for any X)

(c) The > and < cases are negations of (a) and (b), respectively.

This transformation occasionally saves add / sub instructions; for
instance, the expression

3 + (uint32_t)f() < 2

compiles to

cmn     w0, #4
cset    w0, ls

instead of

add     w0, w0, 3
cmp     w0, 2
cset    w0, ls

on aarch64.
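
Cases (a) and (b) above can be checked exhaustively on a narrow wrapping
type.  The sketch below is not part of the patch or its testsuite: it
assumes an 8-bit unsigned type (so -INF is 0 and +INF is 255), covers only
the X + C1 form, and uses hypothetical function names.

```c
#include <assert.h>
#include <stdint.h>

/* Case (a): with C2 - C1 == +INF (here C2 = C1 + 255 mod 256), compare
   X + C1 <= C2 against -INF - C1 <= X for every X and C1.  Returns the
   number of disagreements.  */
static unsigned
count_mismatches_le (void)
{
  unsigned mismatches = 0;
  for (unsigned c1 = 0; c1 < 256; c1++)
    for (unsigned x = 0; x < 256; x++)
      {
        uint8_t c2 = (uint8_t) (c1 + 255u);
        int original = (uint8_t) (x + c1) <= c2;
        int simplified = (uint8_t) (0u - c1) <= x;
        mismatches += original != simplified;
      }
  return mismatches;
}

/* Case (b): with C2 - C1 == -INF (here C2 = C1), compare
   X + C1 >= C2 against +INF - C1 >= X for every X and C1.  Returns the
   number of disagreements.  */
static unsigned
count_mismatches_ge (void)
{
  unsigned mismatches = 0;
  for (unsigned c1 = 0; c1 < 256; c1++)
    for (unsigned x = 0; x < 256; x++)
      {
        uint8_t c2 = (uint8_t) c1;
        int original = (uint8_t) (x + c1) >= c2;
        int simplified = (uint8_t) (255u - c1) >= x;
        mismatches += original != simplified;
      }
  return mismatches;
}
```

Both functions return 0 when the original and simplified comparisons agree
on all 65536 combinations, which is the property the simplification relies
on.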

Testcases that go together with this patch have been split into two
separate files, one containing testcases for unsigned variables and the
other for wrapping signed ones (and thus compiled with -fwrapv).
Additionally, one aarch64 test has been adjusted since the patch has
caused the generated code to change from

cmn     w0, #2
csinc   w0, w1, wzr, cc   (x < -2)

to

cmn     w0, #3
csinc   w0, w1, wzr, cs   (x <= -3)

This patch has been bootstrapped and regtested on aarch64, x86_64, and
i386, and additionally regtested on riscv32.

gcc/ChangeLog:

	PR tree-optimization/116024
	* match.pd: New transformation around integer comparison.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/pr116024-2.c: New test.
	* gcc.dg/tree-ssa/pr116024-2-fwrapv.c: Ditto.
	* gcc.target/aarch64/gtu_to_ltu_cmp_1.c: Adjust.
hubot pushed a commit that referenced this pull request Nov 8, 2024
Update test case for armv8.1-m.main that supports conditional
arithmetic.

armv7-m:
        push    {r4, lr}
        ldr     r4, .L6
        ldr     r4, [r4]
        lsls    r4, r4, #29
        it      mi
        addmi   r2, r2, #1
        bl      bar
        movs    r0, #0
        pop     {r4, pc}

armv8.1-m.main:
        push    {r3, r4, r5, lr}
        ldr     r4, .L5
        ldr     r5, [r4]
        tst     r5, #4
        csinc   r2, r2, r2, eq
        bl      bar
        movs    r0, #0
        pop     {r3, r4, r5, pc}
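
For orientation, a C function of roughly the following shape would be
consistent with both listings.  It is reconstructed from the assembly
above and is not the contents of epilog-1.c; apart from bar, which is the
call target in the listings, every name is hypothetical, and bar is given a
stub body here only so that the sketch is self-contained.

```c
#include <assert.h>

/* Hypothetical global whose address comes from the literal pool
   (.L5 / .L6) and whose value is tested for bit 2.  */
static volatile int flags;

/* Records the third argument seen by bar, for checking only.  */
static int last_c;

/* Stub for the callee; the real callee is external in the listings.  */
static void
bar (int a, int b, int c, int d)
{
  (void) a;
  (void) b;
  (void) d;
  last_c = c;
}

static int
foo (int a, int b, int c, int d)
{
  /* armv7-m:        lsls r4, r4, #29 ; it mi ; addmi r2, r2, #1
     armv8.1-m.main: tst r5, #4 ; csinc r2, r2, r2, eq  */
  if (flags & 4)
    c++;
  bar (a, b, c, d);   /* bl bar  */
  return 0;           /* movs r0, #0  */
}
```

The conditional increment of the third argument (held in r2) is the part
that armv8.1-m.main can express with a single csinc instead of an IT block.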

gcc/testsuite/ChangeLog:

	* gcc.target/arm/epilog-1.c: Use check-function-bodies.

Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
hubot pushed a commit that referenced this pull request Nov 8, 2024
Update test case for armv8.1-m.main that supports conditional
arithmetic.

Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
(cherry picked from commit ec86e87)
hubot pushed a commit that referenced this pull request Nov 26, 2024
In r14.2.0-376-g724446556e5, I accidentally introduced a regression in
the expected assembler as the csinc instruction was not used for
armv8.1-m.main.

The generated assembler for armv8.1-m.main is:
        push    {r3, r4, r5, lr}
        ldr     r4, .L5
        ldr     r5, [r4]
        adds    r4, r2, #1
        tst     r5, #4
        it      ne
        movne   r2, r4
        bl      bar
        movs    r0, #0
        pop     {r3, r4, r5, pc}

gcc/testsuite/ChangeLog:

	* gcc.target/arm/epilog-1.c: Corrected armv8.1-m.main asm.

Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
hubot pushed a commit that referenced this pull request Feb 5, 2025
When generating thumb2 code,
	LDM SP!, {PC}
is a two-byte instruction, whereas
	LDR PC, [SP], #4
needs 4 bytes.  When optimizing for size, or when there's no obvious
performance benefit, prefer the former.

gcc/ChangeLog:

	PR target/118089
	* config/arm/arm.cc (thumb2_expand_return): Use LDM SP!, {PC}
	when optimizing for size, or when there's no performance benefit over
	LDR PC, [SP], #4.
	(arm_expand_epilogue): Likewise.
hubot pushed a commit that referenced this pull request Feb 7, 2025
My earlier change for making the compiler prefer

	POP	{PC}

over

	LDR	PC, [SP], #4

had a slightly unexpected consequence in that we now also call
arm_emit_multi_reg_pop to handle single register pops when the
register is not PC.  This exposed a latent bug in this function where
the dwarf unwinding notes on the single-register POP were not being
set correctly.

gcc/
	PR target/118089
	* config/arm/arm.cc (arm_emit_multi_reg_pop): Add a CFA adjust
	note to single-register POP instructions.
hubot pushed a commit that referenced this pull request Jun 13, 2025
…o_debug_section [PR116614]

(cherry picked from commit bb8dd09)