Aldyh/cilk in gomp #4

sushantchry · 2016-01-14T05:56:41Z

Pulling for study purpose, no changes expected

See http://openmp.org/wp/2013/03/openmp-40-rc2/ for the standard draft. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@196809 138bc75d-0d04-0410-961f-82ee72b054a4

Add another argument to c_finish_omp_atomic. * parser.c (cp_parser_binary_expression): Handle no_toplevel_fold_p even for binary operations other than comparison. (cp_parser_omp_atomic): Handle parsing OpenMP 4.0 atomics. * pt.c (tsubst_expr) <case OMP_ATOMIC>: Handle atomic exchange. * semantics.c (finish_omp_atomic): Use cp_tree_equal to diagnose expression mismatches and to find out if c_finish_omp_atomic should be called with swapped set to true or false. * c-omp.c (c_finish_omp_atomic): Add swapped argument, if true, build the operation first with rhs, lhs arguments and use NOP_EXPR build_modify_expr. * c-common.h (c_finish_omp_atomic): Adjust prototype. * c-c++-common/gomp/atomic-15.c: Remove error test that is now valid in OpenMP 4.0. * testsuite/libgomp.c++/atomic-10.C: New test. * testsuite/libgomp.c++/atomic-11.C: New test. * testsuite/libgomp.c++/atomic-12.C: New test. * testsuite/libgomp.c++/atomic-13.C: New test. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@196815 138bc75d-0d04-0410-961f-82ee72b054a4
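The atomic forms this commit teaches the C++ parser can be sketched in user code. This is a minimal sketch, assuming the OpenMP 4.0 draft's reversed-operand update (x = expr binop x) and its capture block for atomic exchange; the function names are illustrative and are not taken from the new atomic-10.C through atomic-13.C tests. Without -fopenmp the pragmas are ignored and the code runs serially.

```cpp
// Reversed-operand atomic update: x = expr binop x.
// This is the operand order that the new "swapped" argument of
// c_finish_omp_atomic is meant to describe.
int reversed_subtract(int &x, int k)
{
  #pragma omp atomic update
  x = k - x;
  return x;
}

// Atomic exchange: capture the old value, then store the new one.
int exchange(int &x, int desired)
{
  int old;
  #pragma omp atomic capture
  { old = x; x = desired; }
  return old;
}
```

Calling reversed_subtract on a variable holding 3 with k = 10 leaves 7 in it; a following exchange with 42 returns that 7 and stores 42.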

with default value, pass it down to c_parser_conditional_expression. (c_parser_conditional_expression): Add omp_atomic_lhs argument, pass it down to c_parser_binary_expression. Don't pass PREC_NONE to it. Adjust recursive call. (c_parser_binary_expression): Remove prec argument, add omp_atomic_lhs argument. Always start from PREC_NONE, if omp_atomic_lhs is non-NULL and one of the arguments of toplevel binop matches it, use build2 instead of parser_build_binary_op. (c_parser_omp_atomic): Handle OpenMP 4.0 atomics. (c_parser_omp_for_loop): Adjust c_parser_binary_expression caller. * c-tree.h (c_tree_equal): New prototype. * c-typeck.c (c_tree_equal): New function. * parser.c (cp_parser_omp_atomic): Never restart unless structured_block is true. * c-c++-common/gomp/atomic-15.c: Adjust for C diagnostics. * testsuite/libgomp.c/atomic-14.c: Add parens to make it valid. * testsuite/libgomp.c/atomic-15.c: New test. * testsuite/libgomp.c/atomic-16.c: New test. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@196816 138bc75d-0d04-0410-961f-82ee72b054a4

* env.c (handle_omp_display_env): New function. (initialize_env): Use it. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@196817 138bc75d-0d04-0410-961f-82ee72b054a4

* libgomp.texi (Environment Variables): Minor cleanup, update section refs to OpenMP 4.0rc2. (OMP_DISPLAY_ENV, GOMP_SPINCOUNT): Document these environment variables. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@196818 138bc75d-0d04-0410-961f-82ee72b054a4

GIMPLE_OMP_FOR kinds. * tree.def (OMP_SIMD, OMP_FOR_SIMD, OMP_DISTRIBUTE): New tree codes. * gimple.h (enum gf_mask): Add GF_OMP_FOR_KIND_MASK, GF_OMP_FOR_KIND_FOR, GF_OMP_FOR_KIND_SIMD, GF_OMP_FOR_KIND_FOR_SIMD and GF_OMP_FOR_KIND_DISTRIBUTE. (gimple_omp_for_kind, gimple_omp_for_set_kind): New inline functions. * gimplify.c (is_gimple_stmt, gimplify_omp_for, gimplify_expr): Handle OMP_SIMD, OMP_FOR_SIMD and OMP_DISTRIBUTE. * tree.c (omp_clause_num_ops, omp_clause_code_name, walk_tree_1): Handle new OpenMP 4.0 clauses. * tree-pretty-print.c (dump_omp_clause): Likewise. (dump_generic_node): Handle OMP_SIMD, OMP_FOR_SIMD and OMP_DISTRIBUTE. * tree.h (enum omp_clause_code): Add OMP_CLAUSE_LINEAR, OMP_CLAUSE_ALIGNED, OMP_CLAUSE_DEPEND, OMP_CLAUSE_FROM, OMP_CLAUSE_TO, OMP_CLAUSE_UNIFORM, OMP_CLAUSE_MAP, OMP_CLAUSE_DEVICE, OMP_CLAUSE_DIST_SCHEDULE, OMP_CLAUSE_INBRANCH, OMP_CLAUSE_NOTINBRANCH, OMP_CLAUSE_NUM_TEAMS, OMP_CLAUSE_PROC_BIND, OMP_CLAUSE_SAFELEN, OMP_CLAUSE_SIMDLEN, OMP_CLAUSE_FOR, OMP_CLAUSE_PARALLEL, OMP_CLAUSE_SECTIONS and OMP_CLAUSE_TASKGROUP. (OMP_LOOP_CHECK): Define. (OMP_FOR_BODY, OMP_FOR_CLAUSES, OMP_FOR_INIT, OMP_FOR_COND, OMP_FOR_INCR, OMP_FOR_PRE_BODY): Use OMP_LOOP_CHECK instead of OMP_FOR_CHECK. (OMP_CLAUSE_DECL): Extend check range up to OMP_CLAUSE_MAP. (OMP_CLAUSE_LINEAR_STEP, OMP_CLAUSE_ALIGNED_ALIGNMENT, OMP_CLAUSE_NUM_TEAMS_EXPR, OMP_CLAUSE_DEVICE_ID, OMP_CLAUSE_DIST_SCHEDULE_CHUNK_EXPR, OMP_CLAUSE_SAFELEN_EXPR, OMP_CLAUSE_SIMDLEN_EXPR): Define. (enum omp_clause_depend_kind, enum omp_clause_map_kind, enum omp_clause_proc_bind_kind): New enums. (OMP_CLAUSE_DEPEND_KIND, OMP_CLAUSE_MAP_KIND, OMP_CLAUSE_PROC_BIND_KIND): Define. (struct tree_omp_clause): Add subcode.depend_kind, subcode.map_kind and subcode.proc_bind_kind. (find_omp_clause): New prototype. * omp-builtins.def (BUILT_IN_GOMP_CANCEL, BUILT_IN_GOMP_CANCELLATION_POINT): New built-ins. * tree-flow.h (find_omp_clause): Remove prototype. 
c/ * c-parser.c (c_parser_omp_all_clauses): Change mask argument type from unsigned to omp_clause_mask. (c_parser_omp_for_loop): Adjust c_finish_omp_for caller. (OMP_FOR_CLAUSE_MASK, OMP_SECTIONS_CLAUSE_MASK, OMP_PARALLEL_CLAUSE_MASK, OMP_SINGLE_CLAUSE_MASK, OMP_TASK_CLAUSE_MASK): Use OMP_CLAUSE_MASK_1 instead of 1. (c_parser_omp_parallel): Use omp_clause_mask type instead of unsigned for mask, use OMP_CLAUSE_MASK_1 instead of 1 for masks. cp/ * cp-tree.h (OMP_FOR_GIMPLIFYING_P): Use OMP_LOOP_CHECK instead of OMP_FOR_CHECK. (finish_omp_for): Add enum tree_code second argument. (finish_omp_cancel, finish_omp_cancellation_point): New prototypes. * cp-gimplify.c (cp_gimplify_expr, cp_genericize_r): Handle OMP_SIMD, OMP_FOR_SIMD and OMP_DISTRIBUTE. * semantics.c (finish_omp_clauses): Handle new OpenMP 4.0 clauses. (finish_omp_for): Add code argument, pass it down to make_node or c_finish_omp_for. (finish_omp_cancel, finish_omp_cancellation_point): New functions. * parser.c (cp_parser_omp_clause_name): Add parsing of new OpenMP 4.0 clauses. (cp_parser_omp_var_list_no_open): Add COLON argument, if non-NULL, accept termination by colon instead of closing paren. (cp_parser_omp_var_list, cp_parser_omp_clause_reduction): Adjust callers. (cp_parser_omp_clause_branch, cp_parser_omp_clause_cancelkind, cp_parser_omp_clause_num_teams, cp_parser_omp_clause_aligned, cp_parser_omp_clause_linear, cp_parser_omp_clause_depend, cp_parser_omp_clause_map, cp_parser_omp_clause_device, cp_parser_omp_clause_dist_schedule, cp_parser_omp_clause_proc_bind): New functions. (cp_parser_omp_all_clauses): Change mask argument's type to omp_clause_mask from unsigned. Fix c_name for PRAGMA_OMP_CLAUSE_UNTIED. Handle new OpenMP 4.0 clauses. (cp_parser_omp_for_loop): Add code argument. Pass it down to finish_omp_for. (OMP_SIMD_CLAUSE_MASK): Define. (cp_parser_omp_simd): New function. 
(OMP_FOR_CLAUSE_MASK, OMP_SECTIONS_CLAUSE_MASK, OMP_PARALLEL_CLAUSE_MASK, OMP_SINGLE_CLAUSE_MASK, OMP_TASK_CLAUSE_MASK): Use OMP_CLAUSE_MASK_1 instead of 1. (cp_parser_omp_for): Handle parsing of #pragma omp for simd. (cp_parser_omp_parallel): Handle parsing of #pragma omp parallel for simd. Use omp_clause_mask type instead of unsigned for mask, use OMP_CLAUSE_MASK_1 instead of 1 for masks. (OMP_CANCEL_CLAUSE_MASK, OMP_CANCELLATION_POINT_CLAUSE_MASK): Define. (cp_parser_omp_cancel, cp_parser_omp_cancellation_point): New functions. (cp_parser_omp_construct): Handle PRAGMA_OMP_SIMD, PRAGMA_OMP_CANCEL and PRAGMA_OMP_CANCELLATION_POINT. (cp_parser_pragma): Handle PRAGMA_OMP_SIMD. * pt.c (tsubst_expr): Handle OMP_SIMD, OMP_FOR_SIMD and OMP_DISTRIBUTE. Pass down TREE_CODE to finish_omp_for. fortran/ * f95-lang.c (ATTR_NULL): Define. c-family/ * c-omp.c (c_finish_omp_for): Add code argument, pass it down to make_code. (c_split_parallel_clauses): Handle OMP_CLAUSE_SAFELEN, OMP_CLAUSE_ALIGNED and OMP_CLAUSE_LINEAR. * c-pragma.h (enum pragma_kind): Add PRAGMA_OMP_CANCEL, PRAGMA_OMP_CANCELLATION_POINT, PRAGMA_OMP_DECLARE_REDUCTION, PRAGMA_OMP_DECLARE_SIMD, PRAGMA_OMP_DECLARE_TARGET, PRAGMA_OMP_DISTRIBUTE, PRAGMA_OMP_END_DECLARE_TARGET, PRAGMA_OMP_FOR_SIMD, PRAGMA_OMP_PARALLEL_FOR_SIMD, PRAGMA_OMP_SIMD, PRAGMA_OMP_TARGET, PRAGMA_OMP_TARGET_DATA, PRAGMA_OMP_TARGET_UPDATE, PRAGMA_OMP_TASKGROUP and PRAGMA_OMP_TEAMS. (enum pragma_omp_clause): Add PRAGMA_OMP_CLAUSE_ALIGNED, PRAGMA_OMP_CLAUSE_DEPEND, PRAGMA_OMP_CLAUSE_DEVICE, PRAGMA_OMP_CLAUSE_DIST_SCHEDULE, PRAGMA_OMP_CLAUSE_FOR, PRAGMA_OMP_CLAUSE_FROM, PRAGMA_OMP_CLAUSE_INBRANCH, PRAGMA_OMP_CLAUSE_LINEAR, PRAGMA_OMP_CLAUSE_MAP, PRAGMA_OMP_CLAUSE_NOTINBRANCH, PRAGMA_OMP_CLAUSE_NUM_TEAMS, PRAGMA_OMP_CLAUSE_PARALLEL, PRAGMA_OMP_CLAUSE_PROC_BIND, PRAGMA_OMP_CLAUSE_SAFELEN, PRAGMA_OMP_CLAUSE_SECTIONS, PRAGMA_OMP_CLAUSE_SIMDLEN, PRAGMA_OMP_CLAUSE_TASKGROUP, PRAGMA_OMP_CLAUSE_TO and PRAGMA_OMP_CLAUSE_UNIFORM. 
* c-pragma.c (omp_pragmas): Add new OpenMP 4.0 constructs. * c-common.h (c_finish_omp_for): Add enum tree_code as second argument. (OMP_CLAUSE_MASK_1): Define. (omp_clause_mask): For HWI >= 64 new typedef for unsigned HOST_WIDE_INT, otherwise a class with needed ctors and operators. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@197161 138bc75d-0d04-0410-961f-82ee72b054a4
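The switch from unsigned to omp_clause_mask, with OMP_CLAUSE_MASK_1 replacing a bare 1, can be illustrated outside the compiler. This is a sketch under assumptions: clause_mask, CLAUSE_MASK_1, clause_index and the index values are hypothetical stand-ins, chosen only to show why the constant being shifted needs the mask type once clause indices can reach 32 or more.

```cpp
#include <cstdint>

// Hypothetical analogue of omp_clause_mask on a host with a 64-bit integer.
using clause_mask = std::uint64_t;

// A typed 1, so that the shift below is performed in 64 bits.
constexpr clause_mask CLAUSE_MASK_1 = 1;

// Illustrative clause indices; the real enum pragma_omp_clause differs.
enum clause_index { CLAUSE_IF = 1, CLAUSE_SAFELEN = 33, CLAUSE_UNIFORM = 40 };

constexpr clause_mask clause_bit(clause_index c)
{
  return CLAUSE_MASK_1 << c;
}

// A mask listing the clauses some construct accepts.
constexpr clause_mask SIMD_MASK
  = clause_bit(CLAUSE_SAFELEN) | clause_bit(CLAUSE_UNIFORM);

constexpr bool allows(clause_mask mask, clause_index c)
{
  return (mask & clause_bit(c)) != 0;
}
```

Writing 1 << 40 with a plain int would shift past the width of the type, which is the problem a typed constant avoids.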

OMP_SIMD and OMP_FOR_SIMD loops. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@197515 138bc75d-0d04-0410-961f-82ee72b054a4

omp_get_proc_bind, omp_get_proc_bind_, omp_set_default_device, omp_set_default_device_, omp_set_default_device_8_, omp_get_default_device, omp_get_default_device_, omp_get_num_devices, omp_get_num_devices_, omp_get_num_teams, omp_get_num_teams_, omp_get_team_num, omp_get_team_num_): Export @@OMP_4.0. (GOMP_cancel, GOMP_cancellation_point, GOMP_parallel_loop_dynamic, GOMP_parallel_loop_guided, GOMP_parallel_loop_runtime, GOMP_parallel_loop_static, GOMP_parallel_sections, GOMP_parallel, GOMP_taskgroup_start, GOMP_taskgroup_end): Export @@GOMP_4.0. * parallel.c (GOMP_parallel_end): Add ialias. (GOMP_parallel, GOMP_cancel, GOMP_cancellation_point): New functions. * omp.h.in (omp_proc_bind_t): New typedef. (omp_get_cancellation, omp_get_proc_bind, omp_set_default_device, omp_get_default_device, omp_get_num_devices, omp_get_num_teams, omp_get_team_num): New prototypes. * env.c (omp_get_cancellation, omp_get_proc_bind, omp_set_default_device, omp_get_default_device, omp_get_num_devices, omp_get_num_teams, omp_get_team_num): New functions. * fortran.c (ULP, STR1, STR2, ialias_redirect): Removed. (omp_get_cancellation_, omp_get_proc_bind_, omp_set_default_device_, omp_set_default_device_8_, omp_get_default_device_, omp_get_num_devices_, omp_get_num_teams_, omp_get_team_num_): New functions. * libgomp.h (ialias_ulp, ialias_str1, ialias_str2, ialias_redirect, ialias_call): Define. * libgomp_g.h (GOMP_parallel_loop_static, GOMP_parallel_loop_dynamic, GOMP_parallel_loop_guided, GOMP_parallel_loop_runtime, GOMP_parallel, GOMP_cancel, GOMP_cancellation_point, GOMP_taskgroup_start, GOMP_taskgroup_end, GOMP_parallel_sections): New prototypes. * task.c (GOMP_taskgroup_start, GOMP_taskgroup_end): New functions. * sections.c (GOMP_parallel_sections): New function. * loop.c (GOMP_parallel_loop_static, GOMP_parallel_loop_dynamic, GOMP_parallel_loop_guided, GOMP_parallel_loop_runtime): New functions. (GOMP_parallel_end): Add ialias_redirect. 
* omp_lib.f90.in (omp_proc_bind_kind, omp_proc_bind_false, omp_proc_bind_true, omp_proc_bind_master, omp_proc_bind_close, omp_proc_bind_spread): New params. (omp_get_cancellation, omp_get_proc_bind, omp_set_default_device, omp_get_default_device, omp_get_num_devices, omp_get_num_teams, omp_get_team_num): New interfaces. * omp_lib.h.in (omp_proc_bind_kind, omp_proc_bind_false, omp_proc_bind_true, omp_proc_bind_master, omp_proc_bind_close, omp_proc_bind_spread): New params. (omp_get_cancellation, omp_get_proc_bind, omp_set_default_device, omp_get_default_device, omp_get_num_devices, omp_get_num_teams, omp_get_team_num): New externals. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@197670 138bc75d-0d04-0410-961f-82ee72b054a4

(BT_FN_VOID_OMPFN_PTR_UINT, BT_FN_VOID_OMPFN_PTR_UINT_LONG_LONG_LONG, BT_FN_VOID_OMPFN_PTR_UINT_LONG_LONG_LONG_LONG): Remove. (BT_FN_VOID_OMPFN_PTR_UINT_UINT_UINT, BT_FN_VOID_OMPFN_PTR_UINT_LONG_LONG_LONG_UINT, BT_FN_VOID_OMPFN_PTR_UINT_LONG_LONG_LONG_LONG_UINT): New. * gimplify.c (gimplify_scan_omp_clauses, gimplify_adjust_omp_clauses): Handle OMP_CLAUSE_PROC_BIND. * omp-builtins.def (BUILT_IN_GOMP_TASKGROUP_START, BUILT_IN_GOMP_TASKGROUP_END, BUILT_IN_GOMP_PARALLEL_LOOP_STATIC, BUILT_IN_GOMP_PARALLEL_LOOP_DYNAMIC, BUILT_IN_GOMP_PARALLEL_LOOP_GUIDED, BUILT_IN_GOMP_PARALLEL_LOOP_RUNTIME, BUILT_IN_GOMP_PARALLEL, BUILT_IN_GOMP_PARALLEL_SECTIONS): New built-ins. (BUILT_IN_GOMP_PARALLEL_LOOP_STATIC_START, BUILT_IN_GOMP_PARALLEL_LOOP_DYNAMIC_START, BUILT_IN_GOMP_PARALLEL_LOOP_GUIDED_START, BUILT_IN_GOMP_PARALLEL_LOOP_RUNTIME_START, BUILT_IN_GOMP_PARALLEL_START, BUILT_IN_GOMP_PARALLEL_END, BUILT_IN_GOMP_PARALLEL_SECTIONS_START): Remove. * omp-low.c (scan_sharing_clauses): Handle OMP_CLAUSE_PROC_BIND. (expand_parallel_call): Expand #pragma omp parallel* as calls to the new GOMP_parallel_* APIs without _start at the end, instead of GOMP_parallel_*_start followed by fn.omp_fn.N call, followed by GOMP_parallel_end. Handle OMP_CLAUSE_PROC_BIND. * tree-ssa-alias.c (ref_maybe_used_by_call_p_1, call_may_clobber_ref_p_1): Handle BUILT_IN_GOMP_TASKGROUP_END instead of BUILT_IN_GOMP_PARALLEL_END. c-family/ * c-common.c (DEF_FUNCTION_TYPE_8): Define. * c-omp.c (c_split_parallel_clauses): Handle OMP_CLAUSE_PROC_BIND. cp/ * cp-tree.h (finish_omp_taskgroup): New prototype. * parser.c (cp_parser_omp_clause_proc_bind): Require ) instead of colon at the end of the clause. (cp_parser_omp_taskgroup): New function. (cp_parser_omp_construct, cp_parser_pragma): Handle PRAGMA_OMP_TASKGROUP. * semantics.c (finish_omp_taskgroup): New function. fortran/ * f95-lang.c (DEF_FUNCTION_TYPE_8): Define. * types.def (DEF_FUNCTION_TYPE_8): Document. 
(BT_FN_VOID_OMPFN_PTR_UINT, BT_FN_VOID_OMPFN_PTR_UINT_LONG_LONG_LONG, BT_FN_VOID_OMPFN_PTR_UINT_LONG_LONG_LONG_LONG): Remove. (BT_FN_VOID_OMPFN_PTR_UINT_UINT_UINT, BT_FN_VOID_OMPFN_PTR_UINT_LONG_LONG_LONG_UINT, BT_FN_VOID_OMPFN_PTR_UINT_LONG_LONG_LONG_LONG_UINT): New. ada/ * gcc-interface/utils.c (DEF_FUNCTION_TYPE_8): Define. lto/ * lto-lang.c (DEF_FUNCTION_TYPE_8): Define. testsuite/ * gcc.dg/gomp/combined-1.c: Look for GOMP_parallel_loop_runtime instead of GOMP_parallel_loop_runtime_start. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@197676 138bc75d-0d04-0410-961f-82ee72b054a4
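The change in expand_parallel_call is easiest to see as two call sequences. The sketch below uses single-threaded mocks; mock_parallel_start, mock_parallel_end and mock_parallel are hypothetical stand-ins, not the libgomp entry points, and their parameter lists are only assumed to resemble the real ones. It records the order of calls the compiler would emit before and after this commit.

```cpp
#include <string>

using outlined_fn = void (*)(void *);

static std::string call_log;

// Mock of the old pair of entry points (start ... end).
static void mock_parallel_start(outlined_fn, void *, unsigned)
{
  call_log += "start;";
}

static void mock_parallel_end()
{
  call_log += "end;";
}

// Mock of the new single entry point: it runs the outlined body itself.
static void mock_parallel(outlined_fn fn, void *data, unsigned, unsigned)
{
  call_log += "parallel(";
  fn(data);
  call_log += ");";
}

// Stand-in for the outlined fn.omp_fn.N region body.
static void body(void *data)
{
  ++*static_cast<int *>(data);
  call_log += "body;";
}

// Old shape: start call, direct call of the body, end call.
std::string old_lowering(int &counter)
{
  call_log.clear();
  mock_parallel_start(body, &counter, 0);
  body(&counter);
  mock_parallel_end();
  return call_log;
}

// New shape: one call that takes the body as an argument.
std::string new_lowering(int &counter)
{
  call_log.clear();
  mock_parallel(body, &counter, 0, 0);
  return call_log;
}
```

The old shape logs start;body;end; while the new shape logs parallel(body;); so the encountering function contains one runtime call instead of three calls.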

OMP_CLAUSE_LINEAR_NO_COPYOUT): Define. * omp-low.c (extract_omp_for_data): Handle #pragma omp simd. (build_outer_var_ref): For #pragma omp simd allow linear etc. clauses to bind even to private vars. (scan_sharing_clauses): Handle OMP_CLAUSE_LINEAR, OMP_CLAUSE_ALIGNED and OMP_CLAUSE_SAFELEN. (lower_rec_input_clauses): Handle OMP_CLAUSE_LINEAR. Don't emit a GOMP_barrier call for firstprivate/lastprivate in #pragma omp simd. (lower_lastprivate_clauses): Handle also OMP_CLAUSE_LINEAR. (expand_omp_simd): New function. (expand_omp_for): Handle #pragma omp simd. * gimplify.c (enum gimplify_omp_var_data): Add GOVD_LINEAR and GOVD_ALIGNED, add GOVD_LINEAR into GOVD_DATA_SHARE_CLASS. (enum omp_region_type): Add ORT_SIMD. (gimple_add_tmp_var, gimplify_var_or_parm_decl, omp_check_private, omp_firstprivatize_variable, omp_notice_variable): Handle ORT_SIMD like ORT_WORKSHARE. (omp_is_private): Likewise. Add SIMD argument, tweak diagnostics and add extra errors in simd constructs. (gimplify_scan_omp_clauses, gimplify_adjust_omp_clauses): Handle OMP_CLAUSE_LINEAR, OMP_CLAUSE_ALIGNED and OMP_CLAUSE_SAFELEN. (gimplify_adjust_omp_clauses_1): Handle GOVD_LASTPRIVATE and GOVD_ALIGNED. (gimplify_omp_for): Handle #pragma omp simd. cp/ * cp-tree.h (CP_OMP_CLAUSE_INFO): Also allow it on OMP_CLAUSE_LINEAR. * parser.c (cp_parser_omp_var_list_no_open): If colon is non-NULL, temporarily disable colon_corrects_to_scope_p during the parsing of the variable list. (cp_parser_omp_clause_safelen, cp_parser_omp_clause_simdlen): New functions. (cp_parser_omp_all_clauses): Handle OMP_CLAUSE_SAFELEN and OMP_CLAUSE_SIMDLEN. * semantics.c (finish_omp_clauses): Allow NULL_TREE in OMP_CLAUSE_ALIGNED_ALIGNMENT. testsuite/ * c-c++-common/gomp/simd1.c: New test. * c-c++-common/gomp/simd2.c: New test. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@198092 138bc75d-0d04-0410-961f-82ee72b054a4
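For orientation, user code exercising the clauses handled here might look as follows. This is a minimal sketch, not a copy of simd1.c or simd2.c; the function names are illustrative. Without -fopenmp (or -fopenmp-simd on later compilers) the pragmas are ignored and the loops run as ordinary scalar loops.

```cpp
// safelen(8): no two iterations executed concurrently are more than
// 8 iterations apart. This loop has no cross-iteration dependence at all.
void scale_add(float *y, const float *x, float a, int n)
{
  #pragma omp simd safelen(8)
  for (int i = 0; i < n; ++i)
    y[i] += a * x[i];
}

// linear(j:2): j advances by exactly 2 per iteration, so the compiler
// may compute it from the loop counter instead of carrying it.
void gather_even(int *out, const int *in, int n)
{
  int j = 0;
  #pragma omp simd linear(j:2)
  for (int i = 0; i < n; ++i)
    {
      out[i] = in[j];
      j += 2;
    }
}
```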

OMP_SIMD infrastructure.

* gimplify.c (gimplify_adjust_omp_clauses): For linear clauses if outer_context is non-NULL, but not ORT_COMBINED_PARALLEL, call omp_notice_variable. Remove aligned clauses that can't be handled yet. * omp-low.c: Include target.h. (scan_sharing_clauses): For aligned clauses with global arrays register local replacement. (omp_clause_aligned_alignment): New function. (lower_rec_input_clauses): For aligned clauses for global arrays or automatic pointers emit __builtin_assume_aligned before the loop if possible. (expand_omp_regimplify_p, expand_omp_build_assign): New functions. (expand_omp_simd): Use them. Handle pointer iterators and broken loops. (lower_omp_for): Call lower_omp on gimple_omp_body_ptr after calling lower_rec_input_clauses, not before it. cp/ * semantics.c (finish_omp_clauses): On OMP_CLAUSE_LINEAR clauses verify OMP_CLAUSE_DECL has integral or pointer type, and handle linear steps for pointer type decls. Fix up handling of OMP_CLAUSE_UNIFORM. testsuite/ * c-c++-common/gomp/simd3.c: New test. * c-c++-common/gomp/simd4.c: New test. * c-c++-common/gomp/simd5.c: New test. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@198193 138bc75d-0d04-0410-961f-82ee72b054a4
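The relationship between the aligned clause and __builtin_assume_aligned can be sketched by writing both forms by hand. This is an assumption-laden illustration, not the GIMPLE that lower_rec_input_clauses produces: sum_with_clause and sum_with_builtin are hypothetical names, the 32-byte alignment is arbitrary, and __builtin_assume_aligned is a GCC/Clang builtin. Callers must really pass 32-byte-aligned pointers to either function.

```cpp
// What the user writes: the clause promises that p is 32-byte aligned.
float sum_with_clause(const float *p, int n)
{
  float s = 0.0f;
  #pragma omp simd aligned(p:32) reduction(+:s)
  for (int i = 0; i < n; ++i)
    s += p[i];
  return s;
}

// Roughly the same promise stated before the loop, by hand.
float sum_with_builtin(const float *p, int n)
{
  const float *q
    = static_cast<const float *>(__builtin_assume_aligned(p, 32));
  float s = 0.0f;
  for (int i = 0; i < n; ++i)
    s += q[i];
  return s;
}
```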

Conflicts: gcc/omp-low.c

* c-parser.c (c_parser_compound_statement, c_parser_statement): Adjust comments for OpenMP 3.0+ additions. (c_parser_pragma): Handle PRAGMA_OMP_CANCEL and PRAGMA_OMP_CANCELLATION_POINT. (c_parser_omp_clause_name): Handle new OpenMP 4.0 clauses. (c_parser_omp_clause_collapse): Fully fold collapse expression. (c_parser_omp_clause_branch, c_parser_omp_clause_cancelkind, c_parser_omp_clause_num_teams, c_parser_omp_clause_aligned, c_parser_omp_clause_linear, c_parser_omp_clause_safelen, c_parser_omp_clause_simdlen, c_parser_omp_clause_depend, c_parser_omp_clause_map, c_parser_omp_clause_device, c_parser_omp_clause_dist_schedule, c_parser_omp_clause_proc_bind, c_parser_omp_clause_to, c_parser_omp_clause_from, c_parser_omp_clause_uniform): New functions. (c_parser_omp_all_clauses): Handle new OpenMP 4.0 clauses. (c_parser_omp_for_loop): Add CODE argument, pass it through to c_finish_omp_for. (OMP_SIMD_CLAUSE_MASK): Define. (c_parser_omp_simd): New function. (c_parser_omp_for): Parse #pragma omp for simd. (OMP_PARALLEL_CLAUSE_MASK): Add OMP_CLAUSE_PROC_BIND. (c_parser_omp_parallel): Parse #pragma omp parallel for simd. (OMP_TASK_CLAUSE_MASK): Add OMP_CLAUSE_DEPEND. (c_parser_omp_taskgroup): New function. (OMP_CANCEL_CLAUSE_MASK, OMP_CANCELLATION_POINT_CLAUSE_MASK): Define. (c_parser_omp_cancel, c_parser_omp_cancellation_point): New functions. (c_parser_omp_construct): Handle PRAGMA_OMP_SIMD and PRAGMA_OMP_TASKGROUP. (c_parser_transaction_cancel): Formatting fix. * c-tree.h (c_begin_omp_taskgroup, c_finish_omp_taskgroup, c_finish_omp_cancel, c_finish_omp_cancellation_point): New prototypes. * c-typeck.c (c_begin_omp_taskgroup, c_finish_omp_taskgroup, c_finish_omp_cancel, c_finish_omp_cancellation_point): New functions. (c_finish_omp_clauses): Handle new OpenMP 4.0 clauses. cp/ * parser.c (cp_parser_omp_clause_name): Add missing break after case 'i'. (cp_parser_omp_cancellation_point): Diagnose error if #pragma omp cancellation isn't followed by point. 
* semantics.c (finish_omp_clauses): Complain also about zero in alignment of aligned directive or safelen/simdlen expressions. (finish_omp_cancel): Fix up diagnostics wording. testsuite/ * c-c++-common/gomp/simd1.c: Enable also for C. * c-c++-common/gomp/simd2.c: Likewise. * c-c++-common/gomp/simd3.c: Likewise. * c-c++-common/gomp/simd4.c: Likewise. Adjust expected diagnostics for C. * c-c++-common/gomp/simd5.c: Enable also for C. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@198264 138bc75d-0d04-0410-961f-82ee72b054a4

present.

OpenMP constructs nested inside simd region. Don't treat #pragma omp simd as work-sharing region. Disallow work-sharing constructs inside of critical region. Complain if ordered region is nested inside of parallel region without loop region in between. (scan_omp_1_stmt): Call check_omp_nesting_restrictions even for GOMP_{cancel{,lation_point},taskyield,taskwait} calls. * gfortran.dg/gomp/appendix-a/a.35.5.f90: Add dg-error. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@198459 138bc75d-0d04-0410-961f-82ee72b054a4

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@198460 138bc75d-0d04-0410-961f-82ee72b054a4

dump_gimple_omp_atomic_store): Handle gimple_omp_atomic_seq_cst_p. * gimple.h (enum gf_mask): Add GF_OMP_ATOMIC_SEQ_CST. (gimple_omp_atomic_set_seq_cst, gimple_omp_atomic_seq_cst_p): New inline functions. * omp-low.c (expand_omp_atomic_load, expand_omp_atomic_store, expand_omp_atomic_fetch_op): If gimple_omp_atomic_seq_cst_p, pass MEMMODEL_SEQ_CST instead of MEMMODEL_RELAXED to the builtin. * gimplify.c (gimplify_omp_atomic): Handle OMP_ATOMIC_SEQ_CST. * tree-pretty-print.c (dump_generic_node): Handle OMP_ATOMIC_SEQ_CST. * tree.def (OMP_ATOMIC): Add comment that OMP_ATOMIC* must stay consecutive. * tree.h (OMP_ATOMIC_SEQ_CST): Define. c/ * c-parser.c (c_parser_omp_atomic): Parse seq_cst clause, pass true if it is present to c_finish_omp_atomic. cp/ * pt.c (tsubst_expr): Pass OMP_ATOMIC_SEQ_CST to finish_omp_atomic. * semantics.c (finish_omp_atomic): Add seq_cst argument, pass it through to c_finish_omp_atomic or store into OMP_ATOMIC_SEQ_CST. * cp-tree.h (finish_omp_atomic): Adjust prototype. * parser.c (cp_parser_omp_atomic): Parse seq_cst clause, pass true if it is present to finish_omp_atomic. c-family/ * c-omp.c (c_finish_omp_atomic): Add seq_cst argument, store it into OMP_ATOMIC_SEQ_CST bit. * c-common.h (c_finish_omp_atomic): Adjust prototype. testsuite/ * testsuite/libgomp.c/atomic-17.c: New test. * testsuite/libgomp.c++/atomic-14.C: New test. * testsuite/libgomp.c++/atomic-15.C: New test. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@198461 138bc75d-0d04-0410-961f-82ee72b054a4
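In source terms, the seq_cst clause parsed here is written after the atomic kind. The sketch below is illustrative only (publish and observe are hypothetical names, and it is not one of the new atomic-14.C, atomic-15.C or atomic-17.c tests); per the omp-low.c entry above, the expansion of such constructs passes MEMMODEL_SEQ_CST rather than MEMMODEL_RELAXED to the builtin.

```cpp
// Sequentially consistent atomic store.
void publish(int &flag)
{
  #pragma omp atomic write seq_cst
  flag = 1;
}

// Sequentially consistent atomic load.
int observe(int &flag)
{
  int v;
  #pragma omp atomic read seq_cst
  v = flag;
  return v;
}
```

Omitting seq_cst from either pragma gives the default atomic construct, which the same entry associates with the relaxed memory model.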

Remove deprecated vectorlength clause features. Remove deprecated assert and noassert clauses. Implement vectorlength clause in OpenMP safelen terms.

(attribute_value_equal): Call it for -fopenmp if TREE_VALUE of the attributes are both OMP_CLAUSEs. * tree.h (omp_declare_simd_clauses_equal): Declare. c-family/ * c-common.c (c_common_attribute_table): Add "omp declare simd" attribute. (handle_omp_declare_simd_attribute): New function. * c-common.h (c_omp_declare_simd_clauses_to_numbers, c_omp_declare_simd_clauses_to_decls): Declare. * c-omp.c (c_omp_declare_simd_clause_cmp, c_omp_declare_simd_clauses_to_numbers, c_omp_declare_simd_clauses_to_decls): New functions. cp/ * cp-tree.h (cp_decl_specifier_seq): Add omp_declare_simd_clauses field. (finish_omp_declare_simd): Declare. * decl2.c (is_late_template_attribute): Return true for "omp declare simd" attribute. (cp_check_const_attributes): Don't check TREE_VALUE of arg if arg isn't a TREE_LIST. * decl.c (grokfndecl): Add omp_declare_simd_clauses argument, call finish_omp_declare_simd if non-NULL. (grokdeclarator): Pass it declspecs->omp_declare_simd_clauses to grokfndecl. * pt.c (apply_late_template_attributes): Handle "omp declare simd" attribute specially. (tsubst_omp_clauses): Add declare_simd argument, don't call finish_omp_clauses if it is set. Handle OpenMP 4.0 clauses. (tsubst_expr): Adjust tsubst_omp_clauses callers. * semantics.c (finish_omp_clauses): Diagnose inbranch notinbranch. (finish_omp_declare_simd): New function. * parser.h (struct cp_parser): Add omp_declare_simd_clauses field. * parser.c (cp_ensure_no_omp_declare_simd, cp_finish_omp_declare_simd): New functions. (enum pragma_context): Add pragma_member and pragma_objc_icode. (cp_parser_linkage_specification, cp_parser_namespace_definition, cp_parser_class_specifier_1): Call cp_ensure_no_omp_declare_simd. (cp_parser_init_declarator, cp_parser_member_declaration, cp_parser_function_definition_from_specifiers_and_declarator, cp_parser_save_member_function_body): Copy parser->omp_declare_simd_clauses to decl_specifiers->omp_declare_simd_clauses, call cp_finish_omp_declare_simd. 
(cp_parser_member_specification_opt): Pass pragma_member instead of pragma_external to cp_parser_pragma. (cp_parser_objc_interstitial_code): Pass pragma_objc_icode instead of pragma_external to cp_parser_pragma. (cp_parser_omp_var_list_no_open): If parser->omp_declare_simd_clauses, just cp_parser_identifier the argument names. (cp_parser_omp_all_clauses): Don't call finish_omp_clauses for parser->omp_declare_simd_clauses. (OMP_DECLARE_SIMD_CLAUSE_MASK): Define. (cp_parser_omp_declare_simd, cp_parser_omp_declare): New functions. (cp_parser_pragma): Call cp_ensure_no_omp_declare_simd. Handle PRAGMA_OMP_DECLARE_REDUCTION. Replace == pragma_external with != pragma_stmt and != pragma_compound. testsuite/ * g++.dg/gomp/declare-simd-1.C: New test. * g++.dg/gomp/declare-simd-2.C: New test. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@198739 138bc75d-0d04-0410-961f-82ee72b054a4
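A declaration carrying the new pragma might look like the sketch below. It assumes the OpenMP 4.0 draft syntax and uses illustrative names (scaled, scale_all), not the contents of declare-simd-1.C. Note that inbranch and notinbranch are mutually exclusive, which is what the finish_omp_clauses diagnostic above enforces.

```cpp
// Ask for a SIMD variant of scaled(): scale is the same for every lane,
// and the function is never called from a conditional inside a SIMD loop.
#pragma omp declare simd uniform(scale) notinbranch
float scaled(float v, float scale)
{
  return v * scale;
}

// A SIMD loop that can use the vector variant of scaled().
void scale_all(float *out, const float *in, float scale, int n)
{
  #pragma omp simd
  for (int i = 0; i < n; ++i)
    out[i] = scaled(in[i], scale);
}
```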

* c-typeck.c (c_finish_omp_clauses): Handle OMP_CLAUSE_LINEAR_STEP adjustments for pointer-types here. Diagnose inbranch notinbranch being used together. (c_finish_omp_declare_simd): New function. * c-parser.c (enum pragma_context): Add pragma_struct and pragma_param. (c_parser_declaration_or_fndef): Add omp_declare_simd_clauses argument. Call c_finish_omp_declare_simd if needed. (c_parser_external_declaration, c_parser_compound_statement_nostart, c_parser_label, c_parser_for_statement, c_parser_objc_methodprotolist, c_parser_omp_for_loop): Adjust c_parser_declaration_or_fndef callers. (c_parser_struct_or_union_specifier): Use pragma_struct instead of pragma_external. (c_parser_parameter_declaration): Use pragma_param instead of pragma_external. (c_parser_pragma): Handle PRAGMA_OMP_DECLARE_REDUCTION. Replace == pragma_external with != pragma_stmt && != pragma_compound test. (c_parser_omp_variable_list): Add declare_simd argument. Don't lookup vars if it is true, just store identifiers. (c_parser_omp_var_list_parens, c_parser_omp_clause_depend, c_parser_omp_clause_map): Adjust callers. (c_parser_omp_clause_reduction, c_parser_omp_clause_aligned): Add declare_simd argument, pass it through to c_parser_omp_variable_list. (c_parser_omp_clause_linear): Likewise. Don't handle OMP_CLAUSE_LINEAR_STEP adjustments for pointer-types here. (c_parser_omp_clause_uniform): Call c_parser_omp_variable_list instead of c_parser_omp_var_list_parens to pass true as declare_simd. (c_parser_omp_all_clauses): Add declare_simd argument, pass it through clause parsing routines as needed. Don't call c_finish_omp_clauses if set. (c_parser_omp_simd, c_parser_omp_for, c_parser_omp_sections, c_parser_omp_parallel, c_parser_omp_single, c_parser_omp_task, c_parser_omp_cancel, c_parser_omp_cancellation_point): Adjust callers. (OMP_DECLARE_SIMD_CLAUSE_MASK): Define. (c_parser_omp_declare_simd, c_parser_omp_declare): New functions. * gcc.dg/gomp/declare-simd-1.c: New test. 
* gcc.dg/gomp/declare-simd-2.c: New test. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@198828 138bc75d-0d04-0410-961f-82ee72b054a4

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@198835 138bc75d-0d04-0410-961f-82ee72b054a4

I noticed that for member class templates of a class template we were unnecessarily substituting both the template and its type. Avoiding that duplication speeds compilation of this silly testcase from ~12s to ~9s on my laptop. It's unlikely to make a difference on any real code, but the simplification is also nice. We still need to clear CLASSTYPE_USE_TEMPLATE on the partial instantiation of the template class, but it makes more sense to do that in tsubst_template_decl anyway. #define NC(X) \ template <class U> struct X##1; \ template <class U> struct X##2; \ template <class U> struct X##3; \ template <class U> struct X##4; \ template <class U> struct X##5; \ template <class U> struct X##6; #define NC2(X) NC(X##a) NC(X##b) NC(X##c) NC(X##d) NC(X##e) NC(X##f) #define NC3(X) NC2(X##A) NC2(X##B) NC2(X##C) NC2(X##D) NC2(X##E) template <int I> struct A { NC3(am) }; template <class...Ts> void sink(Ts...); template <int...Is> void g() { sink(A<Is>()...); } template <int I> void f() { g<__integer_pack(I)...>(); } int main() { f<1000>(); } gcc/cp/ChangeLog: * pt.cc (instantiate_class_template): Skip the RECORD_TYPE of a class template. (tsubst_template_decl): Clear CLASSTYPE_USE_TEMPLATE.

This patch is my proposed solution to PR rtl-optimization/91865. Normally RTX simplification canonicalizes a ZERO_EXTEND of a ZERO_EXTEND to a single ZERO_EXTEND, but as shown in this PR it is possible for combine's make_compound_operation to unintentionally generate a non-canonical ZERO_EXTEND of a ZERO_EXTEND, which is unlikely to be matched by the backend. For the new test case: const int table[2] = {1, 2}; int foo (char i) { return table[i]; } compiling with -O2 -mlarge on msp430 we currently see: Trying 2 -> 7: 2: r25:HI=zero_extend(R12:QI) REG_DEAD R12:QI 7: r28:PSI=sign_extend(r25:HI)#0 REG_DEAD r25:HI Failed to match this instruction: (set (reg:PSI 28 [ iD.1772 ]) (zero_extend:PSI (zero_extend:HI (reg:QI 12 R12 [ iD.1772 ])))) which results in the following code: foo: AND #0xff, R12 RLAM.A #4, R12 { RRAM.A #4, R12 RLAM.A #1, R12 MOVX.W table(R12), R12 RETA With this patch, we now see: Trying 2 -> 7: 2: r25:HI=zero_extend(R12:QI) REG_DEAD R12:QI 7: r28:PSI=sign_extend(r25:HI)#0 REG_DEAD r25:HI Successfully matched this instruction: (set (reg:PSI 28 [ iD.1772 ]) (zero_extend:PSI (reg:QI 12 R12 [ iD.1772 ]))) allowing combination of insns 2 and 7 original costs 4 + 8 = 12 replacement cost 8 foo: MOV.B R12, R12 RLAM.A #1, R12 MOVX.W table(R12), R12 RETA 2023-10-26 Roger Sayle <roger@nextmovesoftware.com> Richard Biener <rguenther@suse.de> gcc/ChangeLog PR rtl-optimization/91865 * combine.cc (make_compound_operation): Avoid creating a ZERO_EXTEND of a ZERO_EXTEND. gcc/testsuite/ChangeLog PR rtl-optimization/91865 * gcc.target/msp430/pr91865.c: New test case.
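The canonicalization this patch relies on, that a zero extension of a zero extension equals a single zero extension, can be checked on ordinary integer types. This is a sketch by analogy only: std::uint8_t, std::uint16_t and std::uint32_t stand in for QImode, HImode and the wider mode (PSImode is 20 bits on msp430 and has no direct C++ counterpart), and the function names are illustrative.

```cpp
#include <cstdint>

// Two-step widening: 8 -> 16 bits, then 16 -> 32 bits, both zero extensions.
std::uint32_t extend_twice(std::uint8_t v)
{
  std::uint16_t mid = v;
  return mid;
}

// One-step widening: 8 -> 32 bits.
std::uint32_t extend_once(std::uint8_t v)
{
  return v;
}

// Exhaustively compare the two over every 8-bit input.
bool extensions_agree()
{
  for (unsigned v = 0; v <= 0xff; ++v)
    {
      std::uint8_t b = static_cast<std::uint8_t>(v);
      if (extend_twice(b) != extend_once(b))
        return false;
    }
  return true;
}
```

Because the two forms always agree, only the single-extension form needs a matching pattern in the backend, which is why combine should not produce the nested one.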

Here we have

    template<class T>
    auto is_throwable(T t) -> decltype(throw t, true) { ... }

where we didn't properly mark 't' as IMPLICIT_RVALUE_P, which caused the wrong overload to have been chosen. Jason figured out it's because we don't correctly implement [expr.prim.id.unqual]#4.2, which post-P2266 says that an id-expression is move-eligible if

"the id-expression (possibly parenthesized) is the operand of a throw-expression, and names an implicitly movable entity that belongs to a scope that does not contain the compound-statement of the innermost lambda-expression, try-block, or function-try-block (if any) whose compound-statement or ctor-initializer contains the throw-expression."

I worked out that it's trying to say that given

    struct X {
      X();
      X(const X&);
      X(X&&) = delete;
    };

the following should fail: the scope of the throw is an sk_try, and it's also x's scope S, and S "does not contain the compound-statement of the *try-block" so x is move-eligible, so we move, so we fail.

    void f ()
    try {
      X x;
      throw x; // use of deleted function
    } catch (...) {
    }

Whereas here:

    void g (X x)
    try {
      throw x;
    } catch (...) {
    }

the throw is again in an sk_try, but x's scope is an sk_function_parms which *does* contain the {} of the *try-block, so x is not move-eligible, so we don't move, so we use X(const X&), and the code is fine.

The current code also doesn't seem to handle

    void h (X x) {
      void z (decltype(throw x, true));
    }

where there's no enclosing lambda or sk_try so we should move.

I'm not doing anything about lambdas because we shouldn't reach the code at the end of the function: the DECL_HAS_VALUE_EXPR_P check shouldn't let us go further.

        PR c++/113789
        PR c++/113853

gcc/cp/ChangeLog:
        * typeck.cc (treat_lvalue_as_rvalue_p): Update code to better reflect [expr.prim.id.unqual]#4.2.

gcc/testsuite/ChangeLog:
        * g++.dg/cpp0x/sfinae69.C: Remove dg-bogus.
        * g++.dg/cpp0x/sfinae70.C: New test.
        * g++.dg/cpp0x/sfinae71.C: New test.
        * g++.dg/cpp0x/sfinae72.C: New test.
        * g++.dg/cpp2a/implicit-move4.C: New test.

I noticed that for member class templates of a class template we were unnecessarily substituting both the template and its type. Avoiding that duplication speeds compilation of this silly testcase from ~12s to ~9s on my laptop. It's unlikely to make a difference on any real code, but the simplification is also nice.

We still need to clear CLASSTYPE_USE_TEMPLATE on the partial instantiation of the template class, but it makes more sense to do that in tsubst_template_decl anyway.

    #define NC(X)                       \
      template <class U> struct X##1;   \
      template <class U> struct X##2;   \
      template <class U> struct X##3;   \
      template <class U> struct X##4;   \
      template <class U> struct X##5;   \
      template <class U> struct X##6;
    #define NC2(X) NC(X##a) NC(X##b) NC(X##c) NC(X##d) NC(X##e) NC(X##f)
    #define NC3(X) NC2(X##A) NC2(X##B) NC2(X##C) NC2(X##D) NC2(X##E)
    template <int I> struct A
    {
      NC3(am)
    };
    template <class...Ts> void sink(Ts...);
    template <int...Is> void g()
    {
      sink(A<Is>()...);
    }
    template <int I> void f()
    {
      g<__integer_pack(I)...>();
    }
    int main()
    {
      f<1000>();
    }

gcc/cp/ChangeLog:
        * pt.cc (instantiate_class_template): Skip the RECORD_TYPE of a class template.
        (tsubst_template_decl): Clear CLASSTYPE_USE_TEMPLATE.

fixing tests and removing C++20 requirement

Here during overload resolution we have two strictly viable ambiguous candidates #1 and #2, and two non-strictly viable candidates #3 and #4 which we hold on to ever since r14-6522. These latter candidates have an empty second arg conversion since the first arg conversion was deemed bad, and this trips up joust when called on #3 and #4 which assumes all arg conversions are there. We can fix this by making joust robust to empty arg conversions, but in this situation we shouldn't need to compare #3 and #4 at all given that we have a strictly viable candidate. To that end, this patch makes tourney shortcut considering non-strictly viable candidates upon encountering ambiguity between two strictly viable candidates (taking advantage of the fact that the candidates list is sorted according to viability via splice_viable). PR c++/115239 gcc/cp/ChangeLog: * call.cc (tourney): Don't consider a non-strictly viable candidate as the champ if there was ambiguity between two strictly viable candidates. gcc/testsuite/ChangeLog: * g++.dg/overload/error7.C: New test. Reviewed-by: Jason Merrill <jason@redhat.com> (cherry picked from commit 7fed7e9)

These tests used to generate:

    bl      swap
    ldr     r2, [sp, #4]
    mov     r0, r2  @ __fp16

but g:9d20529d94b23275885f380d155fe8671ab5353a means that we can load directly into r0:

    bl      swap
    ldrh    r0, [sp, #4]    @ __fp16

This patch updates the tests to "defend" this change.

While there, the scans include:

    mov\tr1, r[03]}

But if the spill of r2 occurs first, there's no real reason why r2 couldn't be used as the temporary, instead of r3. The patch tries to update the scans while preserving the spirit of the originals.

gcc/testsuite/
        * gcc.target/arm/fp16-aapcs-2.c: Expect the return value to be loaded directly from the stack. Test that the swap generates two moves out of r0/r1 and two moves in.
        * gcc.target/arm/fp16-aapcs-4.c: Likewise.

…o_debug_section [PR116614]

cat abc.C

    #define A(n) struct T##n {} t##n;
    #define B(n) A(n##0) A(n##1) A(n##2) A(n##3) A(n##4) A(n##5) A(n##6) A(n##7) A(n##8) A(n##9)
    #define C(n) B(n##0) B(n##1) B(n##2) B(n##3) B(n##4) B(n##5) B(n##6) B(n##7) B(n##8) B(n##9)
    #define D(n) C(n##0) C(n##1) C(n##2) C(n##3) C(n##4) C(n##5) C(n##6) C(n##7) C(n##8) C(n##9)
    #define E(n) D(n##0) D(n##1) D(n##2) D(n##3) D(n##4) D(n##5) D(n##6) D(n##7) D(n##8) D(n##9)
    E(1) E(2) E(3)
    int main () { return 0; }

    ./xg++ -B ./ -o abc{.o,.C} -flto -flto-partition=1to1 -O2 -g -fdebug-types-section -c
    ./xgcc -B ./ -o abc{,.o} -flto -flto-partition=1to1 -O2

(not included in testsuite as it takes a while to compile) FAILs with

    lto-wrapper: fatal error: Too many copied sections: Operation not supported
    compilation terminated.
    /usr/bin/ld: error: lto-wrapper failed
    collect2: error: ld returned 1 exit status

The following patch fixes that. Most of the 64K+ section support for reading and writing was already there years ago (and especially reading was already used quite often), and a further bug in it was fixed in the PR104617 fix.

Yet, the fix isn't solely about removing the

      if (new_i - 1 >= SHN_LORESERVE)
        {
          *err = ENOTSUP;
          return "Too many copied sections";
        }

5 lines; the missing part was that the function only handled reading of the .symtab_shndx section but not copying/updating of it. If the result has fewer than 64K-epsilon sections, that actually wasn't needed, but e.g. with -fdebug-types-section one can exceed that pretty easily (reported to us on a WebKitGtk build on ppc64le).

Updating the section is slightly more complicated, because it basically needs to be done in lock step with updating the .symtab section: if one doesn't need to use SHN_XINDEX in there, the section should contain (or should be updated to contain) an SHN_UNDEF entry, otherwise it needs to have whatever would otherwise be stored but couldn't fit. But repeating, because of that, all the symtab decisions about what to discard and how to rewrite it would be ugly.

So, the patch instead emits the .symtab_shndx section (or sections) last and prepares the content during the .symtab processing, and in a second pass, when going just through the .symtab_shndx sections, just uses the saved content.

2024-09-07  Jakub Jelinek  <jakub@redhat.com>

        PR lto/116614
        * simple-object-elf.c (SHN_COMMON): Align comment with neighbouring comments.
        (SHN_HIRESERVE): Use uppercase hex digits instead of lowercase for consistency.
        (simple_object_elf_find_sections): Formatting fixes.
        (simple_object_elf_fetch_attributes): Likewise.
        (simple_object_elf_attributes_merge): Likewise.
        (simple_object_elf_start_write): Likewise.
        (simple_object_elf_write_ehdr): Likewise.
        (simple_object_elf_write_shdr): Likewise.
        (simple_object_elf_write_to_file): Likewise.
        (simple_object_elf_copy_lto_debug_section): Likewise. Don't fail for new_i - 1 >= SHN_LORESERVE, instead arrange in that case to copy over .symtab_shndx sections, though emit those last and compute their section content when processing associated .symtab sections. Handle simple_object_internal_read failure even in the .symtab_shndx reading case.

(cherry picked from commit bb8dd09)
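The lock-step relationship between a symbol's `st_shndx` field and its .symtab_shndx entry can be sketched as follows. This is an assumption-laden illustration and not the libiberty code: the struct and function names are hypothetical, only real (non-special) section indices are considered, and the constants are the usual ELF gABI values (SHN_UNDEF 0, SHN_LORESERVE 0xFF00, SHN_XINDEX 0xFFFF).

```cpp
#include <cstdint>

// ELF gABI values; the k-prefixed names are local to this sketch.
constexpr std::uint16_t kShnUndef = 0;
constexpr std::uint16_t kShnLoReserve = 0xFF00;
constexpr std::uint16_t kShnXindex = 0xFFFF;

// Hypothetical pair: the 16-bit field stored in the .symtab entry and the
// 32-bit word stored at the same position in .symtab_shndx.
struct EncodedIndex {
    std::uint16_t st_shndx;
    std::uint32_t shndx_entry;
};

// Small indices fit in st_shndx and leave SHN_UNDEF in .symtab_shndx;
// large ones use the SHN_XINDEX escape and put the real index there.
constexpr EncodedIndex encode_section_index(std::uint32_t index) {
    return index < kShnLoReserve
               ? EncodedIndex{static_cast<std::uint16_t>(index), kShnUndef}
               : EncodedIndex{kShnXindex, index};
}

// Inverse mapping, as a reader of the two tables would apply it.
constexpr std::uint32_t decode_section_index(EncodedIndex e) {
    return e.st_shndx == kShnXindex ? e.shndx_entry : e.st_shndx;
}

static_assert(decode_section_index(encode_section_index(70000)) == 70000,
              "large indices round-trip through the escape");
```

Since each .symtab_shndx word depends on the decision made for the matching symbol, computing both in one pass and writing the extended table afterwards avoids repeating that decision, which is the ordering the message describes.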

Whenever C1 and C2 are integer constants, X is of a wrapping type, and cmp is a relational operator, the expression X +- C1 cmp C2 can be simplified in the following cases:

(a) If cmp is <= and C2 -+ C1 == +INF(1), we can transform the initial comparison in the following way:

       X +- C1 <= C2
       -INF <= X +- C1 <= C2 (add left hand side which holds for any X, C1)
       -INF -+ C1 <= X <= C2 -+ C1 (add -+C1 to all 3 expressions)
       -INF -+ C1 <= X <= +INF (due to (1))
       -INF -+ C1 <= X (eliminate the right hand side since it holds for any X)

(b) By analogy, if cmp is >= and C2 -+ C1 == -INF(1), use the following sequence of transformations:

       X +- C1 >= C2
       +INF >= X +- C1 >= C2 (add left hand side which holds for any X, C1)
       +INF -+ C1 >= X >= C2 -+ C1 (add -+C1 to all 3 expressions)
       +INF -+ C1 >= X >= -INF (due to (1))
       +INF -+ C1 >= X (eliminate the right hand side since it holds for any X)

(c) The > and < cases are negations of (a) and (b), respectively.

This transformation occasionally saves add / sub instructions, for instance the expression

    3 + (uint32_t)f() < 2

compiles to

    cmn     w0, #4
    cset    w0, ls

instead of

    add     w0, w0, 3
    cmp     w0, 2
    cset    w0, ls

on aarch64.

Testcases that go together with this patch have been split into two separate files, one containing testcases for unsigned variables and the other for wrapping signed ones (and thus compiled with -fwrapv). Additionally, one aarch64 test has been adjusted since the patch has caused the generated code to change from

    cmn     w0, #2
    csinc   w0, w1, wzr, cc   (x < -2)

to

    cmn     w0, #3
    csinc   w0, w1, wzr, cs   (x <= -3)

This patch has been bootstrapped and regtested on aarch64, x86_64, and i386, and additionally regtested on riscv32.

gcc/ChangeLog:
        PR tree-optimization/116024
        * match.pd: New transformation around integer comparison.

gcc/testsuite/ChangeLog:
        * gcc.dg/tree-ssa/pr116024-2.c: New test.
        * gcc.dg/tree-ssa/pr116024-2-fwrapv.c: Ditto.
        * gcc.target/aarch64/gtu_to_ltu_cmp_1.c: Adjust.
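Cases (a) and (b) can be checked by brute force on a small wrapping type. The sketch below is not the match.pd pattern: it assumes `uint8_t` as the wrapping type (so -INF is 0 and +INF is 255), covers only the "+" form, and uses function names invented for this example.

```cpp
#include <cstdint>

// Left-hand sides as written in the source: X + C1 cmp C2, wrapping mod 256.
bool original_le(std::uint8_t x, std::uint8_t c1, std::uint8_t c2) {
    return static_cast<std::uint8_t>(x + c1) <= c2;
}
bool original_ge(std::uint8_t x, std::uint8_t c1, std::uint8_t c2) {
    return static_cast<std::uint8_t>(x + c1) >= c2;
}

// Case (a): -INF - C1 <= X, with the bound computed in wrapping arithmetic.
bool simplified_le(std::uint8_t x, std::uint8_t c1) {
    return x >= static_cast<std::uint8_t>(0 - c1);
}
// Case (b): +INF - C1 >= X.
bool simplified_ge(std::uint8_t x, std::uint8_t c1) {
    return x <= static_cast<std::uint8_t>(255 - c1);
}

// Try every X and C1; C2 is chosen so that the precondition (1) holds.
bool identities_hold() {
    for (unsigned c = 0; c < 256; ++c) {
        const std::uint8_t c1 = static_cast<std::uint8_t>(c);
        const std::uint8_t c2_a = static_cast<std::uint8_t>(c + 255);  // C2 - C1 == +INF
        const std::uint8_t c2_b = c1;                                  // C2 - C1 == -INF
        for (unsigned v = 0; v < 256; ++v) {
            const std::uint8_t x = static_cast<std::uint8_t>(v);
            if (original_le(x, c1, c2_a) != simplified_le(x, c1)) return false;
            if (original_ge(x, c1, c2_b) != simplified_ge(x, c1)) return false;
        }
    }
    return true;
}
```

The simplified forms compare X directly against a constant, so the addition disappears; that is the saving the message refers to.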

Update test case for armv8.1-m.main that supports conditional arithmetic. armv7-m: push {r4, lr} ldr r4, .L6 ldr r4, [r4] lsls r4, r4, #29 it mi addmi r2, r2, #1 bl bar movs r0, #0 pop {r4, pc} armv8.1-m.main: push {r3, r4, r5, lr} ldr r4, .L5 ldr r5, [r4] tst r5, #4 csinc r2, r2, r2, eq bl bar movs r0, #0 pop {r3, r4, r5, pc} gcc/testsuite/ChangeLog: * gcc.target/arm/epilog-1.c: Use check-function-bodies. Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com> (cherry picked from commit ec86e87)

In r14.2.0-376-g724446556e5, I accidentally introduced a regression in the expected assembler as the csinc instruction was not used for armv8.1-m.main. The generated assembler for armv8.1-m.main is: push {r3, r4, r5, lr} ldr r4, .L5 ldr r5, [r4] adds r4, r2, #1 tst r5, #4 it ne movne r2, r4 bl bar movs r0, #0 pop {r3, r4, r5, pc} gcc/testsuite/ChangeLog: * gcc.target/arm/epilog-1.c: Corrected armv8.1.m-main asm. Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>

When generating thumb2 code, LDM SP!, {PC} is a two-byte instruction, whereas LDR PC, [SP], #4 needs 4 bytes. When optimizing for size, or when there's no obvious performance benefit, prefer the former. gcc/ChangeLog: PR target/118089 * config/arm/arm.cc (thumb2_expand_return): Use LDM SP!, {PC} when optimizing for size, or when there's no performance benefit over LDR PC, [SP], #4. (arm_expand_epilogue): Likewise.

My earlier change for making the compiler prefer POP {PC} over LDR PC, [SP], #4 had a slightly unexpected consequence in that we now also call arm_emit_multi_reg_pop to handle single register pops when the register is not PC. This exposed a latent bug in this function where the dwarf unwinding notes on the single-register POP were not being set correctly. gcc/ PR target/118089 * config/arm/arm.cc (arm_emit_multi_reg_pop): Add a CFA adjust note to single-register POP instructions.

When an exception is thrown and caught, destruction of the exception checks whether the exception was allocated in the `emergency_pool`, which is a global variable. This global variable has a runtime constructor, which means access to it is valid only once the constructor has run during the module init phase. But throwing and catching an exception is permitted at any time, not just during the lifetime of `main`. And this must be true whether libsupc++ is linked dynamically or statically.

LLVM Address Sanitizer aborts with `initialization-order-fiasco` when, in a binary which links libsupc++ statically, an exception is thrown and caught in some global constructor which happens to run prior to the global constructor of `emergency_pool`.

```
ERROR: AddressSanitizer: initialization-order-fiasco ...
READ of size 8 at ... thread T0
SCARINESS: 14 (8-byte-read-initialization-order-fiasco)
    #0 ... in (anonymous namespace)::pool::in_pool(void*) gcc-11.x/libstdc++-v3/libsupc++/eh_alloc.cc:258
    #1 ... in __cxa_free_exception gcc-11.x/libstdc++-v3/libsupc++/eh_alloc.cc:302
    #2 ... in __gxx_exception_cleanup(_Unwind_Reason_Code, _Unwind_Exception*) gcc-11.x/libstdc++-v3/libsupc++/eh_throw.cc:51
    #3 ... in __cxa_end_catch gcc-11.x/libstdc++-v3/libsupc++/eh_catch.cc:125
    ...
    ... in __cxx_global_var_init ...
    ...
    ... in call_init.part.0 glibc-2.40/elf/dl-init.c:74:3
    ... in call_init glibc-2.40/elf/dl-init.c:120:14
    ... in _dl_init glibc-2.40/elf/dl-init.c:121:5
    ... in _dl_start_user glibc-2.40/elf/../sysdeps/aarch64/dl-start.S:46

... is located 56 bytes inside of global variable '(anonymous namespace)::emergency_pool' defined in 'gcc-11.x/libstdc++-v3/libsupc++/eh_alloc.cc' (...) of size 72
  registered at:
    #0 ... in __asan_register_globals.part.0 llvm-project/compiler-rt/lib/asan/asan_globals.cpp:393:3
    #1 ... in __asan_register_globals llvm-project/compiler-rt/lib/asan/asan_globals.cpp:392:3
    #2 ... in __asan_register_elf_globals llvm-project/compiler-rt/lib/asan/asan_globals.cpp:376:26
    #3 ... in call_init.part.0 glibc-2.40/elf/dl-init.c:74:3
    #4 ... in call_init glibc-2.40/elf/dl-init.c:120:14
    #5 ... in _dl_init glibc-2.40/elf/dl-init.c:121:5
    #6 ... in _dl_start_user glibc-2.40/elf/../sysdeps/aarch64/dl-start.S:46
```
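The message as captured here stops before describing the fix, so the following is not the libsupc++ change. It is only a general sketch of one way such an ordering hazard is commonly avoided: giving the global a `constexpr` constructor so that it is constant-initialized before any dynamic initializer runs. `PoolSketch`, `pool_sketch` and `EarlyUser` are hypothetical names.

```cpp
#include <cstddef>
#include <functional>

// Hypothetical stand-in for a pool that must be queryable at any time.
class PoolSketch {
public:
    // A constexpr constructor lets objects with static storage duration be
    // constant-initialized, i.e. before any dynamic initialization runs.
    constexpr PoolSketch() noexcept : arena_(nullptr), arena_size_(0) {}

    // Reports whether p points into the arena; an empty pool owns nothing.
    bool in_pool(const void* p) const noexcept {
        if (arena_ == nullptr) return false;
        const char* c = static_cast<const char*>(p);
        std::less<const char*> before;  // total order even for unrelated pointers
        return !before(c, arena_) && before(c, arena_ + arena_size_);
    }

private:
    char* arena_;
    std::size_t arena_size_;
};

PoolSketch pool_sketch;  // no runtime constructor is needed for this object

// A global whose constructor queries the pool during the init phase.
struct EarlyUser {
    bool was_in_pool;
    EarlyUser() : was_in_pool(pool_sketch.in_pool(this)) {}
};
EarlyUser early_user;
```

With this shape, the query in `EarlyUser`'s constructor is well defined no matter which translation unit's initializers run first, because `pool_sketch` never depends on dynamic initialization.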

The vectorizer has learned how to do boolean reductions of masks to a C bool for the operations OR, XOR and AND.

This implements the new optabs for Adv.SIMD. Adv.SIMD today can already vectorize such loops but does so through SHIFT-AND-INSERT to perform the reductions step-wise and in order. As an example, an OR reduction today does:

    movi    v3.4s, 0
    ext     v5.16b, v30.16b, v3.16b, #8
    orr     v5.16b, v5.16b, v30.16b
    ext     v29.16b, v5.16b, v3.16b, #4
    orr     v29.16b, v29.16b, v5.16b
    ext     v4.16b, v29.16b, v3.16b, #2
    orr     v4.16b, v4.16b, v29.16b
    ext     v3.16b, v4.16b, v3.16b, #1
    orr     v3.16b, v3.16b, v4.16b
    fmov    w1, s3
    and     w1, w1, 1

For reducing to a boolean however we don't need the stepwise reduction and can just look at the bit patterns. For e.g. OR we now generate:

    umaxp   v3.4s, v3.4s, v3.4s
    fmov    x1, d3
    cmp     x1, 0
    cset    w0, ne

For the remaining codegen see test vect-reduc-bool-9.c.

gcc/ChangeLog:
        * config/aarch64/aarch64-simd.md (reduc_sbool_and_scal_<mode>, reduc_sbool_ior_scal_<mode>, reduc_sbool_xor_scal_<mode>): New.
        * config/aarch64/iterators.md (VALLI): New.

gcc/testsuite/ChangeLog:
        * gcc.target/aarch64/vect-reduc-bool-1.c: New test.
        * gcc.target/aarch64/vect-reduc-bool-2.c: New test.
        * gcc.target/aarch64/vect-reduc-bool-3.c: New test.
        * gcc.target/aarch64/vect-reduc-bool-4.c: New test.
        * gcc.target/aarch64/vect-reduc-bool-5.c: New test.
        * gcc.target/aarch64/vect-reduc-bool-6.c: New test.
        * gcc.target/aarch64/vect-reduc-bool-7.c: New test.
        * gcc.target/aarch64/vect-reduc-bool-8.c: New test.
        * gcc.target/aarch64/vect-reduc-bool-9.c: New test.
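For orientation, loops of roughly the following shape are what a boolean OR, AND or XOR reduction looks like at the source level. These functions are illustrative assumptions rather than the contents of the vect-reduc-bool tests, and whether a given compiler version and option set actually uses the new patterns for them is not something this sketch can promise.

```cpp
#include <cstddef>
#include <cstdint>

// OR reduction: true if any element exceeds the threshold.
bool any_above(const std::int32_t* data, std::size_t n, std::int32_t threshold) {
    bool found = false;
    for (std::size_t i = 0; i < n; ++i) {
        found |= (data[i] > threshold);
    }
    return found;
}

// AND reduction: true only if every element exceeds the threshold.
bool all_above(const std::int32_t* data, std::size_t n, std::int32_t threshold) {
    bool all = true;
    for (std::size_t i = 0; i < n; ++i) {
        all &= (data[i] > threshold);
    }
    return all;
}

// XOR reduction: true if an odd number of elements exceed the threshold.
bool odd_count_above(const std::int32_t* data, std::size_t n, std::int32_t threshold) {
    bool parity = false;
    for (std::size_t i = 0; i < n; ++i) {
        parity ^= (data[i] > threshold);
    }
    return parity;
}
```

In each loop the per-element comparison yields a mask and only a single bool survives, which is why the final step can inspect the mask's bit pattern instead of combining lanes one at a time.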

The vadcq and vsbcq patterns had two problems:
- the adc / sbc part of the pattern did not mention the use of vfpcc
- the carry calculation part should use a different unspec code

In addition, get_fpscr_nzcvqc and set_fpscr_nzcvqc were over-cautious in using unspec_volatile when unspec is really what they need. Making them unspec allows redundant accesses to FPSCR_nzcvqc to be removed.

With unspec_volatile, we used to generate:

test_2:
	@ args = 0, pretend = 0, frame = 8
	@ frame_needed = 0, uses_anonymous_args = 0
	vmov.i32	q0, #0x1  @ v4si
	push	{lr}
	sub	sp, sp, #12
	vmrs	r3, FPSCR_nzcvqc    ;; [1]
	bic	r3, r3, #536870912
	vmsr	FPSCR_nzcvqc, r3
	vadc.i32	q3, q0, q0
	vmrs	r3, FPSCR_nzcvqc    ;; [2]
	vmrs	r3, FPSCR_nzcvqc
	orr	r3, r3, #536870912
	vmsr	FPSCR_nzcvqc, r3
	vadc.i32	q0, q0, q0
	vmrs	r3, FPSCR_nzcvqc
	ldr	r0, .L8
	ubfx	r3, r3, #29, #1
	str	r3, [sp, #4]
	bl	print_uint32x4_t
	add	sp, sp, #12
	@ sp needed
	pop	{pc}
.L9:
	.align	2
.L8:
	.word	.LC1

With unspec, we generate:

test_2:
	@ args = 0, pretend = 0, frame = 8
	@ frame_needed = 0, uses_anonymous_args = 0
	vmrs	r3, FPSCR_nzcvqc    ;; [1]
	bic	r3, r3, #536870912  ;; [3]
	vmov.i32	q0, #0x1  @ v4si
	vmsr	FPSCR_nzcvqc, r3
	vadc.i32	q3, q0, q0
	vmrs	r3, FPSCR_nzcvqc
	orr	r3, r3, #536870912
	vmsr	FPSCR_nzcvqc, r3
	vadc.i32	q0, q0, q0
	vmrs	r3, FPSCR_nzcvqc
	push	{lr}
	ubfx	r3, r3, #29, #1
	sub	sp, sp, #12
	ldr	r0, .L8
	str	r3, [sp, #4]
	bl	print_uint32x4_t
	add	sp, sp, #12
	@ sp needed
	pop	{pc}
.L9:
	.align	2
.L8:
	.word	.LC1

That is, unspec in get_fpscr_nzcvqc makes it possible to:
- move [1] earlier
- delete redundant [2]

and unspec in set_fpscr_nzcvqc makes it possible to move push {lr} and the stack manipulation later.

gcc/ChangeLog:

	PR target/122189
	* config/arm/iterators.md (VxCIQ_carry, VxCIQ_M_carry, VxCQ_carry)
	(VxCQ_M_carry): New iterators.
	* config/arm/mve.md (get_fpscr_nzcvqc, set_fpscr_nzcvqc): Use
	unspec instead of unspec_volatile.
	(vadciq, vadciq_m, vadcq, vadcq_m): Use vfpcc in operation. Use a
	different unspec code for carry calculation.
	* config/arm/unspecs.md (VADCQ_U_carry, VADCQ_M_U_carry)
	(VADCQ_S_carry, VADCQ_M_S_carry, VSBCIQ_U_carry, VSBCIQ_S_carry,
	VSBCIQ_M_U_carry, VSBCIQ_M_S_carry, VSBCQ_U_carry, VSBCQ_S_carry,
	VSBCQ_M_U_carry, VSBCQ_M_S_carry, VADCIQ_U_carry, VADCIQ_M_U_carry,
	VADCIQ_S_carry, VADCIQ_M_S_carry): New unspec codes.

gcc/testsuite/ChangeLog:

	PR target/122189
	* gcc.target/arm/mve/intrinsics/vadcq-check-carry.c: New test.
	* gcc.target/arm/mve/intrinsics/vadcq_m_s32.c: Adjust instructions
	order.
	* gcc.target/arm/mve/intrinsics/vadcq_m_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsbcq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsbcq_m_u32.c: Likewise.

(cherry picked from commits 0272058 and 697ccad)
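As a reading aid for the listings above: the immediate 536870912 is 1 << 29, so the bic clears bit 29 of the value read from FPSCR_nzcvqc, the orr sets it, and ubfx r3, r3, #29, #1 extracts it as the carry. A minimal portable C sketch of that bit arithmetic follows; the macro and helper names are hypothetical and are not part of GCC or of the MVE intrinsics.

```c
#include <stdint.h>

/* Carry flag position used by the bic/orr/ubfx instructions above. */
#define CARRY_BIT  29u
#define CARRY_MASK (UINT32_C(1) << CARRY_BIT)   /* 536870912 */

/* Equivalent of: bic r3, r3, #536870912 */
static inline uint32_t clear_carry(uint32_t fpscr)
{
    return fpscr & ~CARRY_MASK;
}

/* Equivalent of: orr r3, r3, #536870912 */
static inline uint32_t set_carry(uint32_t fpscr)
{
    return fpscr | CARRY_MASK;
}

/* Equivalent of: ubfx r3, r3, #29, #1 */
static inline uint32_t get_carry(uint32_t fpscr)
{
    return (fpscr >> CARRY_BIT) & 1u;
}
```

The sketch only models the integer manipulation of the flag word; reading and writing the register itself is what the vmrs and vmsr instructions do.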

jakub and others added 30 commits March 20, 2013 09:01

Branch for OpenMP 4.0 support development.

1ccc44f

See http://openmp.org/wp/2013/03/openmp-40-rc2/ for the standard draft. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@196809 138bc75d-0d04-0410-961f-82ee72b054a4

2013-03-20 Tobias Burnus <burnus@net-b.de>

cf12b7f

* env.c (handle_omp_display_env): New function. (initialize_env): Use it. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@196817 138bc75d-0d04-0410-961f-82ee72b054a4

* semantics.c (finish_omp_for): Disallow class iterators for

5c84774

OMP_SIMD and OMP_FOR_SIMD loops. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@197515 138bc75d-0d04-0410-961f-82ee72b054a4

Rewrite Cilk Plus <#pragma simd> parsing and rewrite to use gomp4's

b2270ce

OMP_SIMD infrastructure.

Merge remote-tracking branch 'origin/gomp-4_0-branch' into cilk-in-gomp

64bb070

Conflicts: gcc/omp-low.c

Verify the integrity of the _Cilk_for body.

f80ffc7

Fix typo in last commit.

061de24

Disallow a condition of != in _Cilk_for.

1176fa6

Generate an OMP safelen clause when a Cilk Plus vectorlength clause is

c311b69

present.

Implement the parsing bits for the vectorlengthfor clause.

d56c70f

Remove vectorlengthfor clause which has been deprecated.

50ee630

Implement c_finish_cilk_clauses to verify <#pragma simd> clauses.

1cfb1fb

* c-pragma.c (omp_pragmas): Add PRAGMA_OMP_DISTRIBUTE.

740af71

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@198460 138bc75d-0d04-0410-961f-82ee72b054a4

Merge remote-tracking branch 'origin/gomp-4_0-branch' into cilk-in-gomp

20b6dfe

Allow "!=" in for Cilk for conditionals.

5ac236d

Remove deprecated vectorlength clause features. Remove deprecated assert and noassert clauses. Implement vectorlength clause in OpenMP safelen terms.
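A rough sketch of the mapping this commit describes, under the assumption that a Cilk Plus loop written with `#pragma simd vectorlength(4)` is treated like an OpenMP SIMD loop carrying `safelen(4)`. The function name is hypothetical, and the pragma is guarded so the code also builds as plain C when OpenMP is not enabled.

```c
#include <stddef.h>

/* Adds 1.0f to every element of b and stores the result in a.
   Cilk Plus spelling of the hint:  #pragma simd vectorlength(4)
   OpenMP 4.0 spelling used here:   #pragma omp simd safelen(4)  */
void add_one(float *a, const float *b, size_t n)
{
#ifdef _OPENMP
#pragma omp simd safelen(4)
#endif
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + 1.0f;
}
```

The result is the same with or without the pragma; the clause only bounds how many consecutive iterations may be executed together.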

Fixed an uninit. variable error in c-typeck.c

ec8f0b4

svn merge -r196807:198832 svn+ssh://gcc.gnu.org/svn/gcc/trunk

85fe34d

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@198835 138bc75d-0d04-0410-961f-82ee72b054a4

NinaRanns referenced this pull request in NinaRanns/gcc May 30, 2024

Merge pull request #4 from NinaRanns/contracts-nonattr

4c447ed

fixing tests and removing C++20 requirement

mikpe added a commit to mikpe/gcc that referenced this pull request Sep 8, 2024

CDP1802 libgcc support: __udivmodsi4 bug fix gcc-mirror#4

3ac163d


sushantchry commented Jan 14, 2016


3 participants
