mirrored from git://gcc.gnu.org/git/gcc.git
Aldyh/cilk in gomp #4
Open
sushantchry wants to merge 79 commits into master from aldyh/cilk-in-gomp
See http://openmp.org/wp/2013/03/openmp-40-rc2/ for the standard draft. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@196809 138bc75d-0d04-0410-961f-82ee72b054a4
Add another argument to c_finish_omp_atomic. * parser.c (cp_parser_binary_expression): Handle no_toplevel_fold_p even for binary operations other than comparison. (cp_parser_omp_atomic): Handle parsing OpenMP 4.0 atomics. * pt.c (tsubst_expr) <case OMP_ATOMIC>: Handle atomic exchange. * semantics.c (finish_omp_atomic): Use cp_tree_equal to diagnose expression mismatches and to find out if c_finish_omp_atomic should be called with swapped set to true or false. * c-omp.c (c_finish_omp_atomic): Add swapped argument, if true, build the operation first with rhs, lhs arguments and use NOP_EXPR build_modify_expr. * c-common.h (c_finish_omp_atomic): Adjust prototype. * c-c++-common/gomp/atomic-15.c: Remove error test that is now valid in OpenMP 4.0. * testsuite/libgomp.c++/atomic-10.C: New test. * testsuite/libgomp.c++/atomic-11.C: New test. * testsuite/libgomp.c++/atomic-12.C: New test. * testsuite/libgomp.c++/atomic-13.C: New test. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@196815 138bc75d-0d04-0410-961f-82ee72b054a4
with default value, pass it down to c_parser_conditional_expression. (c_parser_conditional_expression): Add omp_atomic_lhs argument, pass it down to c_parser_binary_expression. Don't pass PREC_NONE to it. Adjust recursive call. (c_parser_binary_expression): Remove prec argument, add omp_atomic_lhs argument. Always start from PREC_NONE, if omp_atomic_lhs is non-NULL and one of the arguments of toplevel binop matches it, use build2 instead of parser_build_binary_op. (c_parser_omp_atomic): Handle OpenMP 4.0 atomics. (c_parser_omp_for_loop): Adjust c_parser_binary_expression caller. * c-tree.h (c_tree_equal): New prototype. * c-typeck.c (c_tree_equal): New function. * parser.c (cp_parser_omp_atomic): Never restart unless structured_block is true. * c-c++-common/gomp/atomic-15.c: Adjust for C diagnostics. * testsuite/libgomp.c/atomic-14.c: Add parens to make it valid. * testsuite/libgomp.c/atomic-15.c: New test. * testsuite/libgomp.c/atomic-16.c: New test. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@196816 138bc75d-0d04-0410-961f-82ee72b054a4
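The two atomic commits above describe parser support for the OpenMP 4.0 atomic forms, including the operand-swapped update and atomic exchange. A minimal sketch of what such user code might look like follows; the function names are hypothetical and are not taken from the new atomic-1x tests. Without -fopenmp the pragmas are ignored and the code runs serially with the same result.

```cpp
#include <cassert>

// Operand-swapped update, "x = expr binop x". The commit log says the
// matching C/C++ diagnostic test was relaxed because this is valid in 4.0.
// (Function name is hypothetical.)
int swapped_update(int start, int expr) {
  int x = start;
  #pragma omp atomic update
  x = expr - x;
  return x;
}

// Atomic exchange written as a structured-block capture:
// read the old value and store the new one as a single atomic step.
// (Function name is hypothetical.)
int exchange(int *slot, int desired) {
  int old;
  #pragma omp atomic capture
  { old = *slot; *slot = desired; }
  return old;
}
```

Calling swapped_update(3, 10) computes 10 - 3, and exchange(&s, 9) returns the previous value of s while leaving 9 in it.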
* env.c (handle_omp_display_env): New function.
(initialize_env): Use it.
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@196817 138bc75d-0d04-0410-961f-82ee72b054a4
* libgomp.texi (Environment Variables): Minor cleanup,
update section refs to OpenMP 4.0rc2.
(OMP_DISPLAY_ENV, GOMP_SPINCOUNT): Document these
environment variables.
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@196818 138bc75d-0d04-0410-961f-82ee72b054a4
GIMPLE_OMP_FOR kinds. * tree.def (OMP_SIMD, OMP_FOR_SIMD, OMP_DISTRIBUTE): New tree codes. * gimple.h (enum gf_mask): Add GF_OMP_FOR_KIND_MASK, GF_OMP_FOR_KIND_FOR, GF_OMP_FOR_KIND_SIMD, GF_OMP_FOR_KIND_FOR_SIMD and GF_OMP_FOR_KIND_DISTRIBUTE. (gimple_omp_for_kind, gimple_omp_for_set_kind): New inline functions. * gimplify.c (is_gimple_stmt, gimplify_omp_for, gimplify_expr): Handle OMP_SIMD, OMP_FOR_SIMD and OMP_DISTRIBUTE. * tree.c (omp_clause_num_ops, omp_clause_code_name, walk_tree_1): Handle new OpenMP 4.0 clauses. * tree-pretty-print.c (dump_omp_clause): Likewise. (dump_generic_node): Handle OMP_SIMD, OMP_FOR_SIMD and OMP_DISTRIBUTE. * tree.h (enum omp_clause_code): Add OMP_CLAUSE_LINEAR, OMP_CLAUSE_ALIGNED, OMP_CLAUSE_DEPEND, OMP_CLAUSE_FROM, OMP_CLAUSE_TO, OMP_CLAUSE_UNIFORM, OMP_CLAUSE_MAP, OMP_CLAUSE_DEVICE, OMP_CLAUSE_DIST_SCHEDULE, OMP_CLAUSE_INBRANCH, OMP_CLAUSE_NOTINBRANCH, OMP_CLAUSE_NUM_TEAMS, OMP_CLAUSE_PROC_BIND, OMP_CLAUSE_SAFELEN, OMP_CLAUSE_SIMDLEN, OMP_CLAUSE_FOR, OMP_CLAUSE_PARALLEL, OMP_CLAUSE_SECTIONS and OMP_CLAUSE_TASKGROUP. (OMP_LOOP_CHECK): Define. (OMP_FOR_BODY, OMP_FOR_CLAUSES, OMP_FOR_INIT, OMP_FOR_COND, OMP_FOR_INCR, OMP_FOR_PRE_BODY): Use OMP_LOOP_CHECK instead of OMP_FOR_CHECK. (OMP_CLAUSE_DECL): Extend check range up to OMP_CLAUSE_MAP. (OMP_CLAUSE_LINEAR_STEP, OMP_CLAUSE_ALIGNED_ALIGNMENT, OMP_CLAUSE_NUM_TEAMS_EXPR, OMP_CLAUSE_DEVICE_ID, OMP_CLAUSE_DIST_SCHEDULE_CHUNK_EXPR, OMP_CLAUSE_SAFELEN_EXPR, OMP_CLAUSE_SIMDLEN_EXPR): Define. (enum omp_clause_depend_kind, enum omp_clause_map_kind, enum omp_clause_proc_bind_kind): New enums. (OMP_CLAUSE_DEPEND_KIND, OMP_CLAUSE_MAP_KIND, OMP_CLAUSE_PROC_BIND_KIND): Define. (struct tree_omp_clause): Add subcode.depend_kind, subcode.map_kind and subcode.proc_bind_kind. (find_omp_clause): New prototype. * omp-builtins.def (BUILT_IN_GOMP_CANCEL, BUILT_IN_GOMP_CANCELLATION_POINT): New built-ins. * tree-flow.h (find_omp_clause): Remove prototype. 
c/ * c-parser.c (c_parser_omp_all_clauses): Change mask argument type from unsigned to omp_clause_mask. (c_parser_omp_for_loop): Adjust c_finish_omp_for caller. (OMP_FOR_CLAUSE_MASK, OMP_SECTIONS_CLAUSE_MASK, OMP_PARALLEL_CLAUSE_MASK, OMP_SINGLE_CLAUSE_MASK, OMP_TASK_CLAUSE_MASK): Use OMP_CLAUSE_MASK_1 instead of 1. (c_parser_omp_parallel): Use omp_clause_mask type instead of unsigned for mask, use OMP_CLAUSE_MASK_1 instead of 1 for masks. cp/ * cp-tree.h (OMP_FOR_GIMPLIFYING_P): Use OMP_LOOP_CHECK instead of OMP_FOR_CHECK. (finish_omp_for): Add enum tree_code second argument. (finish_omp_cancel, finish_omp_cancellation_point): New prototypes. * cp-gimplify.c (cp_gimplify_expr, cp_genericize_r): Handle OMP_SIMD, OMP_FOR_SIMD and OMP_DISTRIBUTE. * semantics.c (finish_omp_clauses): Handle new OpenMP 4.0 clauses. (finish_omp_for): Add code argument, pass it down to make_node or c_finish_omp_for. (finish_omp_cancel, finish_omp_cancellation_point): New functions. * parser.c (cp_parser_omp_clause_name): Add parsing of new OpenMP 4.0 clauses. (cp_parser_omp_var_list_no_open): Add COLON argument, if non-NULL, accept termination by colon instead of closing paren. (cp_parser_omp_var_list, cp_parser_omp_clause_reduction): Adjust callers. (cp_parser_omp_clause_branch, cp_parser_omp_clause_cancelkind, cp_parser_omp_clause_num_teams, cp_parser_omp_clause_aligned, cp_parser_omp_clause_linear, cp_parser_omp_clause_depend, cp_parser_omp_clause_map, cp_parser_omp_clause_device, cp_parser_omp_clause_dist_schedule, cp_parser_omp_clause_proc_bind): New functions. (cp_parser_omp_all_clauses): Change mask argument's type to omp_clause_mask from unsigned. Fix c_name for PRAGMA_OMP_CLAUSE_UNTIED. Handle new OpenMP 4.0 clauses. (cp_parser_omp_for_loop): Add code argument. Pass it down to finish_omp_for. (OMP_SIMD_CLAUSE_MASK): Define. (cp_parser_omp_simd): New function. 
(OMP_FOR_CLAUSE_MASK, OMP_SECTIONS_CLAUSE_MASK, OMP_PARALLEL_CLAUSE_MASK, OMP_SINGLE_CLAUSE_MASK, OMP_TASK_CLAUSE_MASK): Use OMP_CLAUSE_MASK_1 instead of 1. (cp_parser_omp_for): Handle parsing of #pragma omp for simd. (cp_parser_omp_parallel): Handle parsing of #pragma omp parallel for simd. Use omp_clause_mask type instead of unsigned for mask, use OMP_CLAUSE_MASK_1 instead of 1 for masks. (OMP_CANCEL_CLAUSE_MASK, OMP_CANCELLATION_POINT_CLAUSE_MASK): Define. (cp_parser_omp_cancel, cp_parser_omp_cancellation_point): New functions. (cp_parser_omp_construct): Handle PRAGMA_OMP_SIMD, PRAGMA_OMP_CANCEL and PRAGMA_OMP_CANCELLATION_POINT. (cp_parser_pragma): Handle PRAGMA_OMP_SIMD. * pt.c (tsubst_expr): Handle OMP_SIMD, OMP_FOR_SIMD and OMP_DISTRIBUTE. Pass down TREE_CODE to finish_omp_for. fortran/ * f95-lang.c (ATTR_NULL): Define. c-family/ * c-omp.c (c_finish_omp_for): Add code argument, pass it down to make_code. (c_split_parallel_clauses): Handle OMP_CLAUSE_SAFELEN, OMP_CLAUSE_ALIGNED and OMP_CLAUSE_LINEAR. * c-pragma.h (enum pragma_kind): Add PRAGMA_OMP_CANCEL, PRAGMA_OMP_CANCELLATION_POINT, PRAGMA_OMP_DECLARE_REDUCTION, PRAGMA_OMP_DECLARE_SIMD, PRAGMA_OMP_DECLARE_TARGET, PRAGMA_OMP_DISTRIBUTE, PRAGMA_OMP_END_DECLARE_TARGET, PRAGMA_OMP_FOR_SIMD, PRAGMA_OMP_PARALLEL_FOR_SIMD, PRAGMA_OMP_SIMD, PRAGMA_OMP_TARGET, PRAGMA_OMP_TARGET_DATA, PRAGMA_OMP_TARGET_UPDATE, PRAGMA_OMP_TASKGROUP and PRAGMA_OMP_TEAMS. (enum pragma_omp_clause): Add PRAGMA_OMP_CLAUSE_ALIGNED, PRAGMA_OMP_CLAUSE_DEPEND, PRAGMA_OMP_CLAUSE_DEVICE, PRAGMA_OMP_CLAUSE_DIST_SCHEDULE, PRAGMA_OMP_CLAUSE_FOR, PRAGMA_OMP_CLAUSE_FROM, PRAGMA_OMP_CLAUSE_INBRANCH, PRAGMA_OMP_CLAUSE_LINEAR, PRAGMA_OMP_CLAUSE_MAP, PRAGMA_OMP_CLAUSE_NOTINBRANCH, PRAGMA_OMP_CLAUSE_NUM_TEAMS, PRAGMA_OMP_CLAUSE_PARALLEL, PRAGMA_OMP_CLAUSE_PROC_BIND, PRAGMA_OMP_CLAUSE_SAFELEN, PRAGMA_OMP_CLAUSE_SECTIONS, PRAGMA_OMP_CLAUSE_SIMDLEN, PRAGMA_OMP_CLAUSE_TASKGROUP, PRAGMA_OMP_CLAUSE_TO and PRAGMA_OMP_CLAUSE_UNIFORM. 
* c-pragma.c (omp_pragmas): Add new OpenMP 4.0 constructs. * c-common.h (c_finish_omp_for): Add enum tree_code as second argument. (OMP_CLAUSE_MASK_1): Define. (omp_clause_mask): For HWI >= 64 new typedef for unsigned HOST_WIDE_INT, otherwise a class with needed ctors and operators. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@197161 138bc75d-0d04-0410-961f-82ee72b054a4
OMP_SIMD and OMP_FOR_SIMD loops. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@197515 138bc75d-0d04-0410-961f-82ee72b054a4
omp_get_proc_bind, omp_get_proc_bind_, omp_set_default_device, omp_set_default_device_, omp_set_default_device_8_, omp_get_default_device, omp_get_default_device_, omp_get_num_devices, omp_get_num_devices_, omp_get_num_teams, omp_get_num_teams_, omp_get_team_num, omp_get_team_num_): Export @@OMP_4.0. (GOMP_cancel, GOMP_cancellation_point, GOMP_parallel_loop_dynamic, GOMP_parallel_loop_guided, GOMP_parallel_loop_runtime, GOMP_parallel_loop_static, GOMP_parallel_sections, GOMP_parallel, GOMP_taskgroup_start, GOMP_taskgroup_end): Export @@GOMP_4.0. * parallel.c (GOMP_parallel_end): Add ialias. (GOMP_parallel, GOMP_cancel, GOMP_cancellation_point): New functions. * omp.h.in (omp_proc_bind_t): New typedef. (omp_get_cancellation, omp_get_proc_bind, omp_set_default_device, omp_get_default_device, omp_get_num_devices, omp_get_num_teams, omp_get_team_num): New prototypes. * env.c (omp_get_cancellation, omp_get_proc_bind, omp_set_default_device, omp_get_default_device, omp_get_num_devices, omp_get_num_teams, omp_get_team_num): New functions. * fortran.c (ULP, STR1, STR2, ialias_redirect): Removed. (omp_get_cancellation_, omp_get_proc_bind_, omp_set_default_device_, omp_set_default_device_8_, omp_get_default_device_, omp_get_num_devices_, omp_get_num_teams_, omp_get_team_num_): New functions. * libgomp.h (ialias_ulp, ialias_str1, ialias_str2, ialias_redirect, ialias_call): Define. * libgomp_g.h (GOMP_parallel_loop_static, GOMP_parallel_loop_dynamic, GOMP_parallel_loop_guided, GOMP_parallel_loop_runtime, GOMP_parallel, GOMP_cancel, GOMP_cancellation_point, GOMP_taskgroup_start, GOMP_taskgroup_end, GOMP_parallel_sections): New prototypes. * task.c (GOMP_taskgroup_start, GOMP_taskgroup_end): New functions. * sections.c (GOMP_parallel_sections): New function. * loop.c (GOMP_parallel_loop_static, GOMP_parallel_loop_dynamic, GOMP_parallel_loop_guided, GOMP_parallel_loop_runtime): New functions. (GOMP_parallel_end): Add ialias_redirect. 
* omp_lib.f90.in (omp_proc_bind_kind, omp_proc_bind_false, omp_proc_bind_true, omp_proc_bind_master, omp_proc_bind_close, omp_proc_bind_spread): New params. (omp_get_cancellation, omp_get_proc_bind, omp_set_default_device, omp_get_default_device, omp_get_num_devices, omp_get_num_teams, omp_get_team_num): New interfaces. * omp_lib.h.in (omp_proc_bind_kind, omp_proc_bind_false, omp_proc_bind_true, omp_proc_bind_master, omp_proc_bind_close, omp_proc_bind_spread): New params. (omp_get_cancellation, omp_get_proc_bind, omp_set_default_device, omp_get_default_device, omp_get_num_devices, omp_get_num_teams, omp_get_team_num): New externals. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@197670 138bc75d-0d04-0410-961f-82ee72b054a4
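The runtime entry points exported by this commit can be queried from user code. The sketch below is an illustration only: the Omp40Info struct and query_omp40 helper are hypothetical, and the fallback values for builds without OpenMP 4.0 are an assumption chosen to mirror what a program outside any teams region would typically see.

```cpp
#include <cassert>
#if defined(_OPENMP) && _OPENMP >= 201307
#include <omp.h>
#endif

// Hypothetical holder for a few of the OpenMP 4.0 queries named above.
struct Omp40Info {
  int cancellation;    // omp_get_cancellation()
  int num_devices;     // omp_get_num_devices()
  int default_device;  // omp_get_default_device()
  int num_teams;       // omp_get_num_teams()
  int team_num;        // omp_get_team_num()
};

Omp40Info query_omp40() {
#if defined(_OPENMP) && _OPENMP >= 201307
  return { omp_get_cancellation(), omp_get_num_devices(),
           omp_get_default_device(), omp_get_num_teams(),
           omp_get_team_num() };
#else
  // Assumed fallback when no OpenMP 4.0 runtime is available.
  return { 0, 0, 0, 1, 0 };
#endif
}
```

Outside a teams construct the team count is 1 and the team number is 0, so those two fields are the same with or without the runtime.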
(BT_FN_VOID_OMPFN_PTR_UINT, BT_FN_VOID_OMPFN_PTR_UINT_LONG_LONG_LONG, BT_FN_VOID_OMPFN_PTR_UINT_LONG_LONG_LONG_LONG): Remove. (BT_FN_VOID_OMPFN_PTR_UINT_UINT_UINT, BT_FN_VOID_OMPFN_PTR_UINT_LONG_LONG_LONG_UINT, BT_FN_VOID_OMPFN_PTR_UINT_LONG_LONG_LONG_LONG_UINT): New. * gimplify.c (gimplify_scan_omp_clauses, gimplify_adjust_omp_clauses): Handle OMP_CLAUSE_PROC_BIND. * omp-builtins.def (BUILT_IN_GOMP_TASKGROUP_START, BUILT_IN_GOMP_TASKGROUP_END, BUILT_IN_GOMP_PARALLEL_LOOP_STATIC, BUILT_IN_GOMP_PARALLEL_LOOP_DYNAMIC, BUILT_IN_GOMP_PARALLEL_LOOP_GUIDED, BUILT_IN_GOMP_PARALLEL_LOOP_RUNTIME, BUILT_IN_GOMP_PARALLEL, BUILT_IN_GOMP_PARALLEL_SECTIONS): New built-ins. (BUILT_IN_GOMP_PARALLEL_LOOP_STATIC_START, BUILT_IN_GOMP_PARALLEL_LOOP_DYNAMIC_START, BUILT_IN_GOMP_PARALLEL_LOOP_GUIDED_START, BUILT_IN_GOMP_PARALLEL_LOOP_RUNTIME_START, BUILT_IN_GOMP_PARALLEL_START, BUILT_IN_GOMP_PARALLEL_END, BUILT_IN_GOMP_PARALLEL_SECTIONS_START): Remove. * omp-low.c (scan_sharing_clauses): Handle OMP_CLAUSE_PROC_BIND. (expand_parallel_call): Expand #pragma omp parallel* as calls to the new GOMP_parallel_* APIs without _start at the end, instead of GOMP_parallel_*_start followed by fn.omp_fn.N call, followed by GOMP_parallel_end. Handle OMP_CLAUSE_PROC_BIND. * tree-ssa-alias.c (ref_maybe_used_by_call_p_1, call_may_clobber_ref_p_1): Handle BUILT_IN_GOMP_TASKGROUP_END instead of BUILT_IN_GOMP_PARALLEL_END. c-family/ * c-common.c (DEF_FUNCTION_TYPE_8): Define. * c-omp.c (c_split_parallel_clauses): Handle OMP_CLAUSE_PROC_BIND. cp/ * cp-tree.h (finish_omp_taskgroup): New prototype. * parser.c (cp_parser_omp_clause_proc_bind): Require ) instead of colon at the end of the clause. (cp_parser_omp_taskgroup): New function. (cp_parser_omp_construct, cp_parser_pragma): Handle PRAGMA_OMP_TASKGROUP. * semantics.c (finish_omp_taskgroup): New function. fortran/ * f95-lang.c (DEF_FUNCTION_TYPE_8): Define. * types.def (DEF_FUNCTION_TYPE_8): Document. 
(BT_FN_VOID_OMPFN_PTR_UINT, BT_FN_VOID_OMPFN_PTR_UINT_LONG_LONG_LONG, BT_FN_VOID_OMPFN_PTR_UINT_LONG_LONG_LONG_LONG): Remove. (BT_FN_VOID_OMPFN_PTR_UINT_UINT_UINT, BT_FN_VOID_OMPFN_PTR_UINT_LONG_LONG_LONG_UINT, BT_FN_VOID_OMPFN_PTR_UINT_LONG_LONG_LONG_LONG_UINT): New. ada/ * gcc-interface/utils.c (DEF_FUNCTION_TYPE_8): Define. lto/ * lto-lang.c (DEF_FUNCTION_TYPE_8): Define. testsuite/ * gcc.dg/gomp/combined-1.c: Look for GOMP_parallel_loop_runtime instead of GOMP_parallel_loop_runtime_start. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@197676 138bc75d-0d04-0410-961f-82ee72b054a4
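The expand_parallel_call change above replaces a three-step call sequence with a single runtime call. The following sketch only models the shape of that change: every mock_* function is a hypothetical stand-in that runs the outlined body serially, not libgomp code, and the trailing flags parameter is an assumption of this sketch.

```cpp
#include <cassert>

typedef void (*outlined_fn)(void *);

// Hypothetical stand-in for the old *_start entry point: it would launch
// the other team members; here they are simulated serially.
void mock_parallel_start(outlined_fn fn, void *data, unsigned nthreads) {
  for (unsigned t = 1; t < nthreads; ++t) fn(data);
}

// Hypothetical stand-in for the old end call (would join the team).
void mock_parallel_end() {}

// Hypothetical stand-in for the new single entry point: the runtime runs
// the body for every team member, including the calling thread.
void mock_parallel(outlined_fn fn, void *data, unsigned nthreads,
                   unsigned flags) {
  (void)flags;  // opaque in this sketch
  for (unsigned t = 0; t < nthreads; ++t) fn(data);
}

// Plays the role of the compiler-outlined fn.omp_fn.N body.
void body(void *data) { ++*static_cast<int *>(data); }

// Old expansion: start, call the body on the encountering thread, end.
int run_old_shape(unsigned nthreads) {
  int hits = 0;
  mock_parallel_start(body, &hits, nthreads);
  body(&hits);
  mock_parallel_end();
  return hits;
}

// New expansion: one call does all three steps.
int run_new_shape(unsigned nthreads) {
  int hits = 0;
  mock_parallel(body, &hits, nthreads, 0u);
  return hits;
}
```

Both shapes execute the body once per team member; the difference is only in how many calls the compiler has to emit at the construct.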
OMP_CLAUSE_LINEAR_NO_COPYOUT): Define. * omp-low.c (extract_omp_for_data): Handle #pragma omp simd. (build_outer_var_ref): For #pragma omp simd allow linear etc. clauses to bind even to private vars. (scan_sharing_clauses): Handle OMP_CLAUSE_LINEAR, OMP_CLAUSE_ALIGNED and OMP_CLAUSE_SAFELEN. (lower_rec_input_clauses): Handle OMP_CLAUSE_LINEAR. Don't emit a GOMP_barrier call for firstprivate/lastprivate in #pragma omp simd. (lower_lastprivate_clauses): Handle also OMP_CLAUSE_LINEAR. (expand_omp_simd): New function. (expand_omp_for): Handle #pragma omp simd. * gimplify.c (enum gimplify_omp_var_data): Add GOVD_LINEAR and GOVD_ALIGNED, add GOVD_LINEAR into GOVD_DATA_SHARE_CLASS. (enum omp_region_type): Add ORT_SIMD. (gimple_add_tmp_var, gimplify_var_or_parm_decl, omp_check_private, omp_firstprivatize_variable, omp_notice_variable): Handle ORT_SIMD like ORT_WORKSHARE. (omp_is_private): Likewise. Add SIMD argument, tweak diagnostics and add extra errors in simd constructs. (gimplify_scan_omp_clauses, gimplify_adjust_omp_clauses): Handle OMP_CLAUSE_LINEAR, OMP_CLAUSE_ALIGNED and OMP_CLAUSE_SAFELEN. (gimplify_adjust_omp_clauses_1): Handle GOVD_LASTPRIVATE and GOVD_ALIGNED. (gimplify_omp_for): Handle #pragma omp simd. cp/ * cp-tree.h (CP_OMP_CLAUSE_INFO): Also allow it on OMP_CLAUSE_LINEAR. * parser.c (cp_parser_omp_var_list_no_open): If colon is non-NULL, temporarily disable colon_corrects_to_scope_p during the parsing of the variable list. (cp_parser_omp_clause_safelen, cp_parser_omp_clause_simdlen): New functions. (cp_parser_omp_all_clauses): Handle OMP_CLAUSE_SAFELEN and OMP_CLAUSE_SIMDLEN. * semantics.c (finish_omp_clauses): Allow NULL_TREE in OMP_CLAUSE_ALIGNED_ALIGNMENT. testsuite/ * c-c++-common/gomp/simd1.c: New test. * c-c++-common/gomp/simd2.c: New test. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@198092 138bc75d-0d04-0410-961f-82ee72b054a4
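For orientation, this is roughly the kind of loop the new OMP_CLAUSE_SAFELEN and OMP_CLAUSE_LINEAR handling is meant to accept. It is a sketch with hypothetical function names, not a copy of simd1.c or simd2.c. The aligned clause is left out because it asserts an alignment the caller would have to guarantee.

```cpp
#include <cassert>

// safelen(8): the caller promises that no two iterations closer than
// 8 apart depend on each other, so up to 8 lanes may run together.
void add_scaled(int *y, const int *x, int a, int n) {
  #pragma omp simd safelen(8)
  for (int i = 0; i < n; ++i)
    y[i] += a * x[i];
}

// linear(j : 2): j advances by exactly 2 per iteration, which lets the
// compiler compute its value for each lane from the iteration number.
void gather_even(int *out, const int *in, int n) {
  int j = 0;
  #pragma omp simd linear(j : 2)
  for (int i = 0; i < n; ++i) {
    out[i] = in[j];
    j += 2;
  }
}
```

Compiled without -fopenmp (or -fopenmp-simd on later compilers) the pragmas are ignored and both loops run as ordinary scalar loops.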
OMP_SIMD infrastructure.
* gimplify.c (gimplify_adjust_omp_clauses): For linear clauses if outer_context is non-NULL, but not ORT_COMBINED_PARALLEL, call omp_notice_variable. Remove aligned clauses that can't be handled yet. * omp-low.c: Include target.h. (scan_sharing_clauses): For aligned clauses with global arrays register local replacement. (omp_clause_aligned_alignment): New function. (lower_rec_input_clauses): For aligned clauses for global arrays or automatic pointers emit __builtin_assume_aligned before the loop if possible. (expand_omp_regimplify_p, expand_omp_build_assign): New functions. (expand_omp_simd): Use them. Handle pointer iterators and broken loops. (lower_omp_for): Call lower_omp on gimple_omp_body_ptr after calling lower_rec_input_clauses, not before it. cp/ * semantics.c (finish_omp_clauses): On OMP_CLAUSE_LINEAR clauses verify OMP_CLAUSE_DECL has integral or pointer type, and handle linear steps for pointer type decls. Fix up handling of OMP_CLAUSE_UNIFORM. testsuite/ * c-c++-common/gomp/simd3.c: New test. * c-c++-common/gomp/simd4.c: New test. * c-c++-common/gomp/simd5.c: New test. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@198193 138bc75d-0d04-0410-961f-82ee72b054a4
Conflicts: gcc/omp-low.c
* c-parser.c (c_parser_compound_statement, c_parser_statement): Adjust comments for OpenMP 3.0+ additions. (c_parser_pragma): Handle PRAGMA_OMP_CANCEL and PRAGMA_OMP_CANCELLATION_POINT. (c_parser_omp_clause_name): Handle new OpenMP 4.0 clauses. (c_parser_omp_clause_collapse): Fully fold collapse expression. (c_parser_omp_clause_branch, c_parser_omp_clause_cancelkind, c_parser_omp_clause_num_teams, c_parser_omp_clause_aligned, c_parser_omp_clause_linear, c_parser_omp_clause_safelen, c_parser_omp_clause_simdlen, c_parser_omp_clause_depend, c_parser_omp_clause_map, c_parser_omp_clause_device, c_parser_omp_clause_dist_schedule, c_parser_omp_clause_proc_bind, c_parser_omp_clause_to, c_parser_omp_clause_from, c_parser_omp_clause_uniform): New functions. (c_parser_omp_all_clauses): Handle new OpenMP 4.0 clauses. (c_parser_omp_for_loop): Add CODE argument, pass it through to c_finish_omp_for. (OMP_SIMD_CLAUSE_MASK): Define. (c_parser_omp_simd): New function. (c_parser_omp_for): Parse #pragma omp for simd. (OMP_PARALLEL_CLAUSE_MASK): Add OMP_CLAUSE_PROC_BIND. (c_parser_omp_parallel): Parse #pragma omp parallel for simd. (OMP_TASK_CLAUSE_MASK): Add OMP_CLAUSE_DEPEND. (c_parser_omp_taskgroup): New function. (OMP_CANCEL_CLAUSE_MASK, OMP_CANCELLATION_POINT_CLAUSE_MASK): Define. (c_parser_omp_cancel, c_parser_omp_cancellation_point): New functions. (c_parser_omp_construct): Handle PRAGMA_OMP_SIMD and PRAGMA_OMP_TASKGROUP. (c_parser_transaction_cancel): Formatting fix. * c-tree.h (c_begin_omp_taskgroup, c_finish_omp_taskgroup, c_finish_omp_cancel, c_finish_omp_cancellation_point): New prototypes. * c-typeck.c (c_begin_omp_taskgroup, c_finish_omp_taskgroup, c_finish_omp_cancel, c_finish_omp_cancellation_point): New functions. (c_finish_omp_clauses): Handle new OpenMP 4.0 clauses. cp/ * parser.c (cp_parser_omp_clause_name): Add missing break after case 'i'. (cp_parser_omp_cancellation_point): Diagnose error if #pragma omp cancellation isn't followed by point. 
* semantics.c (finish_omp_clauses): Complain also about zero in alignment of aligned directive or safelen/simdlen expressions. (finish_omp_cancel): Fix up diagnostics wording. testsuite/ * c-c++-common/gomp/simd1.c: Enable also for C. * c-c++-common/gomp/simd2.c: Likewise. * c-c++-common/gomp/simd3.c: Likewise. * c-c++-common/gomp/simd4.c: Likewise. Adjust expected diagnostics for C. * c-c++-common/gomp/simd5.c: Enable also for C. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@198264 138bc75d-0d04-0410-961f-82ee72b054a4
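Among the constructs the C front end learns to parse here is taskgroup. A small sketch of its use follows, assuming the usual OpenMP 4.0 semantics that the end of a taskgroup region waits for the tasks created inside it; the function name is hypothetical.

```cpp
#include <cassert>

// Sums the two halves of v in separate tasks and combines the results
// once the taskgroup has completed.
int sum_halves(const int *v, int n) {
  int lo = 0, hi = 0;
  const int mid = n / 2;
  #pragma omp parallel
  #pragma omp single
  {
    #pragma omp taskgroup
    {
      #pragma omp task shared(lo)
      for (int i = 0; i < mid; ++i) lo += v[i];

      #pragma omp task shared(hi)
      for (int i = mid; i < n; ++i) hi += v[i];
    }
    // End of taskgroup: both tasks have finished, lo and hi are final.
  }
  return lo + hi;
}
```

Each task writes only its own accumulator, so no atomic or critical section is needed inside the tasks.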
OpenMP constructs nested inside simd region. Don't treat
#pragma omp simd as work-sharing region. Disallow work-sharing
constructs inside of critical region. Complain if ordered
region is nested inside of parallel region without loop
region in between.
(scan_omp_1_stmt): Call check_omp_nesting_restrictions even
for GOMP_{cancel{,lation_point},taskyield,taskwait} calls.
* gfortran.dg/gomp/appendix-a/a.35.5.f90: Add dg-error.
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@198459 138bc75d-0d04-0410-961f-82ee72b054a4
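One of the nesting rules mentioned above is that an ordered region inside a parallel region needs a loop region in between. A sketch of the accepted nesting, with a hypothetical function name, might look like this:

```cpp
#include <cassert>
#include <vector>

// The combined "parallel for ordered" supplies the loop region that the
// nested ordered construct binds to.
std::vector<int> squares_in_order(int n) {
  std::vector<int> out;
  #pragma omp parallel for ordered
  for (int i = 0; i < n; ++i) {
    const int sq = i * i;  // may run concurrently across iterations
    #pragma omp ordered
    out.push_back(sq);     // runs one iteration at a time, in loop order
  }
  return out;
}
```

Placing the ordered construct directly inside a plain parallel region, with no loop construct around it, is the case the new check is described as complaining about.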
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@198460 138bc75d-0d04-0410-961f-82ee72b054a4
dump_gimple_omp_atomic_store): Handle gimple_omp_atomic_seq_cst_p. * gimple.h (enum gf_mask): Add GF_OMP_ATOMIC_SEQ_CST. (gimple_omp_atomic_set_seq_cst, gimple_omp_atomic_seq_cst_p): New inline functions. * omp-low.c (expand_omp_atomic_load, expand_omp_atomic_store, expand_omp_atomic_fetch_op): If gimple_omp_atomic_seq_cst_p, pass MEMMODEL_SEQ_CST instead of MEMMODEL_RELAXED to the builtin. * gimplify.c (gimplify_omp_atomic): Handle OMP_ATOMIC_SEQ_CST. * tree-pretty-print.c (dump_generic_node): Handle OMP_ATOMIC_SEQ_CST. * tree.def (OMP_ATOMIC): Add comment that OMP_ATOMIC* must stay consecutive. * tree.h (OMP_ATOMIC_SEQ_CST): Define. c/ * c-parser.c (c_parser_omp_atomic): Parse seq_cst clause, pass true if it is present to c_finish_omp_atomic. cp/ * pt.c (tsubst_expr): Pass OMP_ATOMIC_SEQ_CST to finish_omp_atomic. * semantics.c (finish_omp_atomic): Add seq_cst argument, pass it through to c_finish_omp_atomic or store into OMP_ATOMIC_SEQ_CST. * cp-tree.h (finish_omp_atomic): Adjust prototype. * parser.c (cp_parser_omp_atomic): Parse seq_cst clause, pass true if it is present to finish_omp_atomic. c-family/ * c-omp.c (c_finish_omp_atomic): Add seq_cst argument, store it into OMP_ATOMIC_SEQ_CST bit. * c-common.h (c_finish_omp_atomic): Adjust prototype. testsuite/ * testsuite/libgomp.c/atomic-17.c: New test. * testsuite/libgomp.c++/atomic-14.C: New test. * testsuite/libgomp.c++/atomic-15.C: New test. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@198461 138bc75d-0d04-0410-961f-82ee72b054a4
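From the user's side, the seq_cst clause parsed by this commit is written after the atomic kind. The sketch below assumes the OpenMP 4.0 clause order and uses a hypothetical function name; per the ChangeLog, such atomics are expanded with MEMMODEL_SEQ_CST instead of MEMMODEL_RELAXED.

```cpp
#include <cassert>

// Writes a flag and reads it back with sequentially consistent atomics.
int publish_then_read() {
  int flag = 0;
  int seen = -1;
  #pragma omp atomic write seq_cst
  flag = 1;
  #pragma omp atomic read seq_cst
  seen = flag;
  return seen;
}
```

In a real program the write and the read would sit in different threads; they are kept in one function here only so the example is self-contained.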
Remove deprecated vectorlength clause features. Remove deprecated assert and noassert clauses. Implement vectorlength clause in OpenMP safelen terms.
(attribute_value_equal): Call it for -fopenmp if TREE_VALUE of the attributes are both OMP_CLAUSEs. * tree.h (omp_declare_simd_clauses_equal): Declare. c-family/ * c-common.c (c_common_attribute_table): Add "omp declare simd" attribute. (handle_omp_declare_simd_attribute): New function. * c-common.h (c_omp_declare_simd_clauses_to_numbers, c_omp_declare_simd_clauses_to_decls): Declare. * c-omp.c (c_omp_declare_simd_clause_cmp, c_omp_declare_simd_clauses_to_numbers, c_omp_declare_simd_clauses_to_decls): New functions. cp/ * cp-tree.h (cp_decl_specifier_seq): Add omp_declare_simd_clauses field. (finish_omp_declare_simd): Declare. * decl2.c (is_late_template_attribute): Return true for "omp declare simd" attribute. (cp_check_const_attributes): Don't check TREE_VALUE of arg if arg isn't a TREE_LIST. * decl.c (grokfndecl): Add omp_declare_simd_clauses argument, call finish_omp_declare_simd if non-NULL. (grokdeclarator): Pass it declspecs->omp_declare_simd_clauses to grokfndecl. * pt.c (apply_late_template_attributes): Handle "omp declare simd" attribute specially. (tsubst_omp_clauses): Add declare_simd argument, don't call finish_omp_clauses if it is set. Handle OpenMP 4.0 clauses. (tsubst_expr): Adjust tsubst_omp_clauses callers. * semantics.c (finish_omp_clauses): Diagnose inbranch notinbranch. (finish_omp_declare_simd): New function. * parser.h (struct cp_parser): Add omp_declare_simd_clauses field. * parser.c (cp_ensure_no_omp_declare_simd, cp_finish_omp_declare_simd): New functions. (enum pragma_context): Add pragma_member and pragma_objc_icode. (cp_parser_linkage_specification, cp_parser_namespace_definition, cp_parser_class_specifier_1): Call cp_ensure_no_omp_declare_simd. (cp_parser_init_declarator, cp_parser_member_declaration, cp_parser_function_definition_from_specifiers_and_declarator, cp_parser_save_member_function_body): Copy parser->omp_declare_simd_clauses to decl_specifiers->omp_declare_simd_clauses, call cp_finish_omp_declare_simd. 
(cp_parser_member_specification_opt): Pass pragma_member instead of pragma_external to cp_parser_pragma. (cp_parser_objc_interstitial_code): Pass pragma_objc_icode instead of pragma_external to cp_parser_pragma. (cp_parser_omp_var_list_no_open): If parser->omp_declare_simd_clauses, just cp_parser_identifier the argument names. (cp_parser_omp_all_clauses): Don't call finish_omp_clauses for parser->omp_declare_simd_clauses. (OMP_DECLARE_SIMD_CLAUSE_MASK): Define. (cp_parser_omp_declare_simd, cp_parser_omp_declare): New functions. (cp_parser_pragma): Call cp_ensure_no_omp_declare_simd. Handle PRAGMA_OMP_DECLARE_REDUCTION. Replace == pragma_external with != pragma_stmt and != pragma_compound. testsuite/ * g++.dg/gomp/declare-simd-1.C: New test. * g++.dg/gomp/declare-simd-2.C: New test. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@198739 138bc75d-0d04-0410-961f-82ee72b054a4
* c-typeck.c (c_finish_omp_clauses): Handle OMP_CLAUSE_LINEAR_STEP adjustments for pointer-types here. Diagnose inbranch notinbranch being used together. (c_finish_omp_declare_simd): New function. * c-parser.c (enum pragma_context): Add pragma_struct and pragma_param. (c_parser_declaration_or_fndef): Add omp_declare_simd_clauses argument. Call c_finish_omp_declare_simd if needed. (c_parser_external_declaration, c_parser_compound_statement_nostart, c_parser_label, c_parser_for_statement, c_parser_objc_methodprotolist, c_parser_omp_for_loop): Adjust c_parser_declaration_or_fndef callers. (c_parser_struct_or_union_specifier): Use pragma_struct instead of pragma_external. (c_parser_parameter_declaration): Use pragma_param instead of pragma_external. (c_parser_pragma): Handle PRAGMA_OMP_DECLARE_REDUCTION. Replace == pragma_external with != pragma_stmt && != pragma_compound test. (c_parser_omp_variable_list): Add declare_simd argument. Don't lookup vars if it is true, just store identifiers. (c_parser_omp_var_list_parens, c_parser_omp_clause_depend, c_parser_omp_clause_map): Adjust callers. (c_parser_omp_clause_reduction, c_parser_omp_clause_aligned): Add declare_simd argument, pass it through to c_parser_omp_variable_list. (c_parser_omp_clause_linear): Likewise. Don't handle OMP_CLAUSE_LINEAR_STEP adjustments for pointer-types here. (c_parser_omp_clause_uniform): Call c_parser_omp_variable_list instead of c_parser_omp_var_list_parens to pass true as declare_simd. (c_parser_omp_all_clauses): Add declare_simd argument, pass it through clause parsing routines as needed. Don't call c_finish_omp_clauses if set. (c_parser_omp_simd, c_parser_omp_for, c_parser_omp_sections, c_parser_omp_parallel, c_parser_omp_single, c_parser_omp_task, c_parser_omp_cancel, c_parser_omp_cancellation_point): Adjust callers. (OMP_DECLARE_SIMD_CLAUSE_MASK): Define. (c_parser_omp_declare_simd, c_parser_omp_declare): New functions. * gcc.dg/gomp/declare-simd-1.c: New test. 
* gcc.dg/gomp/declare-simd-2.c: New test. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@198828 138bc75d-0d04-0410-961f-82ee72b054a4
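The declare simd support added for C++ and C above attaches clauses to a function declaration. A sketch of how that reads in user code follows; the function names are hypothetical and this is not the content of the declare-simd tests.

```cpp
#include <cassert>

// uniform(scale): scale has the same value in every SIMD lane.
// linear(i : 1): i grows by 1 from one lane to the next.
// notinbranch: the function is never called from conditional code in a
// SIMD loop. Per the ChangeLog, giving both inbranch and notinbranch
// is diagnosed.
#pragma omp declare simd uniform(scale) linear(i : 1) notinbranch
int scaled_index(int i, int scale) { return i * scale; }

// A simd loop that can call a vector variant of scaled_index.
void fill_scaled(int *out, int n, int scale) {
  #pragma omp simd
  for (int i = 0; i < n; ++i)
    out[i] = scaled_index(i, scale);
}
```

Whether a vector clone of scaled_index is actually generated depends on the compiler and target; the scalar result is the same either way.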
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@198835 138bc75d-0d04-0410-961f-82ee72b054a4
kraj pushed a commit to kraj/gcc that referenced this pull request May 2, 2023
I noticed that for member class templates of a class template we were
unnecessarily substituting both the template and its type. Avoiding that
duplication speeds compilation of this silly testcase from ~12s to ~9s on my
laptop. It's unlikely to make a difference on any real code, but the
simplification is also nice.
We still need to clear CLASSTYPE_USE_TEMPLATE on the partial instantiation
of the template class, but it makes more sense to do that in
tsubst_template_decl anyway.
#define NC(X) \
template <class U> struct X##1; \
template <class U> struct X##2; \
template <class U> struct X##3; \
template <class U> struct X##4; \
template <class U> struct X##5; \
template <class U> struct X##6;
#define NC2(X) NC(X##a) NC(X##b) NC(X##c) NC(X##d) NC(X##e) NC(X##f)
#define NC3(X) NC2(X##A) NC2(X##B) NC2(X##C) NC2(X##D) NC2(X##E)
template <int I> struct A
{
NC3(am)
};
template <class...Ts> void sink(Ts...);
template <int...Is> void g()
{
sink(A<Is>()...);
}
template <int I> void f()
{
g<__integer_pack(I)...>();
}
int main()
{
f<1000>();
}
gcc/cp/ChangeLog:
* pt.cc (instantiate_class_template): Skip the RECORD_TYPE
of a class template.
(tsubst_template_decl): Clear CLASSTYPE_USE_TEMPLATE.
rurban pushed a commit to rurban/gcc that referenced this pull request Oct 26, 2023
This patch is my proposed solution to PR rtl-optimization/91865.
Normally RTX simplification canonicalizes a ZERO_EXTEND of a ZERO_EXTEND
to a single ZERO_EXTEND, but as shown in this PR it is possible for
combine's make_compound_operation to unintentionally generate a
non-canonical ZERO_EXTEND of a ZERO_EXTEND, which is unlikely to be
matched by the backend.
For the new test case:
const int table[2] = {1, 2};
int foo (char i) { return table[i]; }
compiling with -O2 -mlarge on msp430 we currently see:
Trying 2 -> 7:
2: r25:HI=zero_extend(R12:QI)
REG_DEAD R12:QI
7: r28:PSI=sign_extend(r25:HI)#0
REG_DEAD r25:HI
Failed to match this instruction:
(set (reg:PSI 28 [ iD.1772 ])
(zero_extend:PSI (zero_extend:HI (reg:QI 12 R12 [ iD.1772 ]))))
which results in the following code:
foo: AND #0xff, R12
RLAM.A #4, R12 { RRAM.A #4, R12
RLAM.A #1, R12
MOVX.W table(R12), R12
RETA
With this patch, we now see:
Trying 2 -> 7:
2: r25:HI=zero_extend(R12:QI)
REG_DEAD R12:QI
7: r28:PSI=sign_extend(r25:HI)#0
REG_DEAD r25:HI
Successfully matched this instruction:
(set (reg:PSI 28 [ iD.1772 ])
(zero_extend:PSI (reg:QI 12 R12 [ iD.1772 ])))
allowing combination of insns 2 and 7
original costs 4 + 8 = 12
replacement cost 8
foo: MOV.B R12, R12
RLAM.A #1, R12
MOVX.W table(R12), R12
RETA
2023-10-26 Roger Sayle <roger@nextmovesoftware.com>
Richard Biener <rguenther@suse.de>
gcc/ChangeLog
PR rtl-optimization/91865
* combine.cc (make_compound_operation): Avoid creating a
ZERO_EXTEND of a ZERO_EXTEND.
gcc/testsuite/ChangeLog
PR rtl-optimization/91865
* gcc.target/msp430/pr91865.c: New test case.
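The canonicalization this patch relies on is that a ZERO_EXTEND of a ZERO_EXTEND equals a single ZERO_EXTEND from the innermost mode. The sketch below checks that identity in plain C++, using 8, 16 and 32 bit unsigned types as stand-ins for QI, HI and PSI; PSI on msp430 is not a 32 bit mode, so the widths and the function names are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// zero_extend:wide (zero_extend:mid (narrow)), modeled with two
// unsigned conversions.
std::uint32_t extend_twice(std::uint8_t v) {
  std::uint16_t mid = v;  // 8 -> 16 bits, upper bits zero
  return mid;             // 16 -> 32 bits, upper bits zero
}

// zero_extend:wide (narrow), the canonical single extension.
std::uint32_t extend_once(std::uint8_t v) { return v; }

// Exhaustively compares the two forms over every 8 bit input.
bool nested_matches_single() {
  for (unsigned v = 0; v < 256; ++v) {
    const std::uint8_t b = static_cast<std::uint8_t>(v);
    if (extend_twice(b) != extend_once(b)) return false;
  }
  return true;
}
```

Because the two forms always agree, emitting the single extension loses nothing and gives the backend a pattern it is more likely to match.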
XYenChi referenced this pull request in XYenChi/gcc Nov 7, 2023
hubot
pushed a commit
that referenced
this pull request
Feb 16, 2024
Here we have
  template<class T>
  auto is_throwable(T t) -> decltype(throw t, true) { ... }
where we didn't properly mark 't' as IMPLICIT_RVALUE_P, which caused
the wrong overload to be chosen. Jason figured out it's because
we don't correctly implement [expr.prim.id.unqual]#4.2, which post-P2266
says that an id-expression is move-eligible if
"the id-expression (possibly parenthesized) is the operand of
a throw-expression, and names an implicitly movable entity that belongs
to a scope that does not contain the compound-statement of the innermost
lambda-expression, try-block, or function-try-block (if any) whose
compound-statement or ctor-initializer contains the throw-expression."
I worked out that it's trying to say that given
  struct X {
    X();
    X(const X&);
    X(X&&) = delete;
  };
the following should fail: the scope of the throw is an sk_try, and it's
also x's scope S, and S "does not contain the compound-statement of the
*try-block" so x is move-eligible, so we move, so we fail.
  void f ()
  try {
    X x;
    throw x; // use of deleted function
  } catch (...) {
  }
Whereas here:
  void g (X x)
  try {
    throw x;
  } catch (...) {
  }
the throw is again in an sk_try, but x's scope is an sk_function_parms
which *does* contain the {} of the *try-block, so x is not move-eligible,
so we don't move, so we use X(const X&), and the code is fine.
The current code also doesn't seem to handle
  void h (X x) {
    void z (decltype(throw x, true));
  }
where there's no enclosing lambda or sk_try so we should move.
I'm not doing anything about lambdas because we shouldn't reach the
code at the end of the function: the DECL_HAS_VALUE_EXPR_P check
shouldn't let us go further.
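The two throw cases discussed above can be observed with a small counting type. This is a sketch for illustration only. `Tracker`, `throw_local` and `throw_param` are hypothetical names that do not appear in the patch or its tests, and the move constructor is kept usable (unlike in X above) so that both functions compile.

```cpp
#include <cassert>

// Hypothetical type that records which constructor built the exception object.
struct Tracker {
    static int copies;
    static int moves;
    Tracker() = default;
    Tracker(const Tracker&) { ++copies; }
    Tracker(Tracker&&) noexcept { ++moves; }
};
int Tracker::copies = 0;
int Tracker::moves = 0;

// 't' lives in the try-block's own scope, which does not contain the
// try-block's compound-statement, so 't' is move-eligible in the throw.
void throw_local() {
    try {
        Tracker t;
        throw t;
    } catch (...) {
    }
}

// 't' is a parameter; its scope contains the compound-statement of the
// function-try-block, so 't' is not move-eligible and is copied.
void throw_param(Tracker t)
try {
    throw t;
} catch (...) {
}
```

On a compiler that follows the rule quoted above, `throw_local()` is expected to leave `Tracker::copies` unchanged, while `throw_param(Tracker{})` is expected to increase it by one.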
PR c++/113789
PR c++/113853
gcc/cp/ChangeLog:
* typeck.cc (treat_lvalue_as_rvalue_p): Update code to better
reflect [expr.prim.id.unqual]#4.2.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/sfinae69.C: Remove dg-bogus.
* g++.dg/cpp0x/sfinae70.C: New test.
* g++.dg/cpp0x/sfinae71.C: New test.
* g++.dg/cpp0x/sfinae72.C: New test.
* g++.dg/cpp2a/implicit-move4.C: New test.
Liaoshihua
pushed a commit
to Liaoshihua/gcc
that referenced
this pull request
Mar 19, 2024
I noticed that for member class templates of a class template we were
unnecessarily substituting both the template and its type. Avoiding that
duplication speeds compilation of this silly testcase from ~12s to ~9s on my
laptop. It's unlikely to make a difference on any real code, but the
simplification is also nice.
We still need to clear CLASSTYPE_USE_TEMPLATE on the partial instantiation
of the template class, but it makes more sense to do that in
tsubst_template_decl anyway.
#define NC(X) \
template <class U> struct X##1; \
template <class U> struct X##2; \
template <class U> struct X##3; \
template <class U> struct X##4; \
template <class U> struct X##5; \
template <class U> struct X##6;
#define NC2(X) NC(X##a) NC(X##b) NC(X##c) NC(X##d) NC(X##e) NC(X##f)
#define NC3(X) NC2(X##A) NC2(X##B) NC2(X##C) NC2(X##D) NC2(X##E)
template <int I> struct A
{
NC3(am)
};
template <class...Ts> void sink(Ts...);
template <int...Is> void g()
{
sink(A<Is>()...);
}
template <int I> void f()
{
g<__integer_pack(I)...>();
}
int main()
{
f<1000>();
}
gcc/cp/ChangeLog:
* pt.cc (instantiate_class_template): Skip the RECORD_TYPE
of a class template.
(tsubst_template_decl): Clear CLASSTYPE_USE_TEMPLATE.
Liaoshihua
pushed a commit
to Liaoshihua/gcc
that referenced
this pull request
Mar 19, 2024
Liaoshihua
pushed a commit
to Liaoshihua/gcc
that referenced
this pull request
Mar 19, 2024
Liaoshihua
pushed a commit
to Liaoshihua/gcc
that referenced
this pull request
Mar 21, 2024
Liaoshihua
pushed a commit
to Liaoshihua/gcc
that referenced
this pull request
Mar 25, 2024
Liaoshihua
pushed a commit
to Liaoshihua/gcc
that referenced
this pull request
Mar 25, 2024
NinaRanns
referenced
this pull request
in NinaRanns/gcc
May 30, 2024
fixing tests and removing C++20 requirement
hubot
pushed a commit
that referenced
this pull request
Jun 13, 2024
Here during overload resolution we have two strictly viable ambiguous
candidates #1 and #2, and two non-strictly viable candidates #3 and #4
which we hold on to ever since r14-6522. These latter candidates have
an empty second arg conversion since the first arg conversion was deemed
bad, and this trips up joust when called on #3 and #4 which assumes all
arg conversions are there.
We can fix this by making joust robust to empty arg conversions, but in
this situation we shouldn't need to compare #3 and #4 at all given that
we have a strictly viable candidate. To that end, this patch makes
tourney shortcut considering non-strictly viable candidates upon
encountering ambiguity between two strictly viable candidates (taking
advantage of the fact that the candidates list is sorted according to
viability via splice_viable).
PR c++/115239
gcc/cp/ChangeLog:
* call.cc (tourney): Don't consider a non-strictly viable
candidate as the champ if there was ambiguity between two
strictly viable candidates.
gcc/testsuite/ChangeLog:
* g++.dg/overload/error7.C: New test.
Reviewed-by: Jason Merrill <jason@redhat.com>
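The shortcut described above can be pictured with a toy tournament. The sketch below is not GCC's implementation. `Candidate`, its `rank` field, and the simplified `joust` and `tourney` are hypothetical stand-ins. It only assumes, as the message states, that the candidate list is sorted with strictly viable candidates first.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an overload candidate (not GCC's z_candidate).
struct Candidate {
    int rank;              // toy ordering: lower wins, equal means ambiguous
    bool strictly_viable;  // false models a candidate with a bad conversion
};

// Toy pairwise comparison: >0 if a wins, <0 if b wins, 0 if ambiguous.
int joust(const Candidate& a, const Candidate& b) {
    return (a.rank < b.rank) - (b.rank < a.rank);
}

// Returns the winner's index, or -1 if the call is ambiguous.
// Assumes strictly viable candidates come first in 'cands'.
int tourney(const std::vector<Candidate>& cands) {
    if (cands.empty()) return -1;
    std::size_t champ = 0;
    std::size_t challenger = 1;
    while (challenger < cands.size()) {
        int fate = joust(cands[champ], cands[challenger]);
        if (fate > 0) { ++challenger; continue; }                     // champ survives
        if (fate < 0) { champ = challenger; ++challenger; continue; } // new champ
        // Tie: neither of the two can be the overall winner.
        bool strict_tie = cands[champ].strictly_viable
                          && cands[challenger].strictly_viable;
        champ = challenger + 1;
        if (champ >= cands.size()) return -1;
        // The shortcut: after a tie between two strictly viable candidates,
        // never promote a non-strictly viable candidate to champ.
        if (strict_tie && !cands[champ].strictly_viable) return -1;
        challenger = champ + 1;
    }
    // The champ must also beat every candidate that came before it.
    for (std::size_t i = 0; i < champ; ++i) {
        if (joust(cands[champ], cands[i]) <= 0) return -1;
    }
    return static_cast<int>(champ);
}
```

In this toy model, the early return means the non-strictly viable candidates are never passed to `joust` once two strictly viable candidates have tied.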
hubot
pushed a commit
that referenced
this pull request
Jun 17, 2024
(cherry picked from commit 7fed7e9)
hubot
pushed a commit
that referenced
this pull request
Jul 19, 2024
These tests used to generate:
bl swap
ldr r2, [sp, #4]
mov r0, r2 @ __fp16
but g:9d20529d94b23275885f380d155fe8671ab5353a means that we can
load directly into r0:
bl swap
ldrh r0, [sp, #4] @ __fp16
This patch updates the tests to "defend" this change.
While there, the scans include:
mov\tr1, r[03]}
But if the spill of r2 occurs first, there's no real reason why
r2 couldn't be used as the temporary, instead of r3.
The patch tries to update the scans while preserving the spirit
of the originals.
gcc/testsuite/
* gcc.target/arm/fp16-aapcs-2.c: Expect the return value to be
loaded directly from the stack. Test that the swap generates
two moves out of r0/r1 and two moves in.
* gcc.target/arm/fp16-aapcs-4.c: Likewise.
hubot
pushed a commit
that referenced
this pull request
Sep 7, 2024
…o_debug_section [PR116614]
cat abc.C
#define A(n) struct T##n {} t##n;
#define B(n) A(n##0) A(n##1) A(n##2) A(n##3) A(n##4) A(n##5) A(n##6) A(n##7) A(n##8) A(n##9)
#define C(n) B(n##0) B(n##1) B(n##2) B(n##3) B(n##4) B(n##5) B(n##6) B(n##7) B(n##8) B(n##9)
#define D(n) C(n##0) C(n##1) C(n##2) C(n##3) C(n##4) C(n##5) C(n##6) C(n##7) C(n##8) C(n##9)
#define E(n) D(n##0) D(n##1) D(n##2) D(n##3) D(n##4) D(n##5) D(n##6) D(n##7) D(n##8) D(n##9)
E(1) E(2) E(3)
int main () { return 0; }
./xg++ -B ./ -o abc{.o,.C} -flto -flto-partition=1to1 -O2 -g -fdebug-types-section -c
./xgcc -B ./ -o abc{,.o} -flto -flto-partition=1to1 -O2
(not included in testsuite as it takes a while to compile) FAILs with
lto-wrapper: fatal error: Too many copied sections: Operation not supported
compilation terminated.
/usr/bin/ld: error: lto-wrapper failed
collect2: error: ld returned 1 exit status
The following patch fixes that. Most of the 64K+ section support for
reading and writing was already in place years ago (the reading side in
particular is exercised quite often), and a further bug in it was fixed
as part of the PR104617 fix.
Yet the fix isn't solely about removing the 5 lines
  if (new_i - 1 >= SHN_LORESERVE)
    {
      *err = ENOTSUP;
      return "Too many copied sections";
    }
The missing part was that the function only handled reading of
the .symtab_shndx section but not copying/updating of it.
If the result has fewer than 64K-epsilon sections, that actually wasn't
needed, but e.g. with -fdebug-types-section one can exceed that pretty
easily (reported to us on a WebKitGtk build on ppc64le).
Updating the section is slightly more complicated, because it basically
needs to be done in lock step with updating the .symtab section: if a
symbol doesn't need SHN_XINDEX there, the corresponding .symtab_shndx
entry should contain (or be updated to contain) SHN_UNDEF; otherwise it
needs to hold whatever would otherwise have been stored in .symtab but
couldn't fit. But repeating, just for that, all the .symtab decisions
about what to discard and how to rewrite it would be ugly.
So the patch instead emits the .symtab_shndx section (or sections) last,
prepares their content during the .symtab processing, and in a second
pass that goes through just the .symtab_shndx sections uses the saved
content.
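The lock-step relationship between .symtab and .symtab_shndx follows from how ELF encodes large section indices. Below is a minimal sketch of that encoding, not the libiberty code. The constants carry the standard ELF values, while `EncodedIndex`, `encode_section_index` and `decode_section_index` are hypothetical names. The input is assumed to be a real section index, not a reserved value such as SHN_ABS or SHN_COMMON.

```cpp
#include <cassert>
#include <cstdint>

// Standard ELF values, redeclared here so the sketch needs no <elf.h>.
constexpr std::uint16_t SHN_UNDEF = 0;
constexpr std::uint32_t SHN_LORESERVE = 0xFF00;
constexpr std::uint16_t SHN_XINDEX = 0xFFFF;

// Hypothetical pair: the 16-bit field stored in the .symtab entry and the
// 32-bit word stored at the same position in .symtab_shndx.
struct EncodedIndex {
    std::uint16_t st_shndx;
    std::uint32_t shndx_entry;
};

// Split a real section index into its .symtab and .symtab_shndx parts.
EncodedIndex encode_section_index(std::uint32_t real_index) {
    if (real_index >= SHN_LORESERVE) {
        // Too large for st_shndx: escape value there, real index alongside.
        return {SHN_XINDEX, real_index};
    }
    // Fits directly: the parallel entry carries SHN_UNDEF.
    return {static_cast<std::uint16_t>(real_index), SHN_UNDEF};
}

// Recover the real section index from the two parts.
std::uint32_t decode_section_index(const EncodedIndex& e) {
    return e.st_shndx == SHN_XINDEX ? e.shndx_entry : e.st_shndx;
}
```

Because every .symtab entry has a matching .symtab_shndx word, any pass that renumbers sections in one has to produce the matching content for the other, which is why the patch computes both together and writes .symtab_shndx last.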
2024-09-07 Jakub Jelinek <jakub@redhat.com>
PR lto/116614
* simple-object-elf.c (SHN_COMMON): Align comment with neighbouring
comments.
(SHN_HIRESERVE): Use uppercase hex digits instead of lowercase for
consistency.
(simple_object_elf_find_sections): Formatting fixes.
(simple_object_elf_fetch_attributes): Likewise.
(simple_object_elf_attributes_merge): Likewise.
(simple_object_elf_start_write): Likewise.
(simple_object_elf_write_ehdr): Likewise.
(simple_object_elf_write_shdr): Likewise.
(simple_object_elf_write_to_file): Likewise.
(simple_object_elf_copy_lto_debug_section): Likewise. Don't fail for
new_i - 1 >= SHN_LORESERVE, instead arrange in that case to copy
over .symtab_shndx sections, though emit those last and compute their
section content when processing associated .symtab sections. Handle
simple_object_internal_read failure even in the .symtab_shndx reading
case.
hubot
pushed a commit
that referenced
this pull request
Sep 12, 2024
(cherry picked from commit bb8dd09)
hubot
pushed a commit
that referenced
this pull request
Sep 13, 2024
(cherry picked from commit bb8dd09)
hubot
pushed a commit
that referenced
this pull request
Oct 9, 2024
Whenever C1 and C2 are integer constants, X is of a wrapping type, and
cmp is a relational operator, the expression X +- C1 cmp C2 can be
simplified in the following cases:
(a) If cmp is <= and C2 -+ C1 == +INF (1), we can transform the initial
comparison in the following way:
X +- C1 <= C2
-INF <= X +- C1 <= C2 (add left hand side which holds for any X, C1)
-INF -+ C1 <= X <= C2 -+ C1 (add -+C1 to all 3 expressions)
-INF -+ C1 <= X <= +INF (due to (1))
-INF -+ C1 <= X (eliminate the right hand side since it holds for any X)
(b) By analogy, if cmp is >= and C2 -+ C1 == -INF (1), use the following
sequence of transformations:
X +- C1 >= C2
+INF >= X +- C1 >= C2 (add left hand side which holds for any X, C1)
+INF -+ C1 >= X >= C2 -+ C1 (add -+C1 to all 3 expressions)
+INF -+ C1 >= X >= -INF (due to (1))
+INF -+ C1 >= X (eliminate the right hand side since it holds for any X)
(c) The > and < cases are negations of (a) and (b), respectively.
This transformation occasionally saves add / sub instructions; for
instance, the expression
3 + (uint32_t)f() < 2
compiles to
cmn w0, #4
cset w0, ls
instead of
add w0, w0, 3
cmp w0, 2
cset w0, ls
on aarch64.
Testcases that go together with this patch have been split into two
separate files, one containing testcases for unsigned variables and the
other for wrapping signed ones (and thus compiled with -fwrapv).
Additionally, one aarch64 test has been adjusted since the patch has
caused the generated code to change from
cmn w0, #2
csinc w0, w1, wzr, cc (x < -2)
to
cmn w0, #3
csinc w0, w1, wzr, cs (x <= -3)
This patch has been bootstrapped and regtested on aarch64, x86_64, and
i386, and additionally regtested on riscv32.
gcc/ChangeLog:
PR tree-optimization/116024
* match.pd: New transformation around integer comparison.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/pr116024-2.c: New test.
* gcc.dg/tree-ssa/pr116024-2-fwrapv.c: Ditto.
* gcc.target/aarch64/gtu_to_ltu_cmp_1.c: Adjust.
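Cases (a) and (b) above can be checked by brute force on a small wrapping type. The sketch below is an illustration, not part of the patch or its testsuite. It covers only the X + C1 form on `std::uint8_t`, where +INF is 255 and -INF is 0, and the function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Exhaustively verify cases (a) and (b) for X + C1 on an 8-bit wrapping type.
bool transformation_holds_for_uint8() {
    const unsigned kMax = 0xFFu;  // +INF for uint8_t; -INF is 0
    for (unsigned c1 = 0; c1 <= kMax; ++c1) {
        const std::uint8_t C1 = static_cast<std::uint8_t>(c1);
        // Case (a): pick C2 so that C2 - C1 == +INF (with wraparound).
        const std::uint8_t C2a = static_cast<std::uint8_t>(C1 + kMax);
        // Case (b): C2 - C1 == -INF == 0, so C2 equals C1.
        const std::uint8_t C2b = C1;
        for (unsigned x = 0; x <= kMax; ++x) {
            const std::uint8_t X = static_cast<std::uint8_t>(x);
            const std::uint8_t sum = static_cast<std::uint8_t>(X + C1);
            // (a): X + C1 <= C2  becomes  -INF - C1 <= X.
            const bool original_a = sum <= C2a;
            const bool simplified_a = X >= static_cast<std::uint8_t>(0 - C1);
            // (b): X + C1 >= C2  becomes  +INF - C1 >= X.
            const bool original_b = sum >= C2b;
            const bool simplified_b = X <= static_cast<std::uint8_t>(kMax - C1);
            if (original_a != simplified_a || original_b != simplified_b) {
                return false;
            }
        }
    }
    return true;
}
```

The simplified comparisons use constants that fold at compile time, so the addition on X disappears, which is the saving the message describes.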
hubot
pushed a commit
that referenced
this pull request
Nov 8, 2024
Update test case for armv8.1-m.main that supports conditional
arithmetic.
armv7-m:
push {r4, lr}
ldr r4, .L6
ldr r4, [r4]
lsls r4, r4, #29
it mi
addmi r2, r2, #1
bl bar
movs r0, #0
pop {r4, pc}
armv8.1-m.main:
push {r3, r4, r5, lr}
ldr r4, .L5
ldr r5, [r4]
tst r5, #4
csinc r2, r2, r2, eq
bl bar
movs r0, #0
pop {r3, r4, r5, pc}
gcc/testsuite/ChangeLog:
* gcc.target/arm/epilog-1.c: Use check-function-bodies.
Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
hubot
pushed a commit
that referenced
this pull request
Nov 8, 2024
(cherry picked from commit ec86e87)
hubot
pushed a commit
that referenced
this pull request
Nov 26, 2024
In r14.2.0-376-g724446556e5, I accidentally introduced a regression in
the expected assembler as the csinc instruction was not used for
armv8.1-m.main.
The generated assembler for armv8.1-m.main is:
push {r3, r4, r5, lr}
ldr r4, .L5
ldr r5, [r4]
adds r4, r2, #1
tst r5, #4
it ne
movne r2, r4
bl bar
movs r0, #0
pop {r3, r4, r5, pc}
gcc/testsuite/ChangeLog:
* gcc.target/arm/epilog-1.c: Corrected armv8.1-m.main asm.
Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
hubot
pushed a commit
that referenced
this pull request
Feb 5, 2025
When generating thumb2 code,
LDM SP!, {PC}
is a two-byte instruction, whereas
LDR PC, [SP], #4
needs 4 bytes. When optimizing for size, or when there's no obvious
performance benefit, prefer the former.
gcc/ChangeLog:
PR target/118089
* config/arm/arm.cc (thumb2_expand_return): Use LDM SP!, {PC}
when optimizing for size, or when there's no performance benefit over
LDR PC, [SP], #4.
(arm_expand_epilogue): Likewise.
hubot
pushed a commit
that referenced
this pull request
Feb 7, 2025
My earlier change for making the compiler prefer
POP {PC}
over
LDR PC, [SP], #4
had a slightly unexpected consequence in that we now also call
arm_emit_multi_reg_pop to handle single register pops when the
register is not PC. This exposed a latent bug in this function where
the dwarf unwinding notes on the single-register POP were not being
set correctly.
gcc/
PR target/118089
* config/arm/arm.cc (arm_emit_multi_reg_pop): Add a CFA adjust
note to single-register POP instructions.
hubot
pushed a commit
that referenced
this pull request
Jun 13, 2025
…o_debug_section [PR116614]
cat abc.C
#define A(n) struct T##n {} t##n;
#define B(n) A(n##0) A(n##1) A(n##2) A(n##3) A(n##4) A(n##5) A(n##6) A(n##7) A(n##8) A(n##9)
#define C(n) B(n##0) B(n##1) B(n##2) B(n##3) B(n##4) B(n##5) B(n##6) B(n##7) B(n##8) B(n##9)
#define D(n) C(n##0) C(n##1) C(n##2) C(n##3) C(n##4) C(n##5) C(n##6) C(n##7) C(n##8) C(n##9)
#define E(n) D(n##0) D(n##1) D(n##2) D(n##3) D(n##4) D(n##5) D(n##6) D(n##7) D(n##8) D(n##9)
E(1) E(2) E(3)
int main () { return 0; }
./xg++ -B ./ -o abc{.o,.C} -flto -flto-partition=1to1 -O2 -g -fdebug-types-section -c
./xgcc -B ./ -o abc{,.o} -flto -flto-partition=1to1 -O2
(not included in testsuite as it takes a while to compile) FAILs with
lto-wrapper: fatal error: Too many copied sections: Operation not supported
compilation terminated.
/usr/bin/ld: error: lto-wrapper failed
collect2: error: ld returned 1 exit status
The following patch fixes that. Most of the 64K+ section support for
reading and writing was already there years ago (reading in particular has
been used quite often already), and a further bug in it was fixed as part
of the PR104617 fix.
Yet, the fix isn't solely about removing the
if (new_i - 1 >= SHN_LORESERVE)
{
*err = ENOTSUP;
return "Too many copied sections";
}
5 lines, the missing part was that the function only handled reading of
the .symtab_shndx section but not copying/updating of it.
If the result has less than 64K-epsilon sections, that actually wasn't
needed, but e.g. with -fdebug-types-section one can exceed that pretty
easily (reported to us on WebKitGtk build on ppc64le).
Updating the section is slightly more complicated, because it basically
needs to be done in lock step with updating the .symtab section. If a symbol
doesn't need to use SHN_XINDEX in there, the section should contain (or
should be updated to contain) an SHN_UNDEF entry; otherwise it needs to hold
whatever would otherwise be stored but couldn't fit. But repeating all the
symtab decisions about what to discard and how to rewrite it, just for that,
would be ugly.
So, the patch instead emits the .symtab_shndx section (or sections) last,
prepares their content during the .symtab processing, and in a second pass
over just the .symtab_shndx sections uses the saved content.
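The lock-step relation between a .symtab entry and its .symtab_shndx entry can be sketched as follows. This is an illustrative model only: the struct and function names are hypothetical and do not come from simple-object-elf.c; the constants are the standard ELF values. The sketch assumes its input is a real section index and does not model other reserved values such as SHN_ABS or SHN_COMMON.

```cpp
#include <cstdint>

// Standard ELF constants.
constexpr std::uint32_t kShnUndef = 0;
constexpr std::uint32_t kShnLoReserve = 0xFF00;
constexpr std::uint16_t kShnXindex = 0xFFFF;

// Hypothetical pair: the 16-bit field stored in the symbol itself and the
// 32-bit entry stored at the same position in .symtab_shndx.
struct EncodedIndex {
  std::uint16_t st_shndx;
  std::uint32_t shndx_entry;
};

// Encode a real section index. Small indices fit into st_shndx and the
// parallel entry stays SHN_UNDEF; large ones use the SHN_XINDEX escape.
inline EncodedIndex encode_section_index(std::uint32_t section) {
  if (section < kShnLoReserve)
    return EncodedIndex{static_cast<std::uint16_t>(section), kShnUndef};
  return EncodedIndex{kShnXindex, section};
}

// Recover the real section index from the pair.
inline std::uint32_t decode_section_index(const EncodedIndex& e) {
  return e.st_shndx == kShnXindex ? e.shndx_entry : e.st_shndx;
}
```

Because both values are derived from the same decision, computing the .symtab_shndx content while the .symtab is being rewritten avoids having to repeat that decision in a later pass.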
2024-09-07 Jakub Jelinek <jakub@redhat.com>
PR lto/116614
* simple-object-elf.c (SHN_COMMON): Align comment with neighbouring
comments.
(SHN_HIRESERVE): Use uppercase hex digits instead of lowercase for
consistency.
(simple_object_elf_find_sections): Formatting fixes.
(simple_object_elf_fetch_attributes): Likewise.
(simple_object_elf_attributes_merge): Likewise.
(simple_object_elf_start_write): Likewise.
(simple_object_elf_write_ehdr): Likewise.
(simple_object_elf_write_shdr): Likewise.
(simple_object_elf_write_to_file): Likewise.
(simple_object_elf_copy_lto_debug_section): Likewise. Don't fail for
new_i - 1 >= SHN_LORESERVE, instead arrange in that case to copy
over .symtab_shndx sections, though emit those last and compute their
section content when processing associated .symtab sections. Handle
simple_object_internal_read failure even in the .symtab_shndx reading
case.
(cherry picked from commit bb8dd09)
yfeldblum
added a commit
to yfeldblum/gcc
that referenced
this pull request
Jul 18, 2025
When an exception is thrown and caught, destruction of the exception checks whether the exception was allocated in the `emergency_pool`, which is a global variable.
This global variable has a runtime constructor, which means access to it is valid only once the constructor has run during the module init phase.
But throwing and catching an exception is permitted at any time, not just during the lifetime of `main`. And this must be true whether libsupc++ is linked dynamically or statically.
LLVM Address Sanitizer aborts with `initialization-order-fiasco` when, in a binary which links libsupc++ statically, an exception is thrown and caught in some global constructor which happens to run prior to the global constructor of `emergency_pool`.
```
ERROR: AddressSanitizer: initialization-order-fiasco ...
READ of size 8 at ... thread T0
SCARINESS: 14 (8-byte-read-initialization-order-fiasco)
#0 ... in (anonymous namespace)::pool::in_pool(void*) gcc-11.x/libstdc++-v3/libsupc++/eh_alloc.cc:258
#1 ... in __cxa_free_exception gcc-11.x/libstdc++-v3/libsupc++/eh_alloc.cc:302
#2 ... in __gxx_exception_cleanup(_Unwind_Reason_Code, _Unwind_Exception*) gcc-11.x/libstdc++-v3/libsupc++/eh_throw.cc:51
#3 ... in __cxa_end_catch gcc-11.x/libstdc++-v3/libsupc++/eh_catch.cc:125
...
... in __cxx_global_var_init ...
...
... in call_init.part.0 glibc-2.40/elf/dl-init.c:74:3
... in call_init glibc-2.40/elf/dl-init.c:120:14
... in _dl_init glibc-2.40/elf/dl-init.c:121:5
... in _dl_start_user glibc-2.40/elf/../sysdeps/aarch64/dl-start.S:46
... is located 56 bytes inside of global variable '(anonymous namespace)::emergency_pool' defined in 'gcc-11.x/libstdc++-v3/libsupc++/eh_alloc.cc' (...) of size 72
registered at:
#0 ... in __asan_register_globals.part.0 llvm-project/compiler-rt/lib/asan/asan_globals.cpp:393:3
#1 ... in __asan_register_globals llvm-project/compiler-rt/lib/asan/asan_globals.cpp:392:3
#2 ... in __asan_register_elf_globals llvm-project/compiler-rt/lib/asan/asan_globals.cpp:376:26
#3 ... in call_init.part.0 glibc-2.40/elf/dl-init.c:74:3
#4 ... in call_init glibc-2.40/elf/dl-init.c:120:14
#5 ... in _dl_init glibc-2.40/elf/dl-init.c:121:5
#6 ... in _dl_start_user glibc-2.40/elf/../sysdeps/aarch64/dl-start.S:46
```
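One general way to make such a global usable before any dynamic initializer runs is to give it a `constexpr` constructor, so that it is constant-initialized. The sketch below only illustrates that idea and is not claimed to be what this commit does: the `Pool` class, its members and `EarlyUser` are hypothetical stand-ins, not the libsupc++ code.

```cpp
#include <cstddef>
#include <cstdint>

namespace sketch {

// Hypothetical stand-in for the pool type. The constexpr constructor lets
// a namespace-scope object be constant-initialized, i.e. set up before any
// dynamic initializer in the program runs.
class Pool {
 public:
  constexpr Pool() noexcept : arena_(nullptr), size_(0) {}

  // True if p points into the arena; an empty pool contains nothing.
  bool in_pool(const void* p) const noexcept {
    const auto addr = reinterpret_cast<std::uintptr_t>(p);
    const auto base = reinterpret_cast<std::uintptr_t>(arena_);
    return arena_ != nullptr && addr >= base && addr - base < size_;
  }

 private:
  const char* arena_;
  std::size_t size_;
};

extern Pool emergency_pool;

// A global whose constructor runs during dynamic initialization and
// queries the pool, like a global constructor that throws and catches.
struct EarlyUser {
  bool saw_pool_member;
  EarlyUser() {
    int local = 0;
    saw_pool_member = emergency_pool.in_pool(&local);
  }
};

EarlyUser early_user;     // dynamically initialized
Pool emergency_pool{};    // constant-initialized, so already valid above
                          // (C++20 `constinit` could enforce this)

}  // namespace sketch
```

In this arrangement `early_user` is defined before `emergency_pool`, yet the query is well defined because the pool needs no runtime constructor.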
hubot
pushed a commit
that referenced
this pull request
Oct 15, 2025
The vadcq and vsbcq patterns had two problems:
- the adc / sbc part of the pattern did not mention the use of vfpcc
- the carry calculation part should use a different unspec code
In addition, the get_fpscr_nzcvqc and set_fpscr_nzcvqc patterns were
over-cautious in using unspec_volatile when unspec is really what they
need. Making them unspec makes it possible to remove redundant accesses to
FPSCR_nzcvqc.
With unspec_volatile, we used to generate:
test_2:
@ args = 0, pretend = 0, frame = 8
@ frame_needed = 0, uses_anonymous_args = 0
vmov.i32 q0, #0x1 @ v4si
push {lr}
sub sp, sp, #12
vmrs r3, FPSCR_nzcvqc ;; [1]
bic r3, r3, #536870912
vmsr FPSCR_nzcvqc, r3
vadc.i32 q3, q0, q0
vmrs r3, FPSCR_nzcvqc ;; [2]
vmrs r3, FPSCR_nzcvqc
orr r3, r3, #536870912
vmsr FPSCR_nzcvqc, r3
vadc.i32 q0, q0, q0
vmrs r3, FPSCR_nzcvqc
ldr r0, .L8
ubfx r3, r3, #29, #1
str r3, [sp, #4]
bl print_uint32x4_t
add sp, sp, #12
@ sp needed
pop {pc}
.L9:
.align 2
.L8:
.word .LC1
With unspec, we generate:
test_2:
@ args = 0, pretend = 0, frame = 8
@ frame_needed = 0, uses_anonymous_args = 0
vmrs r3, FPSCR_nzcvqc ;; [1]
bic r3, r3, #536870912 ;; [3]
vmov.i32 q0, #0x1 @ v4si
vmsr FPSCR_nzcvqc, r3
vadc.i32 q3, q0, q0
vmrs r3, FPSCR_nzcvqc
orr r3, r3, #536870912
vmsr FPSCR_nzcvqc, r3
vadc.i32 q0, q0, q0
vmrs r3, FPSCR_nzcvqc
push {lr}
ubfx r3, r3, #29, #1
sub sp, sp, #12
ldr r0, .L8
str r3, [sp, #4]
bl print_uint32x4_t
add sp, sp, #12
@ sp needed
pop {pc}
.L9:
.align 2
.L8:
.word .LC1
That is, unspec in get_fpscr_nzcvqc makes it possible to:
- move [1] earlier
- delete redundant [2]
and unspec in set_fpscr_nzcvqc makes it possible to move push {lr} and stack
manipulation later.
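For readers decoding the listings above: the immediate 536870912 is 1 << 29, the carry flag position in the NZCV layout. The helpers below are a portable sketch of what the bic, orr and ubfx instructions do to the value read from FPSCR_nzcvqc; the function names are hypothetical and this is not the MVE intrinsic implementation.

```cpp
#include <cstdint>

// Bit 29 is the carry (C) flag; 1 << 29 == 536870912.
constexpr std::uint32_t kCarryMask = std::uint32_t{1} << 29;
static_assert(kCarryMask == 536870912u, "immediate used by bic/orr above");

// bic r3, r3, #536870912 : clear the carry flag.
constexpr std::uint32_t clear_carry(std::uint32_t flags) {
  return flags & ~kCarryMask;
}

// orr r3, r3, #536870912 : set the carry flag.
constexpr std::uint32_t set_carry(std::uint32_t flags) {
  return flags | kCarryMask;
}

// ubfx r3, r3, #29, #1 : extract the carry flag as 0 or 1.
constexpr std::uint32_t extract_carry(std::uint32_t flags) {
  return (flags >> 29) & 1u;
}
```

Each vmrs / modify / vmsr sequence in the listings corresponds to one read of the flags, one of these updates, and one write back.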
gcc/ChangeLog:
PR target/122189
* config/arm/iterators.md (VxCIQ_carry, VxCIQ_M_carry, VxCQ_carry)
(VxCQ_M_carry): New iterators.
* config/arm/mve.md (get_fpscr_nzcvqc, set_fpscr_nzcvqc): Use
unspec instead of unspec_volatile.
(vadciq, vadciq_m, vadcq, vadcq_m): Use vfpcc in operation. Use a
different unspec code for carry calculation.
* config/arm/unspecs.md (VADCQ_U_carry, VADCQ_M_U_carry)
(VADCQ_S_carry, VADCQ_M_S_carry, VSBCIQ_U_carry ,VSBCIQ_S_carry
,VSBCIQ_M_U_carry ,VSBCIQ_M_S_carry ,VSBCQ_U_carry ,VSBCQ_S_carry
,VSBCQ_M_U_carry ,VSBCQ_M_S_carry ,VADCIQ_U_carry
,VADCIQ_M_U_carry ,VADCIQ_S_carry ,VADCIQ_M_S_carry): New unspec
codes.
gcc/testsuite/ChangeLog:
PR target/122189
* gcc.target/arm/mve/intrinsics/vadcq-check-carry.c: New test.
* gcc.target/arm/mve/intrinsics/vadcq_m_s32.c: Adjust instructions
order.
* gcc.target/arm/mve/intrinsics/vadcq_m_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vsbcq_m_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vsbcq_m_u32.c: Likewise.
hubot
pushed a commit
that referenced
this pull request
Oct 22, 2025
The vectorizer has learned how to do boolean reductions of masks to a C bool
for the operations OR, XOR and AND.
This implements the new optabs for Adv.SIMD. Adv.SIMD today can already
vectorize such loops but does so through SHIFT-AND-INSERT to perform the
reductions step-wise and in order. As an example, an OR reduction today does:
movi v3.4s, 0
ext v5.16b, v30.16b, v3.16b, #8
orr v5.16b, v5.16b, v30.16b
ext v29.16b, v5.16b, v3.16b, #4
orr v29.16b, v29.16b, v5.16b
ext v4.16b, v29.16b, v3.16b, #2
orr v4.16b, v4.16b, v29.16b
ext v3.16b, v4.16b, v3.16b, #1
orr v3.16b, v3.16b, v4.16b
fmov w1, s3
and w1, w1, 1
For reducing to a boolean however we don't need the stepwise reduction and can
just look at the bit patterns. For e.g. OR we now generate:
umaxp v3.4s, v3.4s, v3.4s
fmov x1, d3
cmp x1, 0
cset w0, ne
For the remaining codegen see test vect-reduc-bool-9.c.
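A scalar model may help to see why looking at the bit patterns is enough. The sketch below assumes a four-lane mask whose lanes are either 0 or all-ones; the type and function names are hypothetical. `reduce_or` mirrors the umaxp / fmov / cmp sequence shown above, while `reduce_and` and `reduce_xor` only state the reference semantics and do not claim to match the generated code.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>

// Hypothetical 4 x 32-bit mask vector; each lane is 0 or 0xFFFFFFFF.
using Mask4 = std::array<std::uint32_t, 4>;

// OR reduction modelled on the sequence above: a pairwise max folds four
// lanes into two, and the low 64 bits are compared against zero.
inline bool reduce_or(const Mask4& m) {
  const std::uint32_t lo = std::max(m[0], m[1]);   // umaxp, lane 0
  const std::uint32_t hi = std::max(m[2], m[3]);   // umaxp, lane 1
  const std::uint64_t low64 =
      (static_cast<std::uint64_t>(hi) << 32) | lo; // fmov x1, d3
  return low64 != 0;                               // cmp x1, 0 ; cset ne
}

// AND reduction, reference semantics: true only if every lane is set.
inline bool reduce_and(const Mask4& m) {
  for (std::uint32_t lane : m)
    if (lane == 0) return false;
  return true;
}

// XOR reduction, reference semantics: parity of the number of set lanes.
inline bool reduce_xor(const Mask4& m) {
  std::uint32_t parity = 0;
  for (std::uint32_t lane : m) parity ^= (lane & 1u);
  return parity != 0;
}
```

None of the three needs the lane-by-lane ordering that the ext / orr chain preserves, which is what allows the shorter sequence.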
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md (reduc_sbool_and_scal_<mode>,
reduc_sbool_ior_scal_<mode>, reduc_sbool_xor_scal_<mode>): New.
* config/aarch64/iterators.md (VALLI): New.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/vect-reduc-bool-1.c: New test.
* gcc.target/aarch64/vect-reduc-bool-2.c: New test.
* gcc.target/aarch64/vect-reduc-bool-3.c: New test.
* gcc.target/aarch64/vect-reduc-bool-4.c: New test.
* gcc.target/aarch64/vect-reduc-bool-5.c: New test.
* gcc.target/aarch64/vect-reduc-bool-6.c: New test.
* gcc.target/aarch64/vect-reduc-bool-7.c: New test.
* gcc.target/aarch64/vect-reduc-bool-8.c: New test.
* gcc.target/aarch64/vect-reduc-bool-9.c: New test.
hubot
pushed a commit
that referenced
this pull request
Nov 12, 2025
The vadcq and vsbcq patterns had two problems:
- the adc / sbc part of the pattern did not mention the use of vfpcc
- the carry calculation part should use a different unspec code
In addition, the get_fpscr_nzcvqc and set_fpscr_nzcvqc patterns were
over-cautious in using unspec_volatile when unspec is really what they
need. Making them unspec makes it possible to remove redundant accesses to
FPSCR_nzcvqc.
With unspec_volatile, we used to generate:
test_2:
@ args = 0, pretend = 0, frame = 8
@ frame_needed = 0, uses_anonymous_args = 0
vmov.i32 q0, #0x1 @ v4si
push {lr}
sub sp, sp, #12
vmrs r3, FPSCR_nzcvqc ;; [1]
bic r3, r3, #536870912
vmsr FPSCR_nzcvqc, r3
vadc.i32 q3, q0, q0
vmrs r3, FPSCR_nzcvqc ;; [2]
vmrs r3, FPSCR_nzcvqc
orr r3, r3, #536870912
vmsr FPSCR_nzcvqc, r3
vadc.i32 q0, q0, q0
vmrs r3, FPSCR_nzcvqc
ldr r0, .L8
ubfx r3, r3, #29, #1
str r3, [sp, #4]
bl print_uint32x4_t
add sp, sp, #12
@ sp needed
pop {pc}
.L9:
.align 2
.L8:
.word .LC1
With unspec, we generate:
test_2:
@ args = 0, pretend = 0, frame = 8
@ frame_needed = 0, uses_anonymous_args = 0
vmrs r3, FPSCR_nzcvqc ;; [1]
bic r3, r3, #536870912 ;; [3]
vmov.i32 q0, #0x1 @ v4si
vmsr FPSCR_nzcvqc, r3
vadc.i32 q3, q0, q0
vmrs r3, FPSCR_nzcvqc
orr r3, r3, #536870912
vmsr FPSCR_nzcvqc, r3
vadc.i32 q0, q0, q0
vmrs r3, FPSCR_nzcvqc
push {lr}
ubfx r3, r3, #29, #1
sub sp, sp, #12
ldr r0, .L8
str r3, [sp, #4]
bl print_uint32x4_t
add sp, sp, #12
@ sp needed
pop {pc}
.L9:
.align 2
.L8:
.word .LC1
That is, unspec in get_fpscr_nzcvqc makes it possible to:
- move [1] earlier
- delete redundant [2]
and unspec in set_fpscr_nzcvqc makes it possible to move push {lr} and stack
manipulation later.
gcc/ChangeLog:
PR target/122189
* config/arm/iterators.md (VxCIQ_carry, VxCIQ_M_carry, VxCQ_carry)
(VxCQ_M_carry): New iterators.
* config/arm/mve.md (get_fpscr_nzcvqc, set_fpscr_nzcvqc): Use
unspec instead of unspec_volatile.
(vadciq, vadciq_m, vadcq, vadcq_m): Use vfpcc in operation. Use a
different unspec code for carry calculation.
* config/arm/unspecs.md (VADCQ_U_carry, VADCQ_M_U_carry)
(VADCQ_S_carry, VADCQ_M_S_carry, VSBCIQ_U_carry ,VSBCIQ_S_carry
,VSBCIQ_M_U_carry ,VSBCIQ_M_S_carry ,VSBCQ_U_carry ,VSBCQ_S_carry
,VSBCQ_M_U_carry ,VSBCQ_M_S_carry ,VADCIQ_U_carry
,VADCIQ_M_U_carry ,VADCIQ_S_carry ,VADCIQ_M_S_carry): New unspec
codes.
gcc/testsuite/ChangeLog:
PR target/122189
* gcc.target/arm/mve/intrinsics/vadcq-check-carry.c: New test.
* gcc.target/arm/mve/intrinsics/vadcq_m_s32.c: Adjust instructions
order.
* gcc.target/arm/mve/intrinsics/vadcq_m_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vsbcq_m_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vsbcq_m_u32.c: Likewise.
(cherry picked from commits
0272058 and
697ccad)
Pulling for study purpose, no changes expected