mirrored from git://gcc.gnu.org/git/gcc.git
Aldyh/cilk in gomp #4
Open
sushantchry wants to merge 79 commits into master from aldyh/cilk-in-gomp
See http://openmp.org/wp/2013/03/openmp-40-rc2/ for the standard draft.
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@196809 138bc75d-0d04-0410-961f-82ee72b054a4
Add another argument to c_finish_omp_atomic.
* parser.c (cp_parser_binary_expression): Handle no_toplevel_fold_p even for binary operations other than comparison.
(cp_parser_omp_atomic): Handle parsing OpenMP 4.0 atomics.
* pt.c (tsubst_expr) <case OMP_ATOMIC>: Handle atomic exchange.
* semantics.c (finish_omp_atomic): Use cp_tree_equal to diagnose expression mismatches and to find out if c_finish_omp_atomic should be called with swapped set to true or false.
* c-omp.c (c_finish_omp_atomic): Add swapped argument, if true, build the operation first with rhs, lhs arguments and use NOP_EXPR build_modify_expr.
* c-common.h (c_finish_omp_atomic): Adjust prototype.
* c-c++-common/gomp/atomic-15.c: Remove error test that is now valid in OpenMP 4.0.
* testsuite/libgomp.c++/atomic-10.C: New test.
* testsuite/libgomp.c++/atomic-11.C: New test.
* testsuite/libgomp.c++/atomic-12.C: New test.
* testsuite/libgomp.c++/atomic-13.C: New test.
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@196815 138bc75d-0d04-0410-961f-82ee72b054a4
with default value, pass it down to c_parser_conditional_expression.
(c_parser_conditional_expression): Add omp_atomic_lhs argument, pass it down to c_parser_binary_expression. Don't pass PREC_NONE to it. Adjust recursive call.
(c_parser_binary_expression): Remove prec argument, add omp_atomic_lhs argument. Always start from PREC_NONE, if omp_atomic_lhs is non-NULL and one of the arguments of toplevel binop matches it, use build2 instead of parser_build_binary_op.
(c_parser_omp_atomic): Handle OpenMP 4.0 atomics.
(c_parser_omp_for_loop): Adjust c_parser_binary_expression caller.
* c-tree.h (c_tree_equal): New prototype.
* c-typeck.c (c_tree_equal): New function.
* parser.c (cp_parser_omp_atomic): Never restart unless structured_block is true.
* c-c++-common/gomp/atomic-15.c: Adjust for C diagnostics.
* testsuite/libgomp.c/atomic-14.c: Add parens to make it valid.
* testsuite/libgomp.c/atomic-15.c: New test.
* testsuite/libgomp.c/atomic-16.c: New test.
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@196816 138bc75d-0d04-0410-961f-82ee72b054a4
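For readers unfamiliar with the atomic forms these two commits teach the parsers, here is a minimal sketch, assuming OpenMP 4.0 semantics. The function names are hypothetical and are not taken from the testsuite files listed above. It shows an update with the operands swapped (`x = expr binop x`) and an atomic exchange written as a capture block. Built without `-fopenmp` the pragmas are ignored and the code runs serially with the same results.

```c
/* Sums 1..n into a shared counter using the "swapped" update form
   x = expr binop x, which OpenMP 4.0 accepts alongside x = x binop expr.
   atomic_swapped_sum is a hypothetical name used for illustration.  */
long
atomic_swapped_sum (int n)
{
  long x = 0;
  int i;
#pragma omp parallel for
  for (i = 1; i <= n; i++)
    {
#pragma omp atomic update
      x = i + x;
    }
  return x;
}

/* Atomic exchange expressed as a capture block: read the old value
   of *p and store v as one atomic operation.  Hypothetical name.  */
int
atomic_exchange_capture (int *p, int v)
{
  int old;
#pragma omp atomic capture
  {
    old = *p;
    *p = v;
  }
  return old;
}
```

Calling `atomic_swapped_sum (100)` should yield the same total whether or not OpenMP is enabled, since the atomic update only protects the shared accumulator.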
* env.c (handle_omp_display_env): New function.
(initialize_env): Use it.
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@196817 138bc75d-0d04-0410-961f-82ee72b054a4
* libgomp.texi (Environment Variables): Minor cleanup, update section refs to OpenMP 4.0rc2.
(OMP_DISPLAY_ENV, GOMP_SPINCOUNT): Document these environment variables.
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@196818 138bc75d-0d04-0410-961f-82ee72b054a4
GIMPLE_OMP_FOR kinds.
* tree.def (OMP_SIMD, OMP_FOR_SIMD, OMP_DISTRIBUTE): New tree codes.
* gimple.h (enum gf_mask): Add GF_OMP_FOR_KIND_MASK, GF_OMP_FOR_KIND_FOR, GF_OMP_FOR_KIND_SIMD, GF_OMP_FOR_KIND_FOR_SIMD and GF_OMP_FOR_KIND_DISTRIBUTE.
(gimple_omp_for_kind, gimple_omp_for_set_kind): New inline functions.
* gimplify.c (is_gimple_stmt, gimplify_omp_for, gimplify_expr): Handle OMP_SIMD, OMP_FOR_SIMD and OMP_DISTRIBUTE.
* tree.c (omp_clause_num_ops, omp_clause_code_name, walk_tree_1): Handle new OpenMP 4.0 clauses.
* tree-pretty-print.c (dump_omp_clause): Likewise.
(dump_generic_node): Handle OMP_SIMD, OMP_FOR_SIMD and OMP_DISTRIBUTE.
* tree.h (enum omp_clause_code): Add OMP_CLAUSE_LINEAR, OMP_CLAUSE_ALIGNED, OMP_CLAUSE_DEPEND, OMP_CLAUSE_FROM, OMP_CLAUSE_TO, OMP_CLAUSE_UNIFORM, OMP_CLAUSE_MAP, OMP_CLAUSE_DEVICE, OMP_CLAUSE_DIST_SCHEDULE, OMP_CLAUSE_INBRANCH, OMP_CLAUSE_NOTINBRANCH, OMP_CLAUSE_NUM_TEAMS, OMP_CLAUSE_PROC_BIND, OMP_CLAUSE_SAFELEN, OMP_CLAUSE_SIMDLEN, OMP_CLAUSE_FOR, OMP_CLAUSE_PARALLEL, OMP_CLAUSE_SECTIONS and OMP_CLAUSE_TASKGROUP.
(OMP_LOOP_CHECK): Define.
(OMP_FOR_BODY, OMP_FOR_CLAUSES, OMP_FOR_INIT, OMP_FOR_COND, OMP_FOR_INCR, OMP_FOR_PRE_BODY): Use OMP_LOOP_CHECK instead of OMP_FOR_CHECK.
(OMP_CLAUSE_DECL): Extend check range up to OMP_CLAUSE_MAP.
(OMP_CLAUSE_LINEAR_STEP, OMP_CLAUSE_ALIGNED_ALIGNMENT, OMP_CLAUSE_NUM_TEAMS_EXPR, OMP_CLAUSE_DEVICE_ID, OMP_CLAUSE_DIST_SCHEDULE_CHUNK_EXPR, OMP_CLAUSE_SAFELEN_EXPR, OMP_CLAUSE_SIMDLEN_EXPR): Define.
(enum omp_clause_depend_kind, enum omp_clause_map_kind, enum omp_clause_proc_bind_kind): New enums.
(OMP_CLAUSE_DEPEND_KIND, OMP_CLAUSE_MAP_KIND, OMP_CLAUSE_PROC_BIND_KIND): Define.
(struct tree_omp_clause): Add subcode.depend_kind, subcode.map_kind and subcode.proc_bind_kind.
(find_omp_clause): New prototype.
* omp-builtins.def (BUILT_IN_GOMP_CANCEL, BUILT_IN_GOMP_CANCELLATION_POINT): New built-ins.
* tree-flow.h (find_omp_clause): Remove prototype.
c/
* c-parser.c (c_parser_omp_all_clauses): Change mask argument type from unsigned to omp_clause_mask.
(c_parser_omp_for_loop): Adjust c_finish_omp_for caller.
(OMP_FOR_CLAUSE_MASK, OMP_SECTIONS_CLAUSE_MASK, OMP_PARALLEL_CLAUSE_MASK, OMP_SINGLE_CLAUSE_MASK, OMP_TASK_CLAUSE_MASK): Use OMP_CLAUSE_MASK_1 instead of 1.
(c_parser_omp_parallel): Use omp_clause_mask type instead of unsigned for mask, use OMP_CLAUSE_MASK_1 instead of 1 for masks.
cp/
* cp-tree.h (OMP_FOR_GIMPLIFYING_P): Use OMP_LOOP_CHECK instead of OMP_FOR_CHECK.
(finish_omp_for): Add enum tree_code second argument.
(finish_omp_cancel, finish_omp_cancellation_point): New prototypes.
* cp-gimplify.c (cp_gimplify_expr, cp_genericize_r): Handle OMP_SIMD, OMP_FOR_SIMD and OMP_DISTRIBUTE.
* semantics.c (finish_omp_clauses): Handle new OpenMP 4.0 clauses.
(finish_omp_for): Add code argument, pass it down to make_node or c_finish_omp_for.
(finish_omp_cancel, finish_omp_cancellation_point): New functions.
* parser.c (cp_parser_omp_clause_name): Add parsing of new OpenMP 4.0 clauses.
(cp_parser_omp_var_list_no_open): Add COLON argument, if non-NULL, accept termination by colon instead of closing paren.
(cp_parser_omp_var_list, cp_parser_omp_clause_reduction): Adjust callers.
(cp_parser_omp_clause_branch, cp_parser_omp_clause_cancelkind, cp_parser_omp_clause_num_teams, cp_parser_omp_clause_aligned, cp_parser_omp_clause_linear, cp_parser_omp_clause_depend, cp_parser_omp_clause_map, cp_parser_omp_clause_device, cp_parser_omp_clause_dist_schedule, cp_parser_omp_clause_proc_bind): New functions.
(cp_parser_omp_all_clauses): Change mask argument's type to omp_clause_mask from unsigned. Fix c_name for PRAGMA_OMP_CLAUSE_UNTIED. Handle new OpenMP 4.0 clauses.
(cp_parser_omp_for_loop): Add code argument. Pass it down to finish_omp_for.
(OMP_SIMD_CLAUSE_MASK): Define.
(cp_parser_omp_simd): New function.
(OMP_FOR_CLAUSE_MASK, OMP_SECTIONS_CLAUSE_MASK, OMP_PARALLEL_CLAUSE_MASK, OMP_SINGLE_CLAUSE_MASK, OMP_TASK_CLAUSE_MASK): Use OMP_CLAUSE_MASK_1 instead of 1.
(cp_parser_omp_for): Handle parsing of #pragma omp for simd.
(cp_parser_omp_parallel): Handle parsing of #pragma omp parallel for simd. Use omp_clause_mask type instead of unsigned for mask, use OMP_CLAUSE_MASK_1 instead of 1 for masks.
(OMP_CANCEL_CLAUSE_MASK, OMP_CANCELLATION_POINT_CLAUSE_MASK): Define.
(cp_parser_omp_cancel, cp_parser_omp_cancellation_point): New functions.
(cp_parser_omp_construct): Handle PRAGMA_OMP_SIMD, PRAGMA_OMP_CANCEL and PRAGMA_OMP_CANCELLATION_POINT.
(cp_parser_pragma): Handle PRAGMA_OMP_SIMD.
* pt.c (tsubst_expr): Handle OMP_SIMD, OMP_FOR_SIMD and OMP_DISTRIBUTE. Pass down TREE_CODE to finish_omp_for.
fortran/
* f95-lang.c (ATTR_NULL): Define.
c-family/
* c-omp.c (c_finish_omp_for): Add code argument, pass it down to make_code.
(c_split_parallel_clauses): Handle OMP_CLAUSE_SAFELEN, OMP_CLAUSE_ALIGNED and OMP_CLAUSE_LINEAR.
* c-pragma.h (enum pragma_kind): Add PRAGMA_OMP_CANCEL, PRAGMA_OMP_CANCELLATION_POINT, PRAGMA_OMP_DECLARE_REDUCTION, PRAGMA_OMP_DECLARE_SIMD, PRAGMA_OMP_DECLARE_TARGET, PRAGMA_OMP_DISTRIBUTE, PRAGMA_OMP_END_DECLARE_TARGET, PRAGMA_OMP_FOR_SIMD, PRAGMA_OMP_PARALLEL_FOR_SIMD, PRAGMA_OMP_SIMD, PRAGMA_OMP_TARGET, PRAGMA_OMP_TARGET_DATA, PRAGMA_OMP_TARGET_UPDATE, PRAGMA_OMP_TASKGROUP and PRAGMA_OMP_TEAMS.
(enum pragma_omp_clause): Add PRAGMA_OMP_CLAUSE_ALIGNED, PRAGMA_OMP_CLAUSE_DEPEND, PRAGMA_OMP_CLAUSE_DEVICE, PRAGMA_OMP_CLAUSE_DIST_SCHEDULE, PRAGMA_OMP_CLAUSE_FOR, PRAGMA_OMP_CLAUSE_FROM, PRAGMA_OMP_CLAUSE_INBRANCH, PRAGMA_OMP_CLAUSE_LINEAR, PRAGMA_OMP_CLAUSE_MAP, PRAGMA_OMP_CLAUSE_NOTINBRANCH, PRAGMA_OMP_CLAUSE_NUM_TEAMS, PRAGMA_OMP_CLAUSE_PARALLEL, PRAGMA_OMP_CLAUSE_PROC_BIND, PRAGMA_OMP_CLAUSE_SAFELEN, PRAGMA_OMP_CLAUSE_SECTIONS, PRAGMA_OMP_CLAUSE_SIMDLEN, PRAGMA_OMP_CLAUSE_TASKGROUP, PRAGMA_OMP_CLAUSE_TO and PRAGMA_OMP_CLAUSE_UNIFORM.
* c-pragma.c (omp_pragmas): Add new OpenMP 4.0 constructs.
* c-common.h (c_finish_omp_for): Add enum tree_code as second argument.
(OMP_CLAUSE_MASK_1): Define.
(omp_clause_mask): For HWI >= 64 new typedef for unsigned HOST_WIDE_INT, otherwise a class with needed ctors and operators.
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@197161 138bc75d-0d04-0410-961f-82ee72b054a4
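The switch from `1` to `OMP_CLAUSE_MASK_1` matters once the number of clause kinds can exceed the width of a plain `int`. The sketch below illustrates the idea only: `clause_mask`, `CLAUSE_MASK_1`, and the clause indices are hypothetical stand-ins, not GCC's actual definitions, and it assumes the 64-bit typedef case described in the commit.

```c
#include <stdint.h>

/* Hypothetical stand-ins for omp_clause_mask and OMP_CLAUSE_MASK_1
   in the case where a 64-bit unsigned integer is available.  */
typedef uint64_t clause_mask;
#define CLAUSE_MASK_1 ((clause_mask) 1)

/* Hypothetical clause indices.  Index 40 could not be encoded with
   (1 << 40) when int is 32 bits wide, hence the wide constant.  */
enum { CLAUSE_PRIVATE = 3, CLAUSE_SAFELEN = 40 };

/* Builds the set of clauses accepted by an imaginary construct.  */
clause_mask
simd_clause_mask (void)
{
  return (CLAUSE_MASK_1 << CLAUSE_PRIVATE)
         | (CLAUSE_MASK_1 << CLAUSE_SAFELEN);
}

/* Returns nonzero if CLAUSE is a member of MASK.  */
int
clause_allowed (clause_mask mask, int clause)
{
  return (mask & (CLAUSE_MASK_1 << clause)) != 0;
}
```

With a plain `1` the shift by 40 would be undefined behavior on hosts with 32-bit `int`, which is presumably why every mask definition was converted in the same commit.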
OMP_SIMD and OMP_FOR_SIMD loops.
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@197515 138bc75d-0d04-0410-961f-82ee72b054a4
omp_get_proc_bind, omp_get_proc_bind_, omp_set_default_device, omp_set_default_device_, omp_set_default_device_8_, omp_get_default_device, omp_get_default_device_, omp_get_num_devices, omp_get_num_devices_, omp_get_num_teams, omp_get_num_teams_, omp_get_team_num, omp_get_team_num_): Export @@OMP_4.0.
(GOMP_cancel, GOMP_cancellation_point, GOMP_parallel_loop_dynamic, GOMP_parallel_loop_guided, GOMP_parallel_loop_runtime, GOMP_parallel_loop_static, GOMP_parallel_sections, GOMP_parallel, GOMP_taskgroup_start, GOMP_taskgroup_end): Export @@GOMP_4.0.
* parallel.c (GOMP_parallel_end): Add ialias.
(GOMP_parallel, GOMP_cancel, GOMP_cancellation_point): New functions.
* omp.h.in (omp_proc_bind_t): New typedef.
(omp_get_cancellation, omp_get_proc_bind, omp_set_default_device, omp_get_default_device, omp_get_num_devices, omp_get_num_teams, omp_get_team_num): New prototypes.
* env.c (omp_get_cancellation, omp_get_proc_bind, omp_set_default_device, omp_get_default_device, omp_get_num_devices, omp_get_num_teams, omp_get_team_num): New functions.
* fortran.c (ULP, STR1, STR2, ialias_redirect): Removed.
(omp_get_cancellation_, omp_get_proc_bind_, omp_set_default_device_, omp_set_default_device_8_, omp_get_default_device_, omp_get_num_devices_, omp_get_num_teams_, omp_get_team_num_): New functions.
* libgomp.h (ialias_ulp, ialias_str1, ialias_str2, ialias_redirect, ialias_call): Define.
* libgomp_g.h (GOMP_parallel_loop_static, GOMP_parallel_loop_dynamic, GOMP_parallel_loop_guided, GOMP_parallel_loop_runtime, GOMP_parallel, GOMP_cancel, GOMP_cancellation_point, GOMP_taskgroup_start, GOMP_taskgroup_end, GOMP_parallel_sections): New prototypes.
* task.c (GOMP_taskgroup_start, GOMP_taskgroup_end): New functions.
* sections.c (GOMP_parallel_sections): New function.
* loop.c (GOMP_parallel_loop_static, GOMP_parallel_loop_dynamic, GOMP_parallel_loop_guided, GOMP_parallel_loop_runtime): New functions.
(GOMP_parallel_end): Add ialias_redirect.
* omp_lib.f90.in (omp_proc_bind_kind, omp_proc_bind_false, omp_proc_bind_true, omp_proc_bind_master, omp_proc_bind_close, omp_proc_bind_spread): New params.
(omp_get_cancellation, omp_get_proc_bind, omp_set_default_device, omp_get_default_device, omp_get_num_devices, omp_get_num_teams, omp_get_team_num): New interfaces.
* omp_lib.h.in (omp_proc_bind_kind, omp_proc_bind_false, omp_proc_bind_true, omp_proc_bind_master, omp_proc_bind_close, omp_proc_bind_spread): New params.
(omp_get_cancellation, omp_get_proc_bind, omp_set_default_device, omp_get_default_device, omp_get_num_devices, omp_get_num_teams, omp_get_team_num): New externals.
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@197670 138bc75d-0d04-0410-961f-82ee72b054a4
(BT_FN_VOID_OMPFN_PTR_UINT, BT_FN_VOID_OMPFN_PTR_UINT_LONG_LONG_LONG, BT_FN_VOID_OMPFN_PTR_UINT_LONG_LONG_LONG_LONG): Remove.
(BT_FN_VOID_OMPFN_PTR_UINT_UINT_UINT, BT_FN_VOID_OMPFN_PTR_UINT_LONG_LONG_LONG_UINT, BT_FN_VOID_OMPFN_PTR_UINT_LONG_LONG_LONG_LONG_UINT): New.
* gimplify.c (gimplify_scan_omp_clauses, gimplify_adjust_omp_clauses): Handle OMP_CLAUSE_PROC_BIND.
* omp-builtins.def (BUILT_IN_GOMP_TASKGROUP_START, BUILT_IN_GOMP_TASKGROUP_END, BUILT_IN_GOMP_PARALLEL_LOOP_STATIC, BUILT_IN_GOMP_PARALLEL_LOOP_DYNAMIC, BUILT_IN_GOMP_PARALLEL_LOOP_GUIDED, BUILT_IN_GOMP_PARALLEL_LOOP_RUNTIME, BUILT_IN_GOMP_PARALLEL, BUILT_IN_GOMP_PARALLEL_SECTIONS): New built-ins.
(BUILT_IN_GOMP_PARALLEL_LOOP_STATIC_START, BUILT_IN_GOMP_PARALLEL_LOOP_DYNAMIC_START, BUILT_IN_GOMP_PARALLEL_LOOP_GUIDED_START, BUILT_IN_GOMP_PARALLEL_LOOP_RUNTIME_START, BUILT_IN_GOMP_PARALLEL_START, BUILT_IN_GOMP_PARALLEL_END, BUILT_IN_GOMP_PARALLEL_SECTIONS_START): Remove.
* omp-low.c (scan_sharing_clauses): Handle OMP_CLAUSE_PROC_BIND.
(expand_parallel_call): Expand #pragma omp parallel* as calls to the new GOMP_parallel_* APIs without _start at the end, instead of GOMP_parallel_*_start followed by fn.omp_fn.N call, followed by GOMP_parallel_end. Handle OMP_CLAUSE_PROC_BIND.
* tree-ssa-alias.c (ref_maybe_used_by_call_p_1, call_may_clobber_ref_p_1): Handle BUILT_IN_GOMP_TASKGROUP_END instead of BUILT_IN_GOMP_PARALLEL_END.
c-family/
* c-common.c (DEF_FUNCTION_TYPE_8): Define.
* c-omp.c (c_split_parallel_clauses): Handle OMP_CLAUSE_PROC_BIND.
cp/
* cp-tree.h (finish_omp_taskgroup): New prototype.
* parser.c (cp_parser_omp_clause_proc_bind): Require ) instead of colon at the end of the clause.
(cp_parser_omp_taskgroup): New function.
(cp_parser_omp_construct, cp_parser_pragma): Handle PRAGMA_OMP_TASKGROUP.
* semantics.c (finish_omp_taskgroup): New function.
fortran/
* f95-lang.c (DEF_FUNCTION_TYPE_8): Define.
* types.def (DEF_FUNCTION_TYPE_8): Document.
(BT_FN_VOID_OMPFN_PTR_UINT, BT_FN_VOID_OMPFN_PTR_UINT_LONG_LONG_LONG, BT_FN_VOID_OMPFN_PTR_UINT_LONG_LONG_LONG_LONG): Remove.
(BT_FN_VOID_OMPFN_PTR_UINT_UINT_UINT, BT_FN_VOID_OMPFN_PTR_UINT_LONG_LONG_LONG_UINT, BT_FN_VOID_OMPFN_PTR_UINT_LONG_LONG_LONG_LONG_UINT): New.
ada/
* gcc-interface/utils.c (DEF_FUNCTION_TYPE_8): Define.
lto/
* lto-lang.c (DEF_FUNCTION_TYPE_8): Define.
testsuite/
* gcc.dg/gomp/combined-1.c: Look for GOMP_parallel_loop_runtime instead of GOMP_parallel_loop_runtime_start.
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@197676 138bc75d-0d04-0410-961f-82ee72b054a4
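The change to expand_parallel_call described in this commit can be pictured with a small mock. Everything below is a hypothetical single-threaded stand-in: the `mock_` functions only imitate the shape of the runtime entry points named in the ChangeLog, and their parameter lists are an assumption made for illustration. The old scheme emitted three calls (start, the outlined body, end); the new scheme hands the outlined function to one combined entry point.

```c
/* Data block the compiler would pass to the outlined function.  */
struct omp_data
{
  int *sum;
  int n;
};

/* Stand-in for the outlined region body (fn.omp_fn.N in the ChangeLog).  */
static void
body_omp_fn (void *p)
{
  struct omp_data *d = (struct omp_data *) p;
  int i;
  for (i = 1; i <= d->n; i++)
    *d->sum += i;
}

/* Hypothetical mocks: a real runtime would launch worker threads in
   the start call and join them in the end call.  */
static void
mock_parallel_start (void (*fn) (void *), void *data, unsigned nthreads)
{
  (void) fn; (void) data; (void) nthreads;
}

static void
mock_parallel_end (void)
{
}

/* Hypothetical mock of the combined entry point: runs FN itself.  */
static void
mock_parallel (void (*fn) (void *), void *data, unsigned nthreads,
               unsigned flags)
{
  (void) nthreads; (void) flags;
  fn (data);
}

/* Old expansion: start, then the body in the calling thread, then end.  */
int
old_style_lowering (int n)
{
  int sum = 0;
  struct omp_data d = { &sum, n };
  mock_parallel_start (body_omp_fn, &d, 0);
  body_omp_fn (&d);
  mock_parallel_end ();
  return sum;
}

/* New expansion: a single call that covers all three steps.  */
int
new_style_lowering (int n)
{
  int sum = 0;
  struct omp_data d = { &sum, n };
  mock_parallel (body_omp_fn, &d, 0, 0);
  return sum;
}
```

In this mock both variants compute the same result; the difference is only in how many calls the compiler has to emit at each parallel region.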
OMP_CLAUSE_LINEAR_NO_COPYOUT): Define.
* omp-low.c (extract_omp_for_data): Handle #pragma omp simd.
(build_outer_var_ref): For #pragma omp simd allow linear etc. clauses to bind even to private vars.
(scan_sharing_clauses): Handle OMP_CLAUSE_LINEAR, OMP_CLAUSE_ALIGNED and OMP_CLAUSE_SAFELEN.
(lower_rec_input_clauses): Handle OMP_CLAUSE_LINEAR. Don't emit a GOMP_barrier call for firstprivate/lastprivate in #pragma omp simd.
(lower_lastprivate_clauses): Handle also OMP_CLAUSE_LINEAR.
(expand_omp_simd): New function.
(expand_omp_for): Handle #pragma omp simd.
* gimplify.c (enum gimplify_omp_var_data): Add GOVD_LINEAR and GOVD_ALIGNED, add GOVD_LINEAR into GOVD_DATA_SHARE_CLASS.
(enum omp_region_type): Add ORT_SIMD.
(gimple_add_tmp_var, gimplify_var_or_parm_decl, omp_check_private, omp_firstprivatize_variable, omp_notice_variable): Handle ORT_SIMD like ORT_WORKSHARE.
(omp_is_private): Likewise. Add SIMD argument, tweak diagnostics and add extra errors in simd constructs.
(gimplify_scan_omp_clauses, gimplify_adjust_omp_clauses): Handle OMP_CLAUSE_LINEAR, OMP_CLAUSE_ALIGNED and OMP_CLAUSE_SAFELEN.
(gimplify_adjust_omp_clauses_1): Handle GOVD_LASTPRIVATE and GOVD_ALIGNED.
(gimplify_omp_for): Handle #pragma omp simd.
cp/
* cp-tree.h (CP_OMP_CLAUSE_INFO): Also allow it on OMP_CLAUSE_LINEAR.
* parser.c (cp_parser_omp_var_list_no_open): If colon is non-NULL, temporarily disable colon_corrects_to_scope_p during the parsing of the variable list.
(cp_parser_omp_clause_safelen, cp_parser_omp_clause_simdlen): New functions.
(cp_parser_omp_all_clauses): Handle OMP_CLAUSE_SAFELEN and OMP_CLAUSE_SIMDLEN.
* semantics.c (finish_omp_clauses): Allow NULL_TREE in OMP_CLAUSE_ALIGNED_ALIGNMENT.
testsuite/
* c-c++-common/gomp/simd1.c: New test.
* c-c++-common/gomp/simd2.c: New test.
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@198092 138bc75d-0d04-0410-961f-82ee72b054a4
OMP_SIMD infrastructure.
* gimplify.c (gimplify_adjust_omp_clauses): For linear clauses if outer_context is non-NULL, but not ORT_COMBINED_PARALLEL, call omp_notice_variable. Remove aligned clauses that can't be handled yet.
* omp-low.c: Include target.h.
(scan_sharing_clauses): For aligned clauses with global arrays register local replacement.
(omp_clause_aligned_alignment): New function.
(lower_rec_input_clauses): For aligned clauses for global arrays or automatic pointers emit __builtin_assume_aligned before the loop if possible.
(expand_omp_regimplify_p, expand_omp_build_assign): New functions.
(expand_omp_simd): Use them. Handle pointer iterators and broken loops.
(lower_omp_for): Call lower_omp on gimple_omp_body_ptr after calling lower_rec_input_clauses, not before it.
cp/
* semantics.c (finish_omp_clauses): On OMP_CLAUSE_LINEAR clauses verify OMP_CLAUSE_DECL has integral or pointer type, and handle linear steps for pointer type decls. Fix up handling of OMP_CLAUSE_UNIFORM.
testsuite/
* c-c++-common/gomp/simd3.c: New test.
* c-c++-common/gomp/simd4.c: New test.
* c-c++-common/gomp/simd5.c: New test.
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@198193 138bc75d-0d04-0410-961f-82ee72b054a4
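As an illustration of the clauses these simd commits handle, the following is a minimal sketch, assuming OpenMP 4.0 semantics; the function names are hypothetical and do not come from the simd1.c to simd5.c tests. It uses `safelen` and `linear` only. The `aligned` clause is left out on purpose, because it is a promise about pointer alignment that arbitrary callers of this sketch could not be assumed to keep.

```c
/* y[i] += a * x[i], with a hint that up to 8 iterations may be
   executed concurrently in SIMD lanes.  Hypothetical name.  */
void
saxpy_simd (float *y, const float *x, float a, int n)
{
  int i;
#pragma omp simd safelen(8)
  for (i = 0; i < n; i++)
    y[i] += a * x[i];
}

/* Copies every second element of src into dst.  j advances by exactly
   2 per iteration, which is what linear(j:2) declares.  */
void
copy_strided (const float *src, float *dst, int n)
{
  int i, j = 0;
#pragma omp simd linear(j:2)
  for (i = 0; i < n; i++)
    {
      dst[i] = src[j];
      j += 2;
    }
}
```

Without OpenMP support enabled the pragmas are ignored and both loops run as ordinary scalar loops with the same results.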
Conflicts: gcc/omp-low.c
* c-parser.c (c_parser_compound_statement, c_parser_statement): Adjust comments for OpenMP 3.0+ additions.
(c_parser_pragma): Handle PRAGMA_OMP_CANCEL and PRAGMA_OMP_CANCELLATION_POINT.
(c_parser_omp_clause_name): Handle new OpenMP 4.0 clauses.
(c_parser_omp_clause_collapse): Fully fold collapse expression.
(c_parser_omp_clause_branch, c_parser_omp_clause_cancelkind, c_parser_omp_clause_num_teams, c_parser_omp_clause_aligned, c_parser_omp_clause_linear, c_parser_omp_clause_safelen, c_parser_omp_clause_simdlen, c_parser_omp_clause_depend, c_parser_omp_clause_map, c_parser_omp_clause_device, c_parser_omp_clause_dist_schedule, c_parser_omp_clause_proc_bind, c_parser_omp_clause_to, c_parser_omp_clause_from, c_parser_omp_clause_uniform): New functions.
(c_parser_omp_all_clauses): Handle new OpenMP 4.0 clauses.
(c_parser_omp_for_loop): Add CODE argument, pass it through to c_finish_omp_for.
(OMP_SIMD_CLAUSE_MASK): Define.
(c_parser_omp_simd): New function.
(c_parser_omp_for): Parse #pragma omp for simd.
(OMP_PARALLEL_CLAUSE_MASK): Add OMP_CLAUSE_PROC_BIND.
(c_parser_omp_parallel): Parse #pragma omp parallel for simd.
(OMP_TASK_CLAUSE_MASK): Add OMP_CLAUSE_DEPEND.
(c_parser_omp_taskgroup): New function.
(OMP_CANCEL_CLAUSE_MASK, OMP_CANCELLATION_POINT_CLAUSE_MASK): Define.
(c_parser_omp_cancel, c_parser_omp_cancellation_point): New functions.
(c_parser_omp_construct): Handle PRAGMA_OMP_SIMD and PRAGMA_OMP_TASKGROUP.
(c_parser_transaction_cancel): Formatting fix.
* c-tree.h (c_begin_omp_taskgroup, c_finish_omp_taskgroup, c_finish_omp_cancel, c_finish_omp_cancellation_point): New prototypes.
* c-typeck.c (c_begin_omp_taskgroup, c_finish_omp_taskgroup, c_finish_omp_cancel, c_finish_omp_cancellation_point): New functions.
(c_finish_omp_clauses): Handle new OpenMP 4.0 clauses.
cp/
* parser.c (cp_parser_omp_clause_name): Add missing break after case 'i'.
(cp_parser_omp_cancellation_point): Diagnose error if #pragma omp cancellation isn't followed by point.
* semantics.c (finish_omp_clauses): Complain also about zero in alignment of aligned directive or safelen/simdlen expressions.
(finish_omp_cancel): Fix up diagnostics wording.
testsuite/
* c-c++-common/gomp/simd1.c: Enable also for C.
* c-c++-common/gomp/simd2.c: Likewise.
* c-c++-common/gomp/simd3.c: Likewise.
* c-c++-common/gomp/simd4.c: Likewise. Adjust expected diagnostics for C.
* c-c++-common/gomp/simd5.c: Enable also for C.
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@198264 138bc75d-0d04-0410-961f-82ee72b054a4
OpenMP constructs nested inside simd region. Don't treat #pragma omp simd as work-sharing region. Disallow work-sharing constructs inside of critical region. Complain if ordered region is nested inside of parallel region without loop region in between.
(scan_omp_1_stmt): Call check_omp_nesting_restrictions even for GOMP_{cancel{,lation_point},taskyield,taskwait} calls.
* gfortran.dg/gomp/appendix-a/a.35.5.f90: Add dg-error.
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@198459 138bc75d-0d04-0410-961f-82ee72b054a4
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@198460 138bc75d-0d04-0410-961f-82ee72b054a4
dump_gimple_omp_atomic_store): Handle gimple_omp_atomic_seq_cst_p.
* gimple.h (enum gf_mask): Add GF_OMP_ATOMIC_SEQ_CST.
(gimple_omp_atomic_set_seq_cst, gimple_omp_atomic_seq_cst_p): New inline functions.
* omp-low.c (expand_omp_atomic_load, expand_omp_atomic_store, expand_omp_atomic_fetch_op): If gimple_omp_atomic_seq_cst_p, pass MEMMODEL_SEQ_CST instead of MEMMODEL_RELAXED to the builtin.
* gimplify.c (gimplify_omp_atomic): Handle OMP_ATOMIC_SEQ_CST.
* tree-pretty-print.c (dump_generic_node): Handle OMP_ATOMIC_SEQ_CST.
* tree.def (OMP_ATOMIC): Add comment that OMP_ATOMIC* must stay consecutive.
* tree.h (OMP_ATOMIC_SEQ_CST): Define.
c/
* c-parser.c (c_parser_omp_atomic): Parse seq_cst clause, pass true if it is present to c_finish_omp_atomic.
cp/
* pt.c (tsubst_expr): Pass OMP_ATOMIC_SEQ_CST to finish_omp_atomic.
* semantics.c (finish_omp_atomic): Add seq_cst argument, pass it through to c_finish_omp_atomic or store into OMP_ATOMIC_SEQ_CST.
* cp-tree.h (finish_omp_atomic): Adjust prototype.
* parser.c (cp_parser_omp_atomic): Parse seq_cst clause, pass true if it is present to finish_omp_atomic.
c-family/
* c-omp.c (c_finish_omp_atomic): Add seq_cst argument, store it into OMP_ATOMIC_SEQ_CST bit.
* c-common.h (c_finish_omp_atomic): Adjust prototype.
testsuite/
* testsuite/libgomp.c/atomic-17.c: New test.
* testsuite/libgomp.c++/atomic-14.C: New test.
* testsuite/libgomp.c++/atomic-15.C: New test.
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@198461 138bc75d-0d04-0410-961f-82ee72b054a4
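To make the effect of the `seq_cst` clause concrete, here is a minimal sketch, assuming OpenMP 4.0 syntax and a GCC-compatible compiler for the `__atomic` builtin. The function names are hypothetical. The second function is only a rough analogue of what the commit describes (selecting sequentially consistent rather than relaxed ordering for the underlying builtin); it is not a transcript of the code GCC generates.

```c
/* Increments *counter atomically with sequentially consistent
   ordering and returns the new value.  Hypothetical name.  */
int
increment_seq_cst (int *counter)
{
  int v;
#pragma omp atomic capture seq_cst
  v = ++*counter;
  return v;
}

/* Rough analogue written directly with the GCC __atomic builtin,
   using __ATOMIC_SEQ_CST where a plain atomic would use
   __ATOMIC_RELAXED.  */
int
increment_builtin_seq_cst (int *counter)
{
  return __atomic_add_fetch (counter, 1, __ATOMIC_SEQ_CST);
}
```

When compiled without OpenMP support the first function degrades to a plain increment, so it stays correct only for single-threaded callers; the builtin version remains atomic either way.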
Remove deprecated vectorlength clause features. Remove deprecated assert and noassert clauses. Implement vectorlength clause in OpenMP safelen terms.
(attribute_value_equal): Call it for -fopenmp if TREE_VALUE of the attributes are both OMP_CLAUSEs.
* tree.h (omp_declare_simd_clauses_equal): Declare.
c-family/
* c-common.c (c_common_attribute_table): Add "omp declare simd" attribute.
(handle_omp_declare_simd_attribute): New function.
* c-common.h (c_omp_declare_simd_clauses_to_numbers, c_omp_declare_simd_clauses_to_decls): Declare.
* c-omp.c (c_omp_declare_simd_clause_cmp, c_omp_declare_simd_clauses_to_numbers, c_omp_declare_simd_clauses_to_decls): New functions.
cp/
* cp-tree.h (cp_decl_specifier_seq): Add omp_declare_simd_clauses field.
(finish_omp_declare_simd): Declare.
* decl2.c (is_late_template_attribute): Return true for "omp declare simd" attribute.
(cp_check_const_attributes): Don't check TREE_VALUE of arg if arg isn't a TREE_LIST.
* decl.c (grokfndecl): Add omp_declare_simd_clauses argument, call finish_omp_declare_simd if non-NULL.
(grokdeclarator): Pass it declspecs->omp_declare_simd_clauses to grokfndecl.
* pt.c (apply_late_template_attributes): Handle "omp declare simd" attribute specially.
(tsubst_omp_clauses): Add declare_simd argument, don't call finish_omp_clauses if it is set. Handle OpenMP 4.0 clauses.
(tsubst_expr): Adjust tsubst_omp_clauses callers.
* semantics.c (finish_omp_clauses): Diagnose inbranch notinbranch.
(finish_omp_declare_simd): New function.
* parser.h (struct cp_parser): Add omp_declare_simd_clauses field.
* parser.c (cp_ensure_no_omp_declare_simd, cp_finish_omp_declare_simd): New functions.
(enum pragma_context): Add pragma_member and pragma_objc_icode.
(cp_parser_linkage_specification, cp_parser_namespace_definition, cp_parser_class_specifier_1): Call cp_ensure_no_omp_declare_simd.
(cp_parser_init_declarator, cp_parser_member_declaration, cp_parser_function_definition_from_specifiers_and_declarator, cp_parser_save_member_function_body): Copy parser->omp_declare_simd_clauses to decl_specifiers->omp_declare_simd_clauses, call cp_finish_omp_declare_simd.
(cp_parser_member_specification_opt): Pass pragma_member instead of pragma_external to cp_parser_pragma.
(cp_parser_objc_interstitial_code): Pass pragma_objc_icode instead of pragma_external to cp_parser_pragma.
(cp_parser_omp_var_list_no_open): If parser->omp_declare_simd_clauses, just cp_parser_identifier the argument names.
(cp_parser_omp_all_clauses): Don't call finish_omp_clauses for parser->omp_declare_simd_clauses.
(OMP_DECLARE_SIMD_CLAUSE_MASK): Define.
(cp_parser_omp_declare_simd, cp_parser_omp_declare): New functions.
(cp_parser_pragma): Call cp_ensure_no_omp_declare_simd. Handle PRAGMA_OMP_DECLARE_REDUCTION. Replace == pragma_external with != pragma_stmt and != pragma_compound.
testsuite/
* g++.dg/gomp/declare-simd-1.C: New test.
* g++.dg/gomp/declare-simd-2.C: New test.
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@198739 138bc75d-0d04-0410-961f-82ee72b054a4
* c-typeck.c (c_finish_omp_clauses): Handle OMP_CLAUSE_LINEAR_STEP adjustments for pointer-types here. Diagnose inbranch notinbranch being used together.
(c_finish_omp_declare_simd): New function.
* c-parser.c (enum pragma_context): Add pragma_struct and pragma_param.
(c_parser_declaration_or_fndef): Add omp_declare_simd_clauses argument. Call c_finish_omp_declare_simd if needed.
(c_parser_external_declaration, c_parser_compound_statement_nostart, c_parser_label, c_parser_for_statement, c_parser_objc_methodprotolist, c_parser_omp_for_loop): Adjust c_parser_declaration_or_fndef callers.
(c_parser_struct_or_union_specifier): Use pragma_struct instead of pragma_external.
(c_parser_parameter_declaration): Use pragma_param instead of pragma_external.
(c_parser_pragma): Handle PRAGMA_OMP_DECLARE_REDUCTION. Replace == pragma_external with != pragma_stmt && != pragma_compound test.
(c_parser_omp_variable_list): Add declare_simd argument. Don't lookup vars if it is true, just store identifiers.
(c_parser_omp_var_list_parens, c_parser_omp_clause_depend, c_parser_omp_clause_map): Adjust callers.
(c_parser_omp_clause_reduction, c_parser_omp_clause_aligned): Add declare_simd argument, pass it through to c_parser_omp_variable_list.
(c_parser_omp_clause_linear): Likewise. Don't handle OMP_CLAUSE_LINEAR_STEP adjustments for pointer-types here.
(c_parser_omp_clause_uniform): Call c_parser_omp_variable_list instead of c_parser_omp_var_list_parens to pass true as declare_simd.
(c_parser_omp_all_clauses): Add declare_simd argument, pass it through clause parsing routines as needed. Don't call c_finish_omp_clauses if set.
(c_parser_omp_simd, c_parser_omp_for, c_parser_omp_sections, c_parser_omp_parallel, c_parser_omp_single, c_parser_omp_task, c_parser_omp_cancel, c_parser_omp_cancellation_point): Adjust callers.
(OMP_DECLARE_SIMD_CLAUSE_MASK): Define.
(c_parser_omp_declare_simd, c_parser_omp_declare): New functions.
* gcc.dg/gomp/declare-simd-1.c: New test.
* gcc.dg/gomp/declare-simd-2.c: New test.
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@198828 138bc75d-0d04-0410-961f-82ee72b054a4
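A short example may help show what the declare simd parsing added here is meant to accept. This is a minimal sketch, assuming OpenMP 4.0 syntax; the function names are hypothetical and are not the contents of declare-simd-1.c or declare-simd-2.c. The directive names parameters of the function that follows it, which fits the ChangeLog note that identifiers are stored first and resolved against the declaration afterwards.

```c
/* scale is the same for every SIMD lane (uniform), i advances by one
   per lane (linear), and the function is never called conditionally
   from the vectorized loop (notinbranch).  Hypothetical name.  */
#pragma omp declare simd uniform(scale) linear(i) notinbranch
float
scaled_index (float scale, int i)
{
  return scale * (float) i;
}

/* Fills out[0..n-1] by calling the function above from a simd loop.  */
void
fill_scaled (float *out, float scale, int n)
{
  int i;
#pragma omp simd
  for (i = 0; i < n; i++)
    out[i] = scaled_index (scale, i);
}
```

Specifying both `inbranch` and `notinbranch` on the same directive is the combination the commits above diagnose as an error.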
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@198835 138bc75d-0d04-0410-961f-82ee72b054a4
kraj pushed a commit to kraj/gcc that referenced this pull request May 10, 2022
…04617]

On
#define A(n) int foo1##n(void) { return 1##n; }
#define B(n) A(n##0) A(n##1) A(n##2) A(n##3) A(n##4) A(n##5) A(n##6) A(n##7) A(n##8) A(n##9)
#define C(n) B(n##0) B(n##1) B(n##2) B(n##3) B(n##4) B(n##5) B(n##6) B(n##7) B(n##8) B(n##9)
#define D(n) C(n##0) C(n##1) C(n##2) C(n##3) C(n##4) C(n##5) C(n##6) C(n##7) C(n##8) C(n##9)
#define E(n) D(n##0) D(n##1) D(n##2) D(n##3) D(n##4) D(n##5) D(n##6) D(n##7) D(n##8) D(n##9)
E(0) E(1) E(2) D(30) D(31) C(320) C(321) C(322) C(323) C(324) C(325) B(3260) B(3261) B(3262) B(3263) A(32640) A(32641) A(32642)
testcase with
./xgcc -B ./ -c -g -fpic -ffat-lto-objects -flto -O0 -o foo1.o foo1.c -ffunction-sections
./xgcc -B ./ -shared -g -fpic -flto -O0 -o foo1.so foo1.o
/tmp/ccTW8mBm.debug.temp.o: file not recognized: file format not recognized
(testcase too slow to be included into testsuite).
The problem is clearly reported by readelf:
readelf: foo1.o.debug.temp.o: Warning: Section 2 has an out of range sh_link value of 65321
readelf: foo1.o.debug.temp.o: Warning: Section 5 has an out of range sh_link value of 65321
readelf: foo1.o.debug.temp.o: Warning: Section 10 has an out of range sh_link value of 65323
readelf: foo1.o.debug.temp.o: Warning: [ 2]: Link field (65321) should index a symtab section.
readelf: foo1.o.debug.temp.o: Warning: [ 5]: Link field (65321) should index a symtab section.
readelf: foo1.o.debug.temp.o: Warning: [10]: Link field (65323) should index a string section.
because simple_object_elf_copy_lto_debug_sections doesn't adjust sh_info and sh_link fields in ElfNN_Shdr if they are in between SHN_{LO,HI}RESERVE inclusive.

Not adjusting those is incorrect though, SHN_{LO,HI}RESERVE range is only relevant to the 16-bit fields, mainly st_shndx in ElfNN_Sym where if one needs >= SHN_LORESERVE section number, SHN_XINDEX should be used instead and .symtab_shndx section should contain the real section index, and in ElfNN_Ehdr e_shnum and e_shstrndx fields, where if >= SHN_LORESERVE value is needed it should put those into Shdr[0].sh_{size,link}. But, sh_{link,info} are 32-bit fields which can contain any section index.

Note, as simple-object-elf.c mentions, binutils from 2.12 to 2.18 (so before 2011) used to mishandle the > 63.75K sections case and assumed there is a hole in between the sections, but what simple_object_elf_copy_lto_debug_sections does wouldn't help in that case for the debug temp object creation, we'd need to detect the case also in that routine and take it into account in the remapping etc. I think it is not worth it given that it is over 10 years, if somebody needs 63.75K or more sections, better use more recent binutils.

2022-02-22 Jakub Jelinek <jakub@redhat.com>

PR lto/104617
* simple-object-elf.c (simple_object_elf_match): Fix up URL in comment.
(simple_object_elf_copy_lto_debug_sections): Remap sh_info and sh_link even if they are in the SHN_LORESERVE .. SHN_HIRESERVE range (inclusive).

(cherry picked from commit 2f59f06)
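The reasoning in this commit message can be reduced to a few lines of C. The sketch below is not the code from simple-object-elf.c: the function names and the `sh_map` remapping table are hypothetical, and it assumes only the standard ELF values of `SHN_LORESERVE` (0xff00) and `SHN_HIRESERVE` (0xffff). It contrasts a remap that skips the reserved range, as the message says the old code did, with one that always remaps the 32-bit `sh_link` value.

```c
#include <stdint.h>

#define SHN_LORESERVE 0xff00u
#define SHN_HIRESERVE 0xffffu

/* Behavior described as buggy: values inside the reserved range are
   left untouched, although for the 32-bit sh_link and sh_info fields
   they are ordinary section indices.  Hypothetical name.  */
uint32_t
remap_link_skipping_reserved (uint32_t sh_link, const uint32_t *sh_map)
{
  if (sh_link >= SHN_LORESERVE && sh_link <= SHN_HIRESERVE)
    return sh_link;
  return sh_map[sh_link];
}

/* Behavior described as the fix: always translate through the map
   from old section index to new section index.  Hypothetical name.  */
uint32_t
remap_link_always (uint32_t sh_link, const uint32_t *sh_map)
{
  return sh_map[sh_link];
}
```

With the index 65321 from the readelf warnings quoted above, the first function returns 65321 unchanged while the second returns whatever the table maps that section to, which matches the out of range `sh_link` symptom the message reports.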
kraj pushed a commit to kraj/gcc that referenced this pull request May 11, 2022
xionghul
pushed a commit
to xionghul/gcc
that referenced
this pull request
Dec 23, 2022
With many thanks to H.J. for doing all the hard work, this patch resolves two P1 regressions; PR target/106933 and PR target/106959.

Although superficially similar, the i386 backend's two scalar-to-vector (STV) passes perform their transformations in importantly different ways. The original pass converting SImode and DImode operations to V4SImode or V2DImode operations is "soft", allowing values to be maintained in both integer and vector hard registers. The newer pass converting TImode operations to V1TImode is "hard" (all or nothing) that converts all uses of a pseudo to vector form. To implement this it invokes powerful ju-ju calling SET_MODE on a reg_rtx, which due to RTL sharing, often updates this pseudo's mode everywhere in the RTL chain. Hence, TImode STV can only be performed when all uses of a pseudo are convertible to V1TImode form.

To ensure this the STV passes currently use data-flow analysis to inspect all DEFs and USEs in a chain. This works fine for chains that are in the usual single assignment form, but the occurrence of uninitialized variables, or multiple assignments that split a pseudo's usage into several independent chains (lifetimes) can lead to situations where some but not all of a pseudo's occurrences need to be updated. This is safe for the SImode/DImode pass, but leads to the above bugs during the TImode pass.

My one minor tweak to HJ's patch from comment #4 of bugzilla PR106959 is to only perform the new single_def_chain_p check for TImode STV; it turns out that STV of SImode/DImode min/max operates safely on multiple-def chains, and prohibiting this leads to testsuite regressions. We don't (yet) support V1TImode min/max, so this idiom isn't an issue during the TImode STV pass.

For the record, the two alternate possible fixes are (i) make the TImode STV pass "soft", by eliminating use of SET_MODE, instead using replace_rtx with a new pseudo, or (ii) merging "chains" so that multiple DFA chains/lifetimes are considered a single STV chain.

2022-12-23  H.J. Lu  <hjl.tools@gmail.com>
            Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
    PR target/106933
    PR target/106959
    * config/i386/i386-features.cc (single_def_chain_p): New predicate function to check that a pseudo's use-def chain is in SSA form.
    (timode_scalar_to_vector_candidate_p): Check that TImode regs that are SET_DEST or SET_SRC of an insn match/are single_def_chain_p.

gcc/testsuite/ChangeLog
    PR target/106933
    PR target/106959
    * gcc.target/i386/pr106933-1.c: New test case.
    * gcc.target/i386/pr106933-2.c: Likewise.
    * gcc.target/i386/pr106959-1.c: Likewise.
    * gcc.target/i386/pr106959-2.c: Likewise.
    * gcc.target/i386/pr106959-3.c: Likewise.
xionghul
pushed a commit
to xionghul/gcc
that referenced
this pull request
Jan 28, 2023
The aarch64 ISA specification allows a left shift amount to be applied after extension in the range of 0 to 4 (encoded in the imm3 field).

This is true for at least the following instructions:
* ADD (extended register)
* ADDS (extended register)
* SUB (extended register)

The result of this patch can be seen, when compiling the following code:

uint64_t myadd(uint64_t a, uint64_t b)
{
  return a+(((uint8_t)b)<<4);
}

Without the patch the following sequence will be generated:

0000000000000000 <myadd>:
   0:   d37c1c21    ubfiz   x1, x1, #4, #8
   4:   8b000020    add     x0, x1, x0
   8:   d65f03c0    ret

With the patch the ubfiz will be merged into the add instruction:

0000000000000000 <myadd>:
   0:   8b211000    add     x0, x0, w1, uxtb #4
   4:   d65f03c0    ret

gcc/ChangeLog:

    * config/aarch64/aarch64.cc (aarch64_uxt_size): fix an off-by-one in checking the permissible shift-amount.
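To see that the merged instruction computes the same value as the two-instruction sequence, here is a minimal sketch. `myadd` is the test function from the message; `add_uxtb` is a hypothetical software model of `add xd, xn, wm, uxtb #shift` written for illustration, not taken from GCC or from the architecture manual.

```cpp
#include <cassert>
#include <cstdint>

// Test function from the commit message.
std::uint64_t myadd(std::uint64_t a, std::uint64_t b) {
    return a + (((std::uint8_t)b) << 4);
}

// Hypothetical model of "add xd, xn, wm, uxtb #shift": zero-extend the low
// byte of m, shift it left by `shift`, then add it to n.
std::uint64_t add_uxtb(std::uint64_t n, std::uint64_t m, unsigned shift) {
    assert(shift <= 4);  // permitted range stated in the commit message
    return n + ((m & 0xFFu) << shift);
}
```

For example, `myadd(1, 0x1FF)` keeps only the low byte 0xFF of `b`, shifts it to 0xFF0 and adds 1, giving 4081; `add_uxtb(1, 0x1FF, 4)` yields the same value.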
vathpela
pushed a commit
to vathpela/gcc
that referenced
this pull request
Apr 29, 2023
This patch adds support for xstormy16's swap nibbles instruction (swpn). For the test case:

short foo(short x) {
  return (x&0xff00) | ((x<<4)&0xf0) | ((x>>4)&0x0f);
}

GCC with -O2 currently generates the nine instruction sequence:
foo:    mov r7,r2
        asr r2,#4
        and r2,#15
        mov.w r6,#-256
        and r6,r7
        or r2,r6
        shl r7,#4
        and r7,#255
        or r2,r7
        ret

with this patch, we now generate:
foo:    swpn r2
        ret

To achieve this using combine's four instruction "combinations" requires a little wizardry. Firstly, define_insn_and_split are introduced to treat logical shifts followed by bitwise-AND as macro instructions that are split after reload. This is sufficient to recognize a QImode nibble swap, which can be implemented by swpn followed by either a zero-extension or a sign-extension from QImode to HImode. Then finally, in the correct context, a QImode swap-nibbles pattern can be combined to preserve the high-byte of a HImode word, matching the xstormy16's swpn semantics.

The naming of the new code iterators is taken from i386.md.

2023-04-29  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
    * config/stormy16/stormy16.md (any_lshift): New code iterator.
    (any_or_plus): Likewise.
    (any_rotate): Likewise.
    (*<any_lshift>_and_internal): New define_insn_and_split to recognize a logical shift followed by an AND, and split it again after reload.
    (*swpn): New define_insn matching xstormy16's swpn.
    (*swpn_zext): New define_insn recognizing swpn followed by zero_extendqihi2, i.e. with the high byte set to zero.
    (*swpn_sext): Likewise, for swpn followed by cbw.
    (*swpn_sext_2): Likewise, for an alternate RTL form.
    (*swpn_zext_ior): A pre-reload splitter so that an swpn+zext+ior sequence is split in the correct place to recognize the *swpn_zext followed by any_or_plus (ior, xor or plus) instruction.

gcc/testsuite/ChangeLog
    * gcc.target/xstormy16/swpn-1.c: New QImode test case.
    * gcc.target/xstormy16/swpn-2.c: New zero_extend test case.
    * gcc.target/xstormy16/swpn-3.c: New sign_extend test case.
    * gcc.target/xstormy16/swpn-4.c: New HImode test case.
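The equivalence the patch relies on can be checked with a short sketch. `foo_expr` restates the test case with unsigned 16-bit arithmetic (an adaptation, so that the shifts are well defined for every input), and `swpn_model` is a hypothetical model of the swpn semantics as the message describes them: swap the two nibbles of the low byte and preserve the high byte.

```cpp
#include <cassert>
#include <cstdint>

// The expression from the test case, using unsigned 16-bit values.
std::uint16_t foo_expr(std::uint16_t x) {
    return static_cast<std::uint16_t>(
        (x & 0xff00u) | ((x << 4) & 0xf0u) | ((x >> 4) & 0x0fu));
}

// Hypothetical model of swpn: swap the nibbles of the low byte, keep the
// high byte unchanged.
std::uint16_t swpn_model(std::uint16_t x) {
    const unsigned hi = x & 0xff00u;
    const unsigned lo = x & 0x00ffu;
    const unsigned swapped = ((lo << 4) | (lo >> 4)) & 0xffu;
    return static_cast<std::uint16_t>(hi | swapped);
}

// Exhaustively compares the two forms over all 16-bit inputs.
bool models_agree() {
    for (unsigned v = 0; v <= 0xffffu; ++v) {
        const auto x = static_cast<std::uint16_t>(v);
        if (foo_expr(x) != swpn_model(x)) return false;
    }
    return true;
}
```

For instance, 0x12AB becomes 0x12BA under both functions: the high byte 0x12 is kept and the low byte 0xAB has its nibbles exchanged.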
kraj
pushed a commit
to kraj/gcc
that referenced
this pull request
May 2, 2023
I noticed that for member class templates of a class template we were unnecessarily substituting both the template and its type. Avoiding that duplication speeds compilation of this silly testcase from ~12s to ~9s on my laptop. It's unlikely to make a difference on any real code, but the simplification is also nice.

We still need to clear CLASSTYPE_USE_TEMPLATE on the partial instantiation of the template class, but it makes more sense to do that in tsubst_template_decl anyway.

#define NC(X) \
  template <class U> struct X##1; \
  template <class U> struct X##2; \
  template <class U> struct X##3; \
  template <class U> struct X##4; \
  template <class U> struct X##5; \
  template <class U> struct X##6;
#define NC2(X) NC(X##a) NC(X##b) NC(X##c) NC(X##d) NC(X##e) NC(X##f)
#define NC3(X) NC2(X##A) NC2(X##B) NC2(X##C) NC2(X##D) NC2(X##E)
template <int I> struct A
{
  NC3(am)
};
template <class...Ts> void sink(Ts...);
template <int...Is> void g()
{
  sink(A<Is>()...);
}
template <int I> void f()
{
  g<__integer_pack(I)...>();
}
int main()
{
  f<1000>();
}

gcc/cp/ChangeLog:

    * pt.cc (instantiate_class_template): Skip the RECORD_TYPE of a class template.
    (tsubst_template_decl): Clear CLASSTYPE_USE_TEMPLATE.
rurban
pushed a commit
to rurban/gcc
that referenced
this pull request
Oct 26, 2023
This patch is my proposed solution to PR rtl-optimization/91865. Normally RTX simplification canonicalizes a ZERO_EXTEND of a ZERO_EXTEND to a single ZERO_EXTEND, but as shown in this PR it is possible for combine's make_compound_operation to unintentionally generate a non-canonical ZERO_EXTEND of a ZERO_EXTEND, which is unlikely to be matched by the backend.

For the new test case:

const int table[2] = {1, 2};
int foo (char i) { return table[i]; }

compiling with -O2 -mlarge on msp430 we currently see:

Trying 2 -> 7:
    2: r25:HI=zero_extend(R12:QI)
      REG_DEAD R12:QI
    7: r28:PSI=sign_extend(r25:HI)#0
      REG_DEAD r25:HI
Failed to match this instruction:
(set (reg:PSI 28 [ iD.1772 ])
    (zero_extend:PSI (zero_extend:HI (reg:QI 12 R12 [ iD.1772 ]))))

which results in the following code:

foo:    AND     #0xff, R12
        RLAM.A #4, R12 { RRAM.A #4, R12
        RLAM.A  #1, R12
        MOVX.W  table(R12), R12
        RETA

With this patch, we now see:

Trying 2 -> 7:
    2: r25:HI=zero_extend(R12:QI)
      REG_DEAD R12:QI
    7: r28:PSI=sign_extend(r25:HI)#0
      REG_DEAD r25:HI
Successfully matched this instruction:
(set (reg:PSI 28 [ iD.1772 ])
    (zero_extend:PSI (reg:QI 12 R12 [ iD.1772 ])))
allowing combination of insns 2 and 7
original costs 4 + 8 = 12
replacement cost 8

foo:    MOV.B   R12, R12
        RLAM.A  #1, R12
        MOVX.W  table(R12), R12
        RETA

2023-10-26  Roger Sayle  <roger@nextmovesoftware.com>
            Richard Biener  <rguenther@suse.de>

gcc/ChangeLog
    PR rtl-optimization/91865
    * combine.cc (make_compound_operation): Avoid creating a ZERO_EXTEND of a ZERO_EXTEND.

gcc/testsuite/ChangeLog
    PR rtl-optimization/91865
    * gcc.target/msp430/pr91865.c: New test case.
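The canonicalization this patch preserves rests on a simple identity: zero-extending twice gives the same value as zero-extending once. A minimal sketch follows, with PSImode assumed here to be a 20-bit value held in the low bits of a 32-bit integer; the helper names are hypothetical and nothing below is GCC code.

```cpp
#include <cassert>
#include <cstdint>

// Assumption for illustration: PSImode holds 20 significant bits.
constexpr std::uint32_t kPsiMask = 0xFFFFF;

// QImode (8 bits) zero-extended to HImode (16 bits).
std::uint16_t zext_qi_to_hi(std::uint8_t v) { return v; }

// HImode (16 bits) zero-extended to the modelled PSImode.
std::uint32_t zext_hi_to_psi(std::uint16_t v) { return v & kPsiMask; }

// QImode (8 bits) zero-extended directly to the modelled PSImode.
std::uint32_t zext_qi_to_psi(std::uint8_t v) { return v & kPsiMask; }

// Checks zero_extend:PSI (zero_extend:HI (x)) == zero_extend:PSI (x)
// for every 8-bit x.
bool nested_equals_single() {
    for (unsigned v = 0; v <= 0xFFu; ++v) {
        const auto b = static_cast<std::uint8_t>(v);
        if (zext_hi_to_psi(zext_qi_to_hi(b)) != zext_qi_to_psi(b))
            return false;
    }
    return true;
}
```

Because the two forms always agree, emitting the single ZERO_EXTEND loses nothing and gives the backend a pattern it is more likely to match.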
XYenChi
referenced
this pull request
in XYenChi/gcc
Nov 7, 2023
hubot
pushed a commit
that referenced
this pull request
Feb 16, 2024
Here we have

  template<class T>
  auto is_throwable(T t) -> decltype(throw t, true) {
  ...
  }

where we didn't properly mark 't' as IMPLICIT_RVALUE_P, which caused the wrong overload to have been chosen. Jason figured out it's because we don't correctly implement [expr.prim.id.unqual]#4.2, which post-P2266 says that an id-expression is move-eligible if

"the id-expression (possibly parenthesized) is the operand of a throw-expression, and names an implicitly movable entity that belongs to a scope that does not contain the compound-statement of the innermost lambda-expression, try-block, or function-try-block (if any) whose compound-statement or ctor-initializer contains the throw-expression."

I worked out that it's trying to say that given

  struct X {
    X();
    X(const X&);
    X(X&&) = delete;
  };

the following should fail: the scope of the throw is an sk_try, and it's also x's scope S, and S "does not contain the compound-statement of the *try-block" so x is move-eligible, so we move, so we fail.

  void f ()
  try {
    X x;
    throw x; // use of deleted function
  } catch (...) {
  }

Whereas here:

  void g (X x)
  try {
    throw x;
  } catch (...) {
  }

the throw is again in an sk_try, but x's scope is an sk_function_parms which *does* contain the {} of the *try-block, so x is not move-eligible, so we don't move, so we use X(const X&), and the code is fine.

The current code also doesn't seem to handle

  void h (X x) {
    void z (decltype(throw x, true));
  }

where there's no enclosing lambda or sk_try so we should move.

I'm not doing anything about lambdas because we shouldn't reach the code at the end of the function: the DECL_HAS_VALUE_EXPR_P check shouldn't let us go further.

    PR c++/113789
    PR c++/113853

gcc/cp/ChangeLog:

    * typeck.cc (treat_lvalue_as_rvalue_p): Update code to better reflect [expr.prim.id.unqual]#4.2.

gcc/testsuite/ChangeLog:

    * g++.dg/cpp0x/sfinae69.C: Remove dg-bogus.
    * g++.dg/cpp0x/sfinae70.C: New test.
    * g++.dg/cpp0x/sfinae71.C: New test.
    * g++.dg/cpp0x/sfinae72.C: New test.
    * g++.dg/cpp2a/implicit-move4.C: New test.
Liaoshihua
pushed a commit
to Liaoshihua/gcc
that referenced
this pull request
Mar 19, 2024
Liaoshihua
pushed a commit
to Liaoshihua/gcc
that referenced
this pull request
Mar 19, 2024
Liaoshihua
pushed a commit
to Liaoshihua/gcc
that referenced
this pull request
Mar 19, 2024
Liaoshihua
pushed a commit
to Liaoshihua/gcc
that referenced
this pull request
Mar 21, 2024
Liaoshihua
pushed a commit
to Liaoshihua/gcc
that referenced
this pull request
Mar 25, 2024
Liaoshihua
pushed a commit
to Liaoshihua/gcc
that referenced
this pull request
Mar 25, 2024
NinaRanns
referenced
this pull request
in NinaRanns/gcc
May 30, 2024
fixing tests and removing C++20 requirement
hubot
pushed a commit
that referenced
this pull request
Jun 13, 2024
Here during overload resolution we have two strictly viable ambiguous candidates #1 and #2, and two non-strictly viable candidates #3 and #4 which we hold on to ever since r14-6522. These latter candidates have an empty second arg conversion since the first arg conversion was deemed bad, and this trips up joust when called on #3 and #4 which assumes all arg conversions are there.

We can fix this by making joust robust to empty arg conversions, but in this situation we shouldn't need to compare #3 and #4 at all given that we have a strictly viable candidate. To that end, this patch makes tourney shortcut considering non-strictly viable candidates upon encountering ambiguity between two strictly viable candidates (taking advantage of the fact that the candidates list is sorted according to viability via splice_viable).

    PR c++/115239

gcc/cp/ChangeLog:

    * call.cc (tourney): Don't consider a non-strictly viable candidate as the champ if there was ambiguity between two strictly viable candidates.

gcc/testsuite/ChangeLog:

    * g++.dg/overload/error7.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>
hubot
pushed a commit
that referenced
this pull request
Jun 17, 2024
(cherry picked from commit 7fed7e9)
hubot
pushed a commit
that referenced
this pull request
Jul 19, 2024
These tests used to generate:

        bl      swap
        ldr     r2, [sp, #4]
        mov     r0, r2  @ __fp16

but g:9d20529d94b23275885f380d155fe8671ab5353a means that we can load directly into r0:

        bl      swap
        ldrh    r0, [sp, #4]    @ __fp16

This patch updates the tests to "defend" this change.

While there, the scans include:

        mov\tr1, r[03]}

But if the spill of r2 occurs first, there's no real reason why r2 couldn't be used as the temporary, instead of r3.

The patch tries to update the scans while preserving the spirit of the originals.

gcc/testsuite/
    * gcc.target/arm/fp16-aapcs-2.c: Expect the return value to be loaded directly from the stack. Test that the swap generates two moves out of r0/r1 and two moves in.
    * gcc.target/arm/fp16-aapcs-4.c: Likewise.
hubot
pushed a commit
that referenced
this pull request
Sep 7, 2024
…o_debug_section [PR116614]

cat abc.C
  #define A(n) struct T##n {} t##n;
  #define B(n) A(n##0) A(n##1) A(n##2) A(n##3) A(n##4) A(n##5) A(n##6) A(n##7) A(n##8) A(n##9)
  #define C(n) B(n##0) B(n##1) B(n##2) B(n##3) B(n##4) B(n##5) B(n##6) B(n##7) B(n##8) B(n##9)
  #define D(n) C(n##0) C(n##1) C(n##2) C(n##3) C(n##4) C(n##5) C(n##6) C(n##7) C(n##8) C(n##9)
  #define E(n) D(n##0) D(n##1) D(n##2) D(n##3) D(n##4) D(n##5) D(n##6) D(n##7) D(n##8) D(n##9)
  E(1) E(2) E(3)
  int main () { return 0; }
./xg++ -B ./ -o abc{.o,.C} -flto -flto-partition=1to1 -O2 -g -fdebug-types-section -c
./xgcc -B ./ -o abc{,.o} -flto -flto-partition=1to1 -O2
(not included in testsuite as it takes a while to compile) FAILs with
lto-wrapper: fatal error: Too many copied sections: Operation not supported
compilation terminated.
/usr/bin/ld: error: lto-wrapper failed
collect2: error: ld returned 1 exit status

The following patch fixes that. Most of the 64K+ section support for reading and writing was already there years ago (and especially reading used quite often already) and a further bug fixed in it in the PR104617 fix.

Yet, the fix isn't solely about removing the
  if (new_i - 1 >= SHN_LORESERVE)
    {
      *err = ENOTSUP;
      return "Too many copied sections";
    }
5 lines, the missing part was that the function only handled reading of the .symtab_shndx section but not copying/updating of it. If the result has less than 64K-epsilon sections, that actually wasn't needed, but e.g. with -fdebug-types-section one can exceed that pretty easily (reported to us on WebKitGtk build on ppc64le).

Updating the section is slightly more complicated, because it basically needs to be done in lock step with updating the .symtab section, if one doesn't need to use SHN_XINDEX in there, the section should (or should be updated to) contain SHN_UNDEF entry, otherwise needs to have whatever would be otherwise stored but couldn't fit. But repeating due to that all the symtab decisions what to discard and how to rewrite it would be ugly.

So, the patch instead emits the .symtab_shndx section (or sections) last and prepares the content during the .symtab processing and in a second pass when going just through .symtab_shndx sections just uses the saved content.

2024-09-07  Jakub Jelinek  <jakub@redhat.com>

    PR lto/116614
    * simple-object-elf.c (SHN_COMMON): Align comment with neighbouring comments.
    (SHN_HIRESERVE): Use uppercase hex digits instead of lowercase for consistency.
    (simple_object_elf_find_sections): Formatting fixes.
    (simple_object_elf_fetch_attributes): Likewise.
    (simple_object_elf_attributes_merge): Likewise.
    (simple_object_elf_start_write): Likewise.
    (simple_object_elf_write_ehdr): Likewise.
    (simple_object_elf_write_shdr): Likewise.
    (simple_object_elf_write_to_file): Likewise.
    (simple_object_elf_copy_lto_debug_section): Likewise. Don't fail for new_i - 1 >= SHN_LORESERVE, instead arrange in that case to copy over .symtab_shndx sections, though emit those last and compute their section content when processing associated .symtab sections. Handle simple_object_internal_read failure even in the .symtab_shndx reading case.
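The lock-step relationship between a .symtab entry and its .symtab_shndx entry described above can be sketched as follows. The constants are the standard ELF values; `SymbolIndex`, `encode_section_index` and `decode_section_index` are hypothetical names for illustration, not functions from simple-object-elf.c. The sketch covers only symbols that refer to a real section; special values such as SHN_ABS or SHN_COMMON are outside its scope.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kShnUndef = 0;           // SHN_UNDEF
constexpr std::uint32_t kShnLoReserve = 0xFF00;  // SHN_LORESERVE
constexpr std::uint16_t kShnXindex = 0xFFFF;     // SHN_XINDEX

// One symbol's section reference, split across the two sections.
struct SymbolIndex {
    std::uint16_t st_shndx;     // 16-bit field in the .symtab entry
    std::uint32_t shndx_entry;  // parallel 32-bit entry in .symtab_shndx
};

// Encodes a real section number. Small indices fit in st_shndx and the
// parallel entry is SHN_UNDEF; large ones use the SHN_XINDEX escape and
// store the real index in .symtab_shndx.
SymbolIndex encode_section_index(std::uint32_t section) {
    if (section >= kShnLoReserve)
        return {kShnXindex, section};
    return {static_cast<std::uint16_t>(section), kShnUndef};
}

// Recovers the real section number from the pair.
std::uint32_t decode_section_index(const SymbolIndex& s) {
    return s.st_shndx == kShnXindex ? s.shndx_entry : s.st_shndx;
}
```

Since both entries are derived from the same decision, computing the .symtab_shndx content while rewriting .symtab and writing it out afterwards, as the patch does, avoids repeating that decision in a second place.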
hubot pushed a commit that referenced this pull request on Sep 12, 2024
…o_debug_section [PR116614] (cherry picked from commit bb8dd09)
hubot pushed a commit that referenced this pull request on Sep 13, 2024
…o_debug_section [PR116614] (cherry picked from commit bb8dd09)
hubot pushed a commit that referenced this pull request on Oct 9, 2024
Whenever C1 and C2 are integer constants, X is of a wrapping type, and cmp is a relational operator, the expression X +- C1 cmp C2 can be simplified in the following cases:

(a) If cmp is <= and C2 -+ C1 == +INF(1), we can transform the initial comparison in the following way:
   X +- C1 <= C2
   -INF <= X +- C1 <= C2 (add left hand side which holds for any X, C1)
   -INF -+ C1 <= X <= C2 -+ C1 (add -+C1 to all 3 expressions)
   -INF -+ C1 <= X <= +INF (due to (1))
   -INF -+ C1 <= X (eliminate the right hand side since it holds for any X)

(b) By analogy, if cmp is >= and C2 -+ C1 == -INF(1), use the following sequence of transformations:
   X +- C1 >= C2
   +INF >= X +- C1 >= C2 (add left hand side which holds for any X, C1)
   +INF -+ C1 >= X >= C2 -+ C1 (add -+C1 to all 3 expressions)
   +INF -+ C1 >= X >= -INF (due to (1))
   +INF -+ C1 >= X (eliminate the right hand side since it holds for any X)

(c) The > and < cases are negations of (a) and (b), respectively.

This transformation occasionally saves add / sub instructions, for instance the expression
   3 + (uint32_t)f() < 2
compiles to
   cmn w0, #4
   cset w0, ls
instead of
   add w0, w0, 3
   cmp w0, 2
   cset w0, ls
on aarch64.

Testcases that go together with this patch have been split into two separate files, one containing testcases for unsigned variables and the other for wrapping signed ones (and thus compiled with -fwrapv). Additionally, one aarch64 test has been adjusted since the patch has caused the generated code to change from
   cmn w0, #2
   csinc w0, w1, wzr, cc (x < -2)
to
   cmn w0, #3
   csinc w0, w1, wzr, cs (x <= -3)

This patch has been bootstrapped and regtested on aarch64, x86_64, and i386, and additionally regtested on riscv32.

gcc/ChangeLog:

	PR tree-optimization/116024
	* match.pd: New transformation around integer comparison.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/pr116024-2.c: New test.
	* gcc.dg/tree-ssa/pr116024-2-fwrapv.c: Ditto.
	* gcc.target/aarch64/gtu_to_ltu_cmp_1.c: Adjust.
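Cases (a) and (b) above can be sanity-checked by brute force on a small wrapping type. The sketch below is not taken from the patch or its testsuite; it assumes an unsigned 8-bit X (so -INF is 0 and +INF is UINT8_MAX), covers only the "+" variant of "+-", and uses hypothetical function names. It enumerates every X and C1, picks C2 so that the side condition holds, and compares the original comparison with its simplified form.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Case (a), "+" variant, on uint8_t: with C2 - C1 == +INF,
   X + C1 <= C2 should agree with -INF - C1 <= X. */
bool check_case_a_u8(void)
{
    for (unsigned c1 = 0; c1 <= UINT8_MAX; c1++) {
        uint8_t c2 = (uint8_t)(c1 + UINT8_MAX);  /* C2 - C1 == +INF */
        uint8_t bound = (uint8_t)(0u - c1);      /* -INF - C1, -INF == 0 */
        for (unsigned x = 0; x <= UINT8_MAX; x++) {
            bool original = (uint8_t)(x + c1) <= c2;
            bool simplified = bound <= (uint8_t)x;
            if (original != simplified)
                return false;
        }
    }
    return true;
}

/* Case (b), "+" variant, on uint8_t: with C2 - C1 == -INF,
   X + C1 >= C2 should agree with +INF - C1 >= X. */
bool check_case_b_u8(void)
{
    for (unsigned c1 = 0; c1 <= UINT8_MAX; c1++) {
        uint8_t c2 = (uint8_t)c1;                   /* C2 - C1 == -INF == 0 */
        uint8_t bound = (uint8_t)(UINT8_MAX - c1);  /* +INF - C1 */
        for (unsigned x = 0; x <= UINT8_MAX; x++) {
            bool original = (uint8_t)(x + c1) >= c2;
            bool simplified = bound >= (uint8_t)x;
            if (original != simplified)
                return false;
        }
    }
    return true;
}

/* Convenience wrapper that aborts if either equivalence fails. */
void check_wrapping_compare_u8(void)
{
    assert(check_case_a_u8());
    assert(check_case_b_u8());
}
```

An exhaustive 8-bit check says nothing about how match.pd implements the pattern; it only confirms that the algebra of the derivation holds for a wrapping unsigned type.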
hubot pushed a commit that referenced this pull request on Nov 8, 2024
Update test case for armv8.1-m.main that supports conditional arithmetic.

armv7-m:
	push {r4, lr}
	ldr r4, .L6
	ldr r4, [r4]
	lsls r4, r4, #29
	it mi
	addmi r2, r2, #1
	bl bar
	movs r0, #0
	pop {r4, pc}

armv8.1-m.main:
	push {r3, r4, r5, lr}
	ldr r4, .L5
	ldr r5, [r4]
	tst r5, #4
	csinc r2, r2, r2, eq
	bl bar
	movs r0, #0
	pop {r3, r4, r5, pc}

gcc/testsuite/ChangeLog:

	* gcc.target/arm/epilog-1.c: Use check-function-bodies.

Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
hubot pushed a commit that referenced this pull request on Nov 8, 2024
Update test case for armv8.1-m.main that supports conditional arithmetic. (cherry picked from commit ec86e87)
hubot pushed a commit that referenced this pull request on Nov 26, 2024
In r14.2.0-376-g724446556e5, I accidentally introduced a regression in the expected assembler as the csinc instruction was not used for armv8.1-m.main.

The generated assembler for armv8.1-m.main is:
	push {r3, r4, r5, lr}
	ldr r4, .L5
	ldr r5, [r4]
	adds r4, r2, #1
	tst r5, #4
	it ne
	movne r2, r4
	bl bar
	movs r0, #0
	pop {r3, r4, r5, pc}

gcc/testsuite/ChangeLog:

	* gcc.target/arm/epilog-1.c: Corrected armv8.1-m.main asm.

Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
hubot pushed a commit that referenced this pull request on Feb 5, 2025
When generating thumb2 code, LDM SP!, {PC} is a two-byte instruction, whereas LDR PC, [SP], #4 needs 4 bytes. When optimizing for size, or when there's no obvious performance benefit, prefer the former.

gcc/ChangeLog:

	PR target/118089
	* config/arm/arm.cc (thumb2_expand_return): Use LDM SP!, {PC} when optimizing for size, or when there's no performance benefit over LDR PC, [SP], #4.
	(arm_expand_epilogue): Likewise.
hubot pushed a commit that referenced this pull request on Feb 7, 2025
My earlier change for making the compiler prefer POP {PC} over LDR PC, [SP], #4 had a slightly unexpected consequence in that we now also call arm_emit_multi_reg_pop to handle single register pops when the register is not PC. This exposed a latent bug in this function where the dwarf unwinding notes on the single-register POP were not being set correctly.

gcc/

	PR target/118089
	* config/arm/arm.cc (arm_emit_multi_reg_pop): Add a CFA adjust note to single-register POP instructions.
hubot pushed a commit that referenced this pull request on Jun 13, 2025
…o_debug_section [PR116614] (cherry picked from commit bb8dd09)
Pulling for study purpose, no changes expected