AMDGPU misses optimization on check-all-workitem-ids are 0 pattern #136727

Open
@arsenm

The device libraries include this pattern to check if all workitem IDs are 0.

// RUN: clang -target amdgcn-amd-amdhsa -S -O3 -mcpu=gfx900 -nogpulib < %s
bool
choose_one_workgroup_workitem(void)
{
    return (__builtin_amdgcn_workitem_id_x() | __builtin_amdgcn_workitem_id_y() | __builtin_amdgcn_workitem_id_z()) == 0;
}

https://github.com/ROCm/llvm-project/blob/662bae8d56ae5ba900a81b468936f47769b0fc2d/amd/device-libs/ockl/src/cg.cl#L46

This is equivalent to checking x == 0 && y == 0 && z == 0. Compiling it currently produces:

	v_and_b32_e32 v0, 0x3ff, v31
	v_bfe_u32 v1, v31, 20, 10
	v_bfe_u32 v2, v31, 10, 10
	v_or3_b32 v0, v0, v2, v1
	v_cmp_eq_u32_e32 vcc, 0, v0
	v_cndmask_b32_e64 v0, 0, 1, vcc
	s_setpc_b64 s[30:31]

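In the function ABI, the work item IDs are packed into v31 as three 10-bit fields (x in bits 0-9, y in bits 10-19, z in bits 20-29), which is what the mask and the two bitfield extracts above unpack. Provided the remaining bits of v31 are zero, we should be able to just check v31 == 0, so this would shrink to: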

	v_cmp_eq_u32_e32 vcc, 0, v31
	v_cndmask_b32_e64 v0, 0, 1, vcc
	s_setpc_b64 s[30:31]
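
To illustrate why the fold is valid, here is a minimal host-side sketch in plain C that models the packed register as a `uint32_t`. The helper names (`pack_ids`, `all_zero_unpacked`, `all_zero_packed`) are hypothetical and exist only for this illustration. The sketch assumes the layout implied by the assembly above (three 10-bit fields at offsets 0, 10 and 20) and assumes bits 30-31 are zero; if those bits could be set, the plain compare would not be equivalent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: pack three 10-bit IDs at bit offsets 0, 10 and 20,
 * mirroring the layout the assembly above unpacks. Bits 30-31 stay zero
 * (an assumption of this sketch). */
static uint32_t pack_ids(uint32_t x, uint32_t y, uint32_t z) {
    return (x & 0x3ffu) | ((y & 0x3ffu) << 10) | ((z & 0x3ffu) << 20);
}

/* Models the current codegen: extract each field, OR them, compare with 0. */
static bool all_zero_unpacked(uint32_t packed) {
    uint32_t x = packed & 0x3ffu;          /* v_and_b32 0x3ff     */
    uint32_t y = (packed >> 10) & 0x3ffu;  /* v_bfe_u32 ..., 10, 10 */
    uint32_t z = (packed >> 20) & 0x3ffu;  /* v_bfe_u32 ..., 20, 10 */
    return (x | y | z) == 0;               /* v_or3_b32 + v_cmp_eq */
}

/* Models the proposed codegen: a single compare of the packed value. */
static bool all_zero_packed(uint32_t packed) {
    return packed == 0;
}

/* Spot-check that both forms agree for a few ID combinations. */
static void check_examples(void) {
    assert(all_zero_unpacked(pack_ids(0, 0, 0)) == all_zero_packed(pack_ids(0, 0, 0)));
    assert(all_zero_unpacked(pack_ids(1, 0, 0)) == all_zero_packed(pack_ids(1, 0, 0)));
    assert(all_zero_unpacked(pack_ids(0, 0, 1023)) == all_zero_packed(pack_ids(0, 0, 1023)));
}
```

Because the three fields do not overlap, OR-ing the extracted fields is zero exactly when every covered bit of the packed value is zero, which is why the single compare suffices under the stated assumption.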
