Clarify semantics of urKernelSuggestMaxCooperativeGroupCountExp #1687

Open
@JackAKirk

CC @0x12CC @nrspruit

In the discussion from here: #1246 (comment)

it was described that urKernelSuggestMaxCooperativeGroupCountExp maps to cudaOccupancyMaxActiveBlocksPerMultiprocessor, which takes a kernel and other parameters and returns the maximum number of blocks that can execute simultaneously on a single streaming multiprocessor (SM).

However, I found this in the Level Zero (L0) documentation:

"Use zeKernelSuggestMaxCooperativeGroupCount to recommend max group count for device for cooperative functions that device supports."

The word "device" implies that urKernelSuggestMaxCooperativeGroupCountExp is meant to return the maximum number of blocks that can execute simultaneously on a whole device. A device consists of multiple streaming multiprocessors, so in that case you need to multiply the maximum number of blocks that can execute simultaneously on one SM by the number of SMs in the device.

The number of SMs can only be retrieved by querying the device the kernel is to be run on. This information (the device to be run on) is not passed to urKernelSuggestMaxCooperativeGroupCountExp, nor can it be inferred from any of the other parameters.
Therefore, there are two possibilities:

  • If the semantics is the max number of blocks per device, the interface needs to be changed.
  • If the semantics is the max number of blocks per SM, the documentation should be clarified IMO.
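
To make the difference between the two readings concrete, here is a minimal sketch of the per-device calculation described above. The helper name `maxCooperativeGroupsPerDevice` and its parameters are hypothetical and not part of Unified Runtime or CUDA. On the CUDA side I'd assume the per-SM value comes from cudaOccupancyMaxActiveBlocksPerMultiprocessor and the SM count from querying the cudaDevAttrMultiProcessorCount device attribute.

```cpp
#include <cstdint>

// Hypothetical helper (illustration only, not a UR or CUDA API).
//
// blocksPerSM: per-SM occupancy, e.g. the value reported by
//              cudaOccupancyMaxActiveBlocksPerMultiprocessor.
// smCount:     number of SMs on the target device, e.g. the
//              cudaDevAttrMultiProcessorCount attribute. Obtaining it
//              requires knowing which device the kernel will run on.
constexpr std::uint64_t maxCooperativeGroupsPerDevice(std::uint32_t blocksPerSM,
                                                      std::uint32_t smCount) {
    // Widen before multiplying so the product cannot overflow 32 bits.
    return static_cast<std::uint64_t>(blocksPerSM) * smCount;
}
```

For example, with 4 resident blocks per SM and 80 SMs, the per-SM reading gives 4 while the per-device reading gives 320, so which one the entry point returns matters a lot to callers sizing a cooperative launch.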

    Labels

    cuda (CUDA adapter specific issues), experimental (Experimental feature additions/changes/specification), hip (HIP adapter specific issues), level-zero (L0 adapter specific issues)