[PROTON] Roctracer: convert agent id to gpu id for gpu ops #4090
hsa::agentGetInfo<true>(
    agent, static_cast<hsa_agent_info_t>(HSA_AGENT_INFO_DEVICE),
    &deviceType);
if ((nodeId < deviceOffset) && (deviceType == HSA_DEVICE_TYPE_GPU))
So HSA_GPU devices will have contiguous device offsets? In other words, the following scenario is not possible:
device ids: [2-4], gpu ids: [0-2]
device ids: [8-10], gpu ids: [2-4]
Yes, that is the current "observation". All CPUs are enumerated before any GPUs. I do not think that is written in a "contract" anywhere, but it has been 100% consistent. This is not ideal, but the HIP change to include the agent/node id in the device info will remedy any issues there and also deal with the HIP_VISIBLE_DEVICES remapping.
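To make the contiguity question concrete, here is a minimal sketch (not the code in this PR) that compares two ways of turning an agent id into a zero-based GPU index. It assumes the agents are available as an ordered list of kinds. `AgentKind`, `gpuIndexByOffset`, and `gpuIndexByCount` are hypothetical names for illustration and are not part of HSA or Roctracer.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for the device type reported for each agent.
enum class AgentKind { Cpu, Gpu };

// Offset approach: subtract the agent id of the first GPU.
// Correct only if GPU agents occupy one contiguous id range.
std::optional<uint32_t> gpuIndexByOffset(const std::vector<AgentKind>& agents,
                                         uint32_t agentId) {
  if (agentId >= agents.size() || agents[agentId] != AgentKind::Gpu)
    return std::nullopt;
  for (uint32_t i = 0; i < agents.size(); ++i)
    if (agents[i] == AgentKind::Gpu)
      return agentId - i;  // i is the first GPU's agent id (the offset)
  return std::nullopt;
}

// Counting approach: the GPU index is the number of GPU agents that
// precede this one, so it does not depend on enumeration order.
std::optional<uint32_t> gpuIndexByCount(const std::vector<AgentKind>& agents,
                                        uint32_t agentId) {
  if (agentId >= agents.size() || agents[agentId] != AgentKind::Gpu)
    return std::nullopt;
  uint32_t index = 0;
  for (uint32_t i = 0; i < agentId; ++i)
    if (agents[i] == AgentKind::Gpu)
      ++index;
  return index;
}
```

With the observed "CPUs first" ordering both functions agree. They diverge only if CPU and GPU agents were ever interleaved, which is the scenario raised in the question above.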
…ng#4090) Roctracer reports (global) agent ids for the location of async ops, e.g. kernels and copies. The profiler would be better suited with gpu indexes (zero based). Created a mapping function to apply to values stored in KernelMetric::DeviceId. Caveat: if devices are hidden using HIP_VISIBLE_DEVICES then the hip device id, e.g. via hipGetDevice()/hipSetDevice(), will not match the reported unfiltered id. Additional support in hip will be needed to map through the filtering correctly. --------- Co-authored-by: Keren Zhou <robinho364@gmail.com> (cherry picked from commit 60613fb)
Cherry picks for release/3.0.x

General:
- e8bc45d [BACKEND][AMD] Disable linear layout due to perf regression (#4126)
- 9a0a7c2 [AMD] Add basic verification to MFMA encoding (#4117)

for RDNA:
- 100e2aa [AMD][WMMA] Support dot3d (#3674)
- 4a1ea8e [AMD][gfx11] Fix BF16 wmma instr generation (#4135)

Proton HIP PRs:
- 328b86d [PROTON] Refactor GPU profilers (#4056)
- 60613fb [PROTON] Roctracer: convert agent id to gpu id for gpu ops (#4090)
- c1776fa [PROTON][AMD] Add Proton HIP GPU Utilization Metrics (#4119)

---------

Co-authored-by: Lei Zhang <antiagainst@gmail.com>
Co-authored-by: Alexander Efimov <efimov.alexander@gmail.com>
Co-authored-by: Ilya V <152324710+joviliast@users.noreply.github.com>
Co-authored-by: Keren Zhou <kerenzhou@openai.com>
Co-authored-by: mwootton <michael.wootton@amd.com>
Co-authored-by: Corbin Robeck <corbin.robeck@amd.com>
Roctracer reports (global) agent ids for the location of async ops, e.g. kernels and copies.
The profiler is better served by zero-based GPU indexes.
This change adds a mapping function and applies it to the values stored in KernelMetric::DeviceId.
Caveat: if devices are hidden using HIP_VISIBLE_DEVICES, then the HIP device id, e.g. as seen via hipGetDevice()/hipSetDevice(), will not match the reported unfiltered id. Additional support in HIP will be needed to map through the filtering correctly.
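The caveat can be illustrated with a small sketch. It assumes HIP_VISIBLE_DEVICES holds only a comma-separated list of integer indices and that HIP numbers the visible devices by their position in that list. Both are assumptions made for illustration, and `parseVisibleDevices` and `toHipDeviceId` are hypothetical helpers, not HIP or Proton APIs.

```cpp
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Parses a value such as "2,0" into {2, 0}.
// Assumption: entries are plain non-negative integers.
std::vector<uint32_t> parseVisibleDevices(const std::string& value) {
  std::vector<uint32_t> visible;
  std::stringstream stream(value);
  std::string item;
  while (std::getline(stream, item, ',')) {
    if (!item.empty())
      visible.push_back(static_cast<uint32_t>(std::stoul(item)));
  }
  return visible;
}

// Returns the id the application would see for an unfiltered GPU index
// (its position in the visible list), or nullopt if that GPU is hidden.
std::optional<uint32_t> toHipDeviceId(const std::vector<uint32_t>& visible,
                                      uint32_t unfilteredGpuIndex) {
  for (uint32_t pos = 0; pos < visible.size(); ++pos)
    if (visible[pos] == unfilteredGpuIndex)
      return pos;
  return std::nullopt;
}
```

Under these assumptions, with HIP_VISIBLE_DEVICES=2,0 a kernel reported on unfiltered GPU 2 would correspond to the application's device 0, which is the mismatch described in the caveat.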