Add flags to the core queue interface for device-side ring buf/queue descriptor allocation #284

Open
wants to merge 4 commits into
base: amd-staging
rocr: Add large_bar_enabled var to the GPU agent
Adds a bool to the GPU agent and a public member method to
check if the GPU supports large BAR. This is needed so we can
check if large BAR is supported when a user tries to allocate
an AQL queue in device memory on a given GPU agent.

Also throws an exception from the AQL queue if a device-side AQL queue
is requested and the GPU that owns the queue doesn't support large
BAR. Without this check, ROCr allows device-side queues
that can fault when the user touches their ring
buffers, and the user will not know why the faults are occurring.

This relies on the fact that the KFD does not expose any links
from the CPU to the GPU if large BAR is not enabled (though
links from the GPU to the CPU may still be exposed by the KFD).
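
The detection rule described above can be sketched in isolation. This is a minimal illustration, not the ROCr code: `LinkInfoSketch` and `GpuAgentSketch` are hypothetical stand-ins for the runtime's link record and GPU agent, and the sketch assumes that a missing CPU-to-GPU link is reported with a hop count of zero.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the record returned by the runtime's
// GetLinkInfo(); only the hop count inspected by this commit is modeled.
struct LinkInfoSketch {
  uint32_t num_hop = 0;  // assumption: 0 means no CPU->GPU link is reported
};

// Hypothetical, stripped-down agent holding the flag the commit adds.
class GpuAgentSketch {
 public:
  explicit GpuAgentSketch(const LinkInfoSketch& cpu_to_gpu) {
    // Same heuristic as the diff: any CPU->GPU link implies large BAR.
    if (cpu_to_gpu.num_hop >= 1) {
      large_bar_enabled_ = true;
    }
  }

  bool LargeBarEnabled() const { return large_bar_enabled_; }

 private:
  bool large_bar_enabled_ = false;  // defaults to "not enabled"
};
```

Because the flag defaults to false, an agent for which no CPU-to-GPU link is reported is treated as lacking large BAR.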
atgutier committed Feb 19, 2025
commit 04b751aac24bc1608f44e2c34fa2546a499777a3
4 changes: 4 additions & 0 deletions runtime/hsa-runtime/core/inc/amd_gpu_agent.h
@@ -423,6 +423,8 @@ class GpuAgent : public GpuAgentInt {

   /// @brief Override from AMD::GpuAgentInt.
   __forceinline bool is_xgmi_cpu_gpu() const { return xgmi_cpu_gpu_; }
+  /// @brief Is large BAR support enabled for this GPU.
+  __forceinline bool LargeBarEnabled() const { return large_bar_enabled_; }

   const size_t MAX_SCRATCH_APERTURE_PER_XCC = (1ULL << 32);
   size_t MaxScratchDevice() const { return properties_.NumXcc * MAX_SCRATCH_APERTURE_PER_XCC; }
@@ -808,6 +810,8 @@ class GpuAgent : public GpuAgentInt {

   /// @brief XGMI CPU<->GPU
   bool xgmi_cpu_gpu_ = false;
+  /// @brief Is PCIe large BAR enabled.
+  bool large_bar_enabled_ = false;
 };

 }  // namespace amd
4 changes: 4 additions & 0 deletions runtime/hsa-runtime/core/runtime/amd_aql_queue.cpp
@@ -732,6 +732,10 @@ void AqlQueue::AllocRegisteredRingBuffer(uint32_t queue_size_pkts) {
   assert(IsMultipleOf(ring_buf_alloc_bytes_, 4096) && "Ring buffer sizes must be 4KiB aligned.");

   if (IsDeviceMemRingBuf()) {
+    if (!agent_->LargeBarEnabled()) {
+      throw AMD::hsa_exception(HSA_STATUS_ERROR_INVALID_QUEUE_CREATION,
+          "Trying to allocate an AQL ring buffer in device memory without large BAR PCIe enabled.");
+    }
     ring_buf_ = agent_->coarsegrain_allocator()(
         ring_buf_alloc_bytes_,
         core::MemoryRegion::AllocateExecutable | core::MemoryRegion::AllocateUncached);
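
The guard in this hunk follows a throw-then-translate pattern, sketched below in standalone form. Every name here (`StatusSketch`, `StatusError`, `AllocRingBufferSketch`, `CreateQueueSketch`) is hypothetical and only loosely modeled on the diff. In particular, converting the exception into a status code at a caller-facing boundary is an assumption about how such an error could be surfaced, not a statement about what ROCr does.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical status codes standing in for hsa_status_t values.
enum class StatusSketch { kSuccess, kInvalidQueueCreation };

// Hypothetical exception carrying a status code, loosely modeled on the
// AMD::hsa_exception used in the diff.
class StatusError : public std::runtime_error {
 public:
  StatusError(StatusSketch code, const std::string& what)
      : std::runtime_error(what), code_(code) {}
  StatusSketch code() const { return code_; }

 private:
  StatusSketch code_;
};

// Guard equivalent to the one added to AllocRegisteredRingBuffer:
// refuse a device-memory ring buffer when large BAR is not enabled.
inline void AllocRingBufferSketch(bool device_mem_ring_buf, bool large_bar_enabled) {
  if (device_mem_ring_buf && !large_bar_enabled) {
    throw StatusError(StatusSketch::kInvalidQueueCreation,
                      "device-memory ring buffer requested without large BAR");
  }
  // The allocation itself is omitted in this sketch.
}

// Assumed caller-facing boundary that turns the exception into a status.
inline StatusSketch CreateQueueSketch(bool device_mem_ring_buf, bool large_bar_enabled) {
  try {
    AllocRingBufferSketch(device_mem_ring_buf, large_bar_enabled);
  } catch (const StatusError& e) {
    return e.code();
  }
  return StatusSketch::kSuccess;
}
```

With this shape, the failure is reported at queue creation time with a specific status instead of appearing later as an unexplained fault when the ring buffer is touched.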
10 changes: 7 additions & 3 deletions runtime/hsa-runtime/core/runtime/amd_gpu_agent.cpp
@@ -223,9 +223,13 @@ GpuAgent::GpuAgent(HSAuint32 node, const HsaNodeProperties& node_props, bool xna
 #endif

   auto& firstCpu = core::Runtime::runtime_singleton_->cpu_agents()[0];
-  auto linkInfo = core::Runtime::runtime_singleton_->GetLinkInfo(firstCpu->node_id(),
-                                                                 node_id());
-  xgmi_cpu_gpu_ = (linkInfo.info.link_type == HSA_AMD_LINK_INFO_TYPE_XGMI);
+  auto link_info = core::Runtime::runtime_singleton_->GetLinkInfo(firstCpu->node_id(),
+                                                                  node_id());
+  xgmi_cpu_gpu_ = (link_info.info.link_type == HSA_AMD_LINK_INFO_TYPE_XGMI);
+
+  if (link_info.num_hop >= 1) {
+    large_bar_enabled_ = true;
+  }

   // Populate region list.
   InitRegionList();