
[webgpu] support intel subgroup matrix on matmul_nbits #24898

Open · xhcao wants to merge 1 commit into main

Conversation

xhcao (Contributor) commented May 29, 2025

The patch enables Intel subgroup matrix support in the matmul_nbits operator. For now it is enabled only on the Vulkan backend and the Intel xe-2lpg architecture; we will extend it to more subgroup matrix configs and platforms.

xhcao (Contributor, Author) commented May 29, 2025

1. The subgroup matrix feature is closely tied to hardware vendors and their architectures: vendors expose different subgroup matrix configs, and the best config differs per vendor and architecture. Optimizing the algorithm for one piece of hardware can easily hurt others, so at this early stage of development we generate shader code separately for each vendor (a minimal sketch follows this list).
2. The PR currently supports only the Intel xe-2lpg architecture on Vulkan, with the subgroup matrix config f16(8x16) x f16(16x16) = f32(8x16); we will extend the feature as Dawn enables more configs.
3. On Intel xe-2lpg, current performance is ~20% slower than the dp4a path and ~10% faster than the non-dp4a path.
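To make the per-vendor split concrete, here is a minimal hypothetical C++ sketch of keying code generation on a vendor/architecture-specific subgroup matrix shape. The names SubgroupMatrixShape and PickSubgroupMatrixShape are illustrative, not from the PR; only the Intel xe-2lpg shape is taken from the description above.

// Hypothetical sketch, not the PR's actual code: select a subgroup matrix
// shape per vendor/architecture, falling back when none is wired up.
#include <cstdint>
#include <optional>
#include <string_view>

struct SubgroupMatrixShape {
  uint32_t M, N, K;        // A is MxK, B is KxN, accumulator is MxN
  const char* input_type;  // component type of A/B, e.g. "f16"
  const char* output_type; // component type of the accumulator, e.g. "f32"
};

std::optional<SubgroupMatrixShape> PickSubgroupMatrixShape(
    std::string_view vendor, std::string_view arch) {
  if (vendor == "intel" && arch == "xe-2lpg") {
    // f16(8x16) x f16(16x16) = f32(8x16), per the PR description.
    return SubgroupMatrixShape{8, 16, 16, "f16", "f32"};
  }
  return std::nullopt;  // fall back to the dp4a / generic path
}

Keeping the selection explicit per vendor is what lets one architecture be tuned without risking regressions on another, which is the rationale given in point 1.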

@jchen10 @daijh PTAL, thanks.

xhcao (Contributor, Author) commented May 29, 2025

[Screenshot 2025-05-29 161718]
Currently, the subgroup matrix config UINT8(8x32) x UINT8(32x8) = UINT32(8x8) is being implemented in Dawn; its result is expected to be better than dp4a.
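For intuition about why that config should help, here is a scalar C++ reference of what one UINT8(8x32) x UINT8(32x8) = UINT32(8x8) accumulate computes. This is illustrative only: the real operation is a cooperative subgroup instruction, not a per-thread loop. Each output element folds 32 int8 products per matrix instruction, versus 4 per dp4a instruction.

// Scalar reference for UINT8(8x32) x UINT8(32x8) = UINT32(8x8).
// Illustrative only; the hardware executes this cooperatively.
#include <cstdint>

void ReferenceSubgroupMatMul(const uint8_t A[8][32], const uint8_t B[32][8],
                             uint32_t C[8][8]) {
  for (int m = 0; m < 8; ++m) {
    for (int n = 0; n < 8; ++n) {
      uint32_t acc = 0;
      for (int k = 0; k < 32; ++k) {  // 32 products per element; dp4a does 4
        acc += uint32_t(A[m][k]) * uint32_t(B[k][n]);
      }
      C[m][n] = acc;
    }
  }
}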

@@ -150,6 +148,11 @@ void WebGpuContext::Initialize(const WebGpuBufferCacheConfig& buffer_cache_confi
   for (size_t i = 0; i < supported_features.featureCount; i++) {
     device_features_.insert(supported_features.features[i]);
   }
   // cache adapter info
   if (DeviceHasFeature(wgpu::FeatureName::ChromiumExperimentalSubgroupMatrix)) {
A contributor commented on this DeviceHasFeature check:
Is this feature always available on all platforms? (win/linux/mac/wasm)
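For context on that question: ChromiumExperimentalSubgroupMatrix is hardware- and driver-dependent, so it is not uniformly available across platforms, which is why it is probed at runtime. Below is a minimal sketch of querying the adapter's supported configs; it assumes Dawn's experimental C++ API, and the chained-struct and field names reflect my understanding of Dawn's headers and may differ or change.

// Sketch: query the subgroup matrix configs the adapter supports, if any.
// Assumes Dawn's experimental API; struct and field names may change.
#include <vector>
#include <webgpu/webgpu_cpp.h>

std::vector<wgpu::SubgroupMatrixConfig> QuerySubgroupMatrixConfigs(
    const wgpu::Adapter& adapter) {
  std::vector<wgpu::SubgroupMatrixConfig> result;
  // The feature is hardware/driver dependent, so probe before querying.
  if (!adapter.HasFeature(wgpu::FeatureName::ChromiumExperimentalSubgroupMatrix)) {
    return result;
  }
  // Chain the config query onto the regular adapter-info request.
  wgpu::AdapterPropertiesSubgroupMatrixConfigs configs{};
  wgpu::AdapterInfo info{};
  info.nextInChain = &configs;
  (void)adapter.GetInfo(&info);
  // Each config carries M, N, K and component types; the shader generator
  // can pick a shape it has a kernel for (e.g. 8x16x16 f16 on xe-2lpg).
  result.assign(configs.configs, configs.configs + configs.configCount);
  return result;
}

Whether a given shape is actually fast remains vendor-specific, which is why the PR gates this path to Intel xe-2lpg for now.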
