-
I only have an A770, so my testing is limited to that card. If someone has a B580 (or another Battlemage card) and wants to give it a shot, I can provide instructions. They would need to build the project manually with slight code changes.
-
Hi @0cc4m. The performance regression you mentioned in #12690 with the Arc A770 is acknowledged, but we don't have an ETA for a fix yet. In the meantime, we were wondering whether there's a good way to enable coopmat for other Arc GPUs, especially Xe2 (Lunar Lake, Battlemage). Below are benchmark results I collected on b5200. There's a significant performance boost on Xe2 GPUs, so it would be great if we could target specific GPUs and enable it there. I've only tested on Windows with llama-bench using gemma2. What other testing is required for a PR?
Benchmark results
Source code diff for testing
diff --git a/ggml/src/ggml-vulkan/ggml-vulkan.cpp b/ggml/src/ggml-vulkan/ggml-vulkan.cpp
index 131ee1ea..e2326a48 100644
--- a/ggml/src/ggml-vulkan/ggml-vulkan.cpp
+++ b/ggml/src/ggml-vulkan/ggml-vulkan.cpp
@@ -8744,7 +8744,7 @@ static bool ggml_vk_khr_cooperative_matrix_support(const vk::PhysicalDevicePrope
     switch (props.vendorID) {
     case VK_VENDOR_ID_INTEL:
         // Intel drivers don't support coopmat properly yet
-        return false;
+        return true;
     case VK_VENDOR_ID_AMD:
         if (driver_props.driverID == vk::DriverId::eAmdProprietary || driver_props.driverID == vk::DriverId::eAmdOpenSource) {
             // Workaround for AMD proprietary driver reporting support on all GPUs
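The diff above turns coopmat on for every Intel GPU, which would also affect the A770 regression from #12690. As a rough sketch of what "target specific GPUs" could look like, the check might take an architecture tag and allow only Xe2. Everything here is an assumption for illustration: the `IntelArch` enum, the `coopmat_allowed` function, and the idea that the caller already knows the architecture are hypothetical and not part of ggml-vulkan.cpp. The only hard fact used is that 0x8086 is Intel's PCI vendor ID.

```cpp
#include <cstdint>

// Intel's PCI vendor ID.
constexpr uint32_t VENDOR_ID_INTEL = 0x8086;

// Hypothetical architecture tag. How a device gets classified is out of
// scope for this sketch and left to the caller.
enum class IntelArch { Unknown, Alchemist, Xe2 };

// Hypothetical gate: for Intel devices, allow coopmat only on Xe2.
// Non-Intel vendors return true here purely as a placeholder; the real
// function has its own per-vendor logic (see the AMD case in the diff).
constexpr bool coopmat_allowed(uint32_t vendor_id, IntelArch arch) {
    if (vendor_id != VENDOR_ID_INTEL) {
        return true;  // placeholder, not the real behaviour for other vendors
    }
    // A-series (Alchemist) and unclassified devices stay disabled.
    return arch == IntelArch::Xe2;
}
```

With a gate like this, an A770 would keep the current behaviour (coopmat off) while Lunar Lake and Battlemage devices would opt in, assuming they can be told apart reliably at device-init time.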
-
According to issue #12690 and commit b56f079, Intel GPUs do support using matrix cores for Vulkan inference, but the feature was disabled because actual performance was worse and the cause was not easy to track down.
However, that conclusion was based on A-series cards, not the latest B-series Xe2 cards. After a rough search, I could not find anyone who has run such experiments. Are there any performance numbers out there? Or is anyone willing to give it a try?