Are the Vulkan and Kompute backends supported on the AMD 7800 XT on Windows? #5698

Closed
yoopyman opened this issue Feb 24, 2024 · 5 comments

Comments

@yoopyman

I have a question about two backends: Vulkan and Kompute.
I am running the latest version of Windows 11 with the latest AMD drivers on an AMD 7800 XT graphics card.
I have tried several Windows installations and the result is the same. I also tried different driver versions with a clean installation each time, with the same result.

I just want to know whether this card is supported on Windows with Kompute and Vulkan, or only on Linux.
Thanks

If I run the Vulkan version (b2251) I receive this error:

main: build = 2251 (fd43d66)
main: built with MSVC 19.38.33135.0 for x64
Starting Test
Allocating Memory of size 800194560 bytes, 763 MB
ggml_vulkan: Found 1 Vulkan devices:
Vulkan0: AMD Radeon RX 7800 XT | uma: 0 | fp16: 1 | warp size: 64
Creating new tensors

------ Test 1 - Matrix Mult via F32 code
n_threads=1
m11: type = 0 ( f32) ne = 11008 x 4096 x 1, nb = ( 4, 44032, 180355072) - Sum of tensor m11 is 45088768.00
m2: type = 0 ( f32) ne = 11008 x 128 x 1, nb = ( 4, 44032, 5636096) - Sum of tensor m2 is 2818048.00
GGML_ASSERT: D:\a\llama.cpp\llama.cpp\ggml-vulkan.cpp:1767: false
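
A note on reading the tensor lines in the log above: `ne` is the number of elements per dimension and `nb` is the stride in bytes per dimension. The sketch below recomputes `nb` from `ne` for a densely packed f32 tensor and compares it with the logged values. The helper name `contiguous_strides` is made up for illustration and is not part of ggml; the 4-byte element size is the standard size of f32.

```python
def contiguous_strides(ne, elem_size):
    """Byte strides of a densely packed tensor.

    ne lists the dimension sizes, fastest-varying first, as in the log.
    Hypothetical helper for illustration only, not a ggml function.
    """
    strides = [elem_size]
    for dim in ne[:-1]:
        # Each stride is the previous stride times the previous dimension size.
        strides.append(strides[-1] * dim)
    return tuple(strides)


F32_BYTES = 4  # size of one f32 element

m11_nb = contiguous_strides((11008, 4096, 1), F32_BYTES)
m2_nb = contiguous_strides((11008, 128, 1), F32_BYTES)

print(m11_nb)  # compare with nb on the m11 line of the log
print(m2_nb)   # compare with nb on the m2 line of the log
```

Both tuples agree with the logged `nb` values, so the tensors were created as expected before the assertion fired.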

If I run the Kompute version (b2251) I don't receive an error, but it doesn't seem to use the graphics card, judging by the low scores:

main: build = 2251 (fd43d66)
main: built with MSVC 19.38.33135.0 for x64
Starting Test
Allocating Memory of size 800194560 bytes, 763 MB
Creating new tensors

------ Test 1 - Matrix Mult via F32 code
n_threads=1
m11: type = 0 ( f32) ne = 11008 x 4096 x 1, nb = ( 4, 44032, 180355072) - Sum of tensor m11 is 45088768.00
m2: type = 0 ( f32) ne = 11008 x 128 x 1, nb = ( 4, 44032, 5636096) - Sum of tensor m2 is 2818048.00
gf->nodes[0]: type = 0 ( f32) ne = 4096 x 128 x 1, nb = ( 4, 16384, 2097152) - Sum of tensor gf->nodes[0] is 11542724608.00

------ Test 2 - Matrix Mult via q4_1 code
n_threads=1
Matrix Multiplication of (11008,4096,1) x (11008,128,1) - about 11.54 gFLOPS

Iteration;NThreads; SizeX; SizeY; SizeZ; Required_FLOPS; Elapsed_u_Seconds; gigaFLOPS

    0;       1; 11008;  4096;   128;    11542724608;            141751;     81.43
    1;       1; 11008;  4096;   128;    11542724608;            140819;     81.97
    2;       1; 11008;  4096;   128;    11542724608;            141098;     81.81
    3;       1; 11008;  4096;   128;    11542724608;            140593;     82.10
    4;       1; 11008;  4096;   128;    11542724608;            140639;     82.07
    5;       1; 11008;  4096;   128;    11542724608;            140766;     82.00
    6;       1; 11008;  4096;   128;    11542724608;            140835;     81.96
    7;       1; 11008;  4096;   128;    11542724608;            141091;     81.81
    8;       1; 11008;  4096;   128;    11542724608;            140719;     82.03
    9;       1; 11008;  4096;   128;    11542724608;            140794;     81.98

Average 81.92
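
The figures in the table above can be reproduced by hand. This is a minimal sketch, assuming the benchmark counts 2 floating-point operations (one multiply, one add) per inner-product term, which matches the `Required_FLOPS` column. The function name `gflops` and the variable names are illustrative only.

```python
# Matrix shapes from the log: (11008 x 4096) times (11008 x 128),
# contracted over the shared dimension of 11008.
K, M, N = 11008, 4096, 128

# Assumption: one multiply plus one add per inner-product term.
required_flops = 2 * M * N * K


def gflops(flops, elapsed_us):
    """Convert an operation count and a time in microseconds to GFLOPS."""
    return flops / (elapsed_us * 1e-6) / 1e9


# gigaFLOPS column as printed in the table above.
reported = [81.43, 81.97, 81.81, 82.10, 82.07, 82.00, 81.96, 81.81, 82.03, 81.98]
average = sum(reported) / len(reported)

print(required_flops)                            # Required_FLOPS column
print(round(gflops(required_flops, 141751), 2))  # iteration 0
print(round(average, 2))                         # "Average" line
```

The logged tensor sums are also consistent with `m11` being filled with 1.0 and `m2` with 2.0 (an inference from the sums, not something the log states): each of the 4096 x 128 outputs would then equal 11008 x 2, which gives the same 11542724608 reported as the sum of `gf->nodes[0]`.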

The first part of my vulkaninfo output (it was too big to include in full) looks like this:

WARNING: [Loader Message] Code 0 : windows_read_data_files_in_registry: Registry lookup failed to get layer manifest files.

VULKANINFO

Vulkan Instance Version: 1.3.261

Instance Extensions: count = 13

    VK_EXT_debug_report                    : extension revision 10
    VK_EXT_debug_utils                     : extension revision 2
    VK_EXT_swapchain_colorspace            : extension revision 4
    VK_KHR_device_group_creation           : extension revision 1
    VK_KHR_external_fence_capabilities     : extension revision 1
    VK_KHR_external_memory_capabilities    : extension revision 1
    VK_KHR_external_semaphore_capabilities : extension revision 1
    VK_KHR_get_physical_device_properties2 : extension revision 2
    VK_KHR_get_surface_capabilities2       : extension revision 1
    VK_KHR_portability_enumeration         : extension revision 1
    VK_KHR_surface                         : extension revision 25
    VK_KHR_win32_surface                   : extension revision 6
    VK_LUNARG_direct_driver_loading        : extension revision 1

Layers: count = 1

VK_LAYER_AMD_switchable_graphics (AMD switchable graphics layer) Vulkan version 1.3.277, layer version 1:
Layer Extensions: count = 0
Devices: count = 1
GPU id = 0 (AMD Radeon RX 7800 XT)
Layer-Device Extensions: count = 0

Presentable Surfaces:

GPU id : 0 (AMD Radeon RX 7800 XT):
Surface type = VK_KHR_win32_surface
Formats: count = 4
SurfaceFormat[0]:
format = FORMAT_R8G8B8A8_UNORM
colorSpace = COLOR_SPACE_SRGB_NONLINEAR_KHR
SurfaceFormat[1]:
format = FORMAT_B8G8R8A8_UNORM
colorSpace = COLOR_SPACE_SRGB_NONLINEAR_KHR
SurfaceFormat[2]:
format = FORMAT_R8G8B8A8_SRGB
colorSpace = COLOR_SPACE_SRGB_NONLINEAR_KHR
SurfaceFormat[3]:
format = FORMAT_B8G8R8A8_SRGB
colorSpace = COLOR_SPACE_SRGB_NONLINEAR_KHR
Present Modes: count = 3
PRESENT_MODE_IMMEDIATE_KHR
PRESENT_MODE_FIFO_KHR
PRESENT_MODE_FIFO_RELAXED_KHR
VkSurfaceCapabilitiesKHR:
-------------------------
minImageCount = 2
maxImageCount = 16
currentExtent:
width = 256
height = 256
minImageExtent:
width = 256
height = 256
maxImageExtent:
width = 256
height = 256
maxImageArrayLayers = 1
supportedTransforms: count = 1
SURFACE_TRANSFORM_IDENTITY_BIT_KHR
currentTransform = SURFACE_TRANSFORM_IDENTITY_BIT_KHR
supportedCompositeAlpha: count = 1
COMPOSITE_ALPHA_OPAQUE_BIT_KHR
supportedUsageFlags: count = 6
IMAGE_USAGE_TRANSFER_SRC_BIT
IMAGE_USAGE_TRANSFER_DST_BIT
IMAGE_USAGE_SAMPLED_BIT
IMAGE_USAGE_STORAGE_BIT
IMAGE_USAGE_COLOR_ATTACHMENT_BIT
IMAGE_USAGE_INPUT_ATTACHMENT_BIT
VkSurfaceCapabilitiesFullScreenExclusiveEXT:
--------------------------------------------
fullScreenExclusiveSupported = true

Device Properties and Extensions:

GPU0:
VkPhysicalDeviceProperties:

    apiVersion        = 1.3.277 (4206869)
    driverVersion     = 2.0.299 (8388907)
    vendorID          = 0x1002
    deviceID          = 0x747e
    deviceType        = PHYSICAL_DEVICE_TYPE_DISCRETE_GPU
    deviceName        = AMD Radeon RX 7800 XT
    pipelineCacheUUID = 342bec4f-5205-5a35-9265-2b80db05cfac

VkPhysicalDeviceLimits:

    maxImageDimension1D                             = 16384
    maxImageDimension2D                             = 16384
    maxImageDimension3D                             = 8192
    maxImageDimensionCube                           = 16384
    maxImageArrayLayers                             = 8192
    maxTexelBufferElements                          = 4294967295
    maxUniformBufferRange                           = 4294967295
    maxStorageBufferRange                           = 4294967295
    maxPushConstantsSize                            = 128
    maxMemoryAllocationCount                        = 4294967295
    maxSamplerAllocationCount                       = 1048576
    bufferImageGranularity                          = 0x00000001
    sparseAddressSpaceSize                          = 0x7ffa00000000
    maxBoundDescriptorSets                          = 32
    maxPerStageDescriptorSamplers                   = 4294967295
    maxPerStageDescriptorUniformBuffers             = 4294967295
    maxPerStageDescriptorStorageBuffers             = 4294967295
    maxPerStageDescriptorSampledImages              = 4294967295
    maxPerStageDescriptorStorageImages              = 4294967295
    maxPerStageDescriptorInputAttachments           = 4294967295
    maxPerStageResources                            = 4294967295
    maxDescriptorSetSamplers                        = 4294967295
    maxDescriptorSetUniformBuffers                  = 4294967295
    maxDescriptorSetUniformBuffersDynamic           = 8
    maxDescriptorSetStorageBuffers                  = 4294967295
    maxDescriptorSetStorageBuffersDynamic           = 8
    maxDescriptorSetSampledImages                   = 4294967295
    maxDescriptorSetStorageImages                   = 4294967295
    maxDescriptorSetInputAttachments                = 4294967295
    maxVertexInputAttributes                        = 64
    maxVertexInputBindings                          = 32
    maxVertexInputAttributeOffset                   = 4294967295
    maxVertexInputBindingStride                     = 16383
    maxVertexOutputComponents                       = 128
    maxTessellationGenerationLevel                  = 64
    maxTessellationPatchSize                        = 32
    maxTessellationControlPerVertexInputComponents  = 128
    maxTessellationControlPerVertexOutputComponents = 128
    maxTessellationControlPerPatchOutputComponents  = 120
    maxTessellationControlTotalOutputComponents     = 4096
    maxTessellationEvaluationInputComponents        = 128
    maxTessellationEvaluationOutputComponents       = 128
    maxGeometryShaderInvocations                    = 126
    maxGeometryInputComponents                      = 128
    maxGeometryOutputComponents                     = 128
    maxGeometryOutputVertices                       = 256
    maxGeometryTotalOutputComponents                = 1024
    maxFragmentInputComponents                      = 128
    maxFragmentOutputAttachments                    = 8
    maxFragmentDualSrcAttachments                   = 1
    maxFragmentCombinedOutputResources              = 4294967295
    maxComputeSharedMemorySize                      = 32768
    maxComputeWorkGroupCount: count = 3
            4294967295
            65535
            65535
    maxComputeWorkGroupInvocations                  = 1024
    maxComputeWorkGroupSize: count = 3
            1024
            1024
            1024
    subPixelPrecisionBits                           = 8
    subTexelPrecisionBits                           = 8
    mipmapPrecisionBits                             = 8
    maxDrawIndexedIndexValue                        = 4294967295
    maxDrawIndirectCount                            = 4294967295
    maxSamplerLodBias                               = 15.9961
    maxSamplerAnisotropy                            = 16
    maxViewports                                    = 16
    maxViewportDimensions: count = 2
            16384
            16384
    viewportBoundsRange: count = 2
            -32768
            32767
    viewportSubPixelBits                            = 8
    minMemoryMapAlignment                           = 64
    minTexelBufferOffsetAlignment                   = 0x00000004
    minUniformBufferOffsetAlignment                 = 0x00000010
    minStorageBufferOffsetAlignment                 = 0x00000004
    minTexelOffset                                  = -64
    maxTexelOffset                                  = 63
    minTexelGatherOffset                            = -32
    maxTexelGatherOffset                            = 31
    minInterpolationOffset                          = -2
    maxInterpolationOffset                          = 1
    subPixelInterpolationOffsetBits                 = 8
    maxFramebufferWidth                             = 16384
    maxFramebufferHeight                            = 16384
    maxFramebufferLayers                            = 8192
    framebufferColorSampleCounts: count = 4
            SAMPLE_COUNT_1_BIT
            SAMPLE_COUNT_2_BIT
            SAMPLE_COUNT_4_BIT
            SAMPLE_COUNT_8_BIT
    framebufferDepthSampleCounts: count = 4
            SAMPLE_COUNT_1_BIT
            SAMPLE_COUNT_2_BIT
            SAMPLE_COUNT_4_BIT
            SAMPLE_COUNT_8_BIT
    framebufferStencilSampleCounts: count = 4
            SAMPLE_COUNT_1_BIT
            SAMPLE_COUNT_2_BIT
            SAMPLE_COUNT_4_BIT
            SAMPLE_COUNT_8_BIT
    framebufferNoAttachmentsSampleCounts: count = 4
            SAMPLE_COUNT_1_BIT
            SAMPLE_COUNT_2_BIT
            SAMPLE_COUNT_4_BIT
            SAMPLE_COUNT_8_BIT
    maxColorAttachments                             = 8
    sampledImageColorSampleCounts: count = 4
            SAMPLE_COUNT_1_BIT
            SAMPLE_COUNT_2_BIT
            SAMPLE_COUNT_4_BIT
            SAMPLE_COUNT_8_BIT
    sampledImageIntegerSampleCounts: count = 4
            SAMPLE_COUNT_1_BIT
            SAMPLE_COUNT_2_BIT
            SAMPLE_COUNT_4_BIT
            SAMPLE_COUNT_8_BIT
    sampledImageDepthSampleCounts: count = 4
            SAMPLE_COUNT_1_BIT
            SAMPLE_COUNT_2_BIT
            SAMPLE_COUNT_4_BIT
            SAMPLE_COUNT_8_BIT
    sampledImageStencilSampleCounts: count = 4
            SAMPLE_COUNT_1_BIT
            SAMPLE_COUNT_2_BIT
            SAMPLE_COUNT_4_BIT
            SAMPLE_COUNT_8_BIT
    storageImageSampleCounts: count = 4
            SAMPLE_COUNT_1_BIT
            SAMPLE_COUNT_2_BIT
            SAMPLE_COUNT_4_BIT
            SAMPLE_COUNT_8_BIT
    maxSampleMaskWords                              = 1
    timestampComputeAndGraphics                     = true
    timestampPeriod                                 = 10
    maxClipDistances                                = 8
    maxCullDistances                                = 8
    maxCombinedClipAndCullDistances                 = 8
    discreteQueuePriorities                         = 2
    pointSizeRange: count = 2
            0
            8191.88
    lineWidthRange: count = 2
            0
            8191.88
    pointSizeGranularity                            = 0.125
    lineWidthGranularity                            = 0.125
    strictLines                                     = false
    standardSampleLocations                         = true
    optimalBufferCopyOffsetAlignment                = 0x00000001
    optimalBufferCopyRowPitchAlignment              = 0x00000001
    nonCoherentAtomSize                             = 0x00000080
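
For reference, the raw integers in parentheses next to `apiVersion` and `driverVersion` above are packed version numbers. The sketch below unpacks them using the bit layout of the standard Vulkan API version macros (major in bits 22 to 28, minor in bits 12 to 21, patch in bits 0 to 11). The function name `decode_vk_version` is made up for illustration. Note that `driverVersion` encoding is vendor-defined; in this log it happens to decode with the same layout.

```python
def decode_vk_version(packed):
    """Unpack a Vulkan-style packed version into (major, minor, patch).

    Illustrative helper; mirrors the bit layout of the
    VK_API_VERSION_MAJOR / MINOR / PATCH macros.
    """
    major = (packed >> 22) & 0x7F
    minor = (packed >> 12) & 0x3FF
    patch = packed & 0xFFF
    return major, minor, patch


api_version = decode_vk_version(4206869)     # raw apiVersion from the log
driver_version = decode_vk_version(8388907)  # raw driverVersion from the log

print(".".join(map(str, api_version)))
print(".".join(map(str, driver_version)))
```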

@stduhpf
Contributor

stduhpf commented Feb 24, 2024

What model architecture are you trying to run, and with what kind of quantization? I ask because these backends don't fully support all llama.cpp features.

@yoopyman
Author

I just ran benchmark.exe, and I assumed that if it doesn't work correctly, then no model will run on that backend either.

@sorasoras

> I just ran benchmark.exe, and I assumed that if it doesn't work correctly, then no model will run on that backend either.

Its support for models and quants is pretty limited.
It works on my 7900 XTX, but not with every GGUF file.

@stduhpf
Contributor

stduhpf commented Feb 24, 2024

Vulkan should work the same regardless of your hardware vendor or operating system... The 7800XT is far from being an ancient card and should support pretty much all Vulkan features, especially on Windows.
In my experience, using the Vulkan backend I can get llama-2 based models at Q4_K quantization (or any other QX_0, QX_1, or QX_K quants) working perfectly fine on my RX 5700XT (which is older than your card). But I can't run IQ2 quantization or MoE models for example.
I haven't played around with Kompute much, but the original PR (#4456) says that only Q4_0, Q4_1, FP16, and FP32 quantizations (or lack thereof) are supported.
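
The reports in the comment above can be summarized as a small lookup. This is only a sketch of what this thread says as of February 2024, not an authoritative support matrix for either backend. The names `KOMPUTE_REPORTED`, `VULKAN_REPORTED_OK`, and `thread_report` are hypothetical, and accepting an optional `_S`/`_M`/`_L` suffix on K-quant names is an added assumption.

```python
import re

# Per the comment above: the only formats the Kompute PR is said to support.
KOMPUTE_REPORTED = {"Q4_0", "Q4_1", "F16", "F32"}

# Per the comment above: QX_0, QX_1 and QX_K quants worked on Vulkan.
# Assumption: K-quant names may carry an _S/_M/_L size suffix.
VULKAN_REPORTED_OK = re.compile(r"^Q\d+_(0|1|K(_[SML])?)$")


def thread_report(backend, quant):
    """Return True/False if the thread reports the format as working or not,
    or None if the thread does not mention it. Illustrative only."""
    quant = quant.upper()
    if backend == "kompute":
        return quant in KOMPUTE_REPORTED
    if backend == "vulkan":
        if VULKAN_REPORTED_OK.match(quant):
            return True
        if quant.startswith("IQ2"):
            return False  # reported as not runnable in the comment above
        return None
    raise ValueError(f"unknown backend: {backend}")


for backend, quant in [("vulkan", "Q4_K"), ("vulkan", "IQ2_XS"), ("kompute", "Q4_K")]:
    print(backend, quant, thread_report(backend, quant))
```

MoE models are also reported as not working on Vulkan, but that depends on the model architecture rather than the quantization name, so it is not covered by this lookup.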

@yoopyman
Author

My assumption was that if benchmark.exe doesn't run, then models won't load either.
My bad for assuming that. I have just tested loading a model with Vulkan and it seems to work.
It's very strange that only benchmark.exe doesn't work.
Kompute also works, but only on the CPU, and I suppose that is also normal on Windows with a 7800 XT.
