Adding Native Support of SYCL for Intel GPUs #4749

Closed
@airMeng

Feature Description

Hi community, following the discussion in #3965, we plan to contribute a native SYCL backend to llama.cpp.

Motivation

Intel Arc series GPUs provide considerable VRAM capacity and bandwidth, which the current OpenCL backend cannot fully utilize, especially for LLM workloads. We expect a significant performance improvement from a native SYCL backend.

References:

Possible Implementation

Native Kernels

We will implement the key GGML operators in SYCL, similar to the approach taken for Metal and Vulkan support. The steps are:

  1. new backend; host-to-device (h2d) and device-to-host (d2h) transfers
  2. oneMKL-dpcpp based FP32 & FP16 GEMM
  3. native SYCL kernels for de-quantization
  4. native SYCL kernels for other operators
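
To make step 3 more concrete, here is a minimal host-side sketch of what a de-quantization kernel computes. It is plain C++ rather than SYCL, and the loops stand in for the parallel dispatch a device kernel would use. The `BlockQ4` layout, `kBlockSize`, and both function names are hypothetical and only loosely modeled on 4-bit block quantization; the actual GGML block formats differ in detail.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical simplified block: 32 weights share one float scale,
// and each weight is stored as a 4-bit value (two per byte).
constexpr std::size_t kBlockSize = 32;

struct BlockQ4 {
    float scale;                                  // per-block scale factor
    std::array<std::uint8_t, kBlockSize / 2> qs;  // packed 4-bit values
};

// Work a single work-item would do: unpack one byte into two floats.
// The low nibble maps to element i, the high nibble to element i + 16.
inline void dequantize_pair(const BlockQ4& b, std::size_t i, float* out) {
    const int lo = (b.qs[i] & 0x0F) - 8;  // re-center [0, 15] to [-8, 7]
    const int hi = (b.qs[i] >> 4) - 8;
    out[i] = lo * b.scale;
    out[i + kBlockSize / 2] = hi * b.scale;
}

// Host reference: the two loops play the role of a parallel range
// over (block index, byte index) in a device kernel.
std::vector<float> dequantize(const std::vector<BlockQ4>& blocks) {
    std::vector<float> out(blocks.size() * kBlockSize);
    for (std::size_t blk = 0; blk < blocks.size(); ++blk) {
        for (std::size_t i = 0; i < kBlockSize / 2; ++i) {
            dequantize_pair(blocks[blk], i, out.data() + blk * kBlockSize);
        }
    }
    return out;
}
```

Because each byte is unpacked independently of every other byte, the per-pair function maps naturally onto one work-item per byte, which is why de-quantization is a good candidate for a native kernel.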

Note:

Because llama.cpp evolves rapidly and new features will probably land in the CUDA backend first, we plan to use SYCLomatic to help migrate code from CUDA to SYCL.

As a next stage, we plan to introduce a template-based library such as XeTLA, as mentioned in #3965. This proposal focuses on native SYCL support.

Summary

We have started working on native SYCL kernels and on enabling a SYCL backend in llama.cpp for Intel GPUs. Please feel free to drop a note. Thanks.
