
Native Intel IPEX-LLM Support #7190

Closed

Description

@iamhumanipromise

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • [X] I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • [X] I carefully followed the README.md.
  • [X] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • [X] I reviewed the Discussions, and have a new bug or useful enhancement to share.

Feature Description

I found the closed issue below, where someone manually implemented IPEX-LLM (it is not clear how). However, I am looking forward to native IPEX-LLM support for Intel Xe iGPUs and Intel Arc dGPUs on Windows and Linux.

#7042

TL;DR: IPEX-LLM now provides a C++ interface, which can be used as a backend for running llama.cpp on Intel GPUs. Incorporating this interface into llama.cpp would allow it to leverage the optimized performance of IPEX-LLM.

Motivation

Intel Xe graphics launched in 2020; the Flex and Max data-center cards and the Arc consumer cards for laptop and desktop launched in 2022. That is a lot of devices in production and circulation.

Native support would "permit" llama.cpp users to run on their integrated Xe GPUs, dedicated Arc GPUs, and data-center Flex and Max cards on BOTH Windows and Linux (without a confusing manual build).

Possible Implementation

The implementation of native Intel IPEX-LLM support would be something like... Integrate --> Test --> Document --> Release.

  1. Integration with IPEX: Since IPEX-LLM is built on top of Intel Extension for PyTorch (IPEX), the first step would be to ensure seamless integration with IPEX. This would involve linking the llama.cpp build system with the IPEX library and ensuring that all dependencies are correctly managed. Here are links for using llama.cpp with Intel GPUs:

Full manual/guide: https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/llama_cpp_quickstart.html
Full verified model list: https://ipex-llm.readthedocs.io/en/latest/#verified-models
GitHub: https://github.com/intel-analytics/ipex-llm
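
As a side note, before any of this, a quick way to confirm the oneAPI runtime can actually see an Intel GPU is something like the following. This is purely a sketch of my own (not from the guides above), assuming the oneAPI Base Toolkit is installed and its environment script (e.g. setvars.sh on Linux) has been sourced; compile with `icpx -fsycl`. If no Intel GPU shows up here, the llama.cpp side has no chance of using it:

```cpp
// List every SYCL device the oneAPI runtime can see.
// Sketch only: assumes the oneAPI Base Toolkit is installed and its
// environment script has been sourced.
// Compile with: icpx -fsycl check_devices.cpp -o check_devices
#include <sycl/sycl.hpp>
#include <iostream>

int main() {
    for (const auto &platform : sycl::platform::get_platforms()) {
        std::cout << "Platform: "
                  << platform.get_info<sycl::info::platform::name>() << "\n";
        for (const auto &device : platform.get_devices()) {
            std::cout << "  Device: "
                      << device.get_info<sycl::info::device::name>()
                      << (device.is_gpu() ? "  [GPU]" : "") << "\n";
        }
    }
    return 0;
}
```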

The "owners" of this process will be the devs and engineers here; in this Github (simple nerds such as myself do not have the expertise to tackle something like this... even locally)

For example, from the documentation it looks like the flow would be: create a new conda environment --> set up the environment --> configure the oneAPI variables --> update CMakeLists.txt or the Makefile with paths to the IPEX-LLM library and headers --> then map llama.cpp functionality to the IPEX APIs (which Intel has already done).
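
On that last "map llama.cpp functionality to the IPEX APIs" step, one plausible shape (purely my sketch; neither llama.cpp nor IPEX-LLM ships anything like this today) would be a small backend header in the style of the existing SYCL backend, with the build system then linking against the IPEX-LLM C++ library. The header name and both function names below are hypothetical:

```cpp
// ggml-ipex-llm.h -- HYPOTHETICAL sketch only; this header does not exist in
// llama.cpp today. Modeled loosely on the pattern of the existing Intel SYCL
// backend header, assuming the IPEX-LLM C++ interface can sit behind a
// ggml_backend_t like the other GPU backends do.
#pragma once

#include "ggml.h"
#include "ggml-backend.h"

#ifdef __cplusplus
extern "C" {
#endif

// Hypothetical: create a ggml backend that dispatches tensor ops to IPEX-LLM
// on the given Intel GPU (Xe iGPU or Arc/Flex/Max dGPU).
GGML_API ggml_backend_t ggml_backend_ipex_llm_init(int device);

// Hypothetical: number of Intel GPUs the IPEX-LLM runtime can see.
GGML_API int ggml_backend_ipex_llm_get_device_count(void);

#ifdef __cplusplus
}
#endif
```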

  2. Testing Across Platforms: Ensuring that the implementation works across different versions of Windows and Linux is crucial. This includes testing on various Intel iGPUs and Arc dGPUs to guarantee broad compatibility. This effort would involve the community here, various Discords and subreddits, and perhaps "roping in" as many laptop/desktop Xe iGPU and dGPU users as possible -- so that means gamers, too.

The "owners" of this step would be wide-ranging overall.

  3. Documentation and Examples: Someone would have to "own" updating the documentation to guide users on how to enable and use the new IPEX-LLM support. Providing examples and quickstart guides would help significantly; ultimately, independent users will be on their own, and GUI and TUI/CLI frontends will need to update their own documentation. (A rough sketch of what a user-facing example could look like follows this list.)

  4. Release: After all of this has been done, it goes forward to launch. Woot woot.
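
On the documentation point above, here is a rough sketch of the kind of user-facing example the docs might show, using llama.cpp's existing C API as it stands today (function and field names are from the current llama.h and may change; the model path is just a placeholder). With a GPU backend in place, users would mainly just set n_gpu_layers as they already do for other backends:

```cpp
// Sketch of a user-facing "offload to the Intel GPU" example, using llama.cpp's
// existing C API (names as of the current llama.h; subject to change).
#include "llama.h"
#include <cstdio>

int main() {
    llama_backend_init();

    llama_model_params mparams = llama_model_default_params();
    mparams.n_gpu_layers = 99;  // offload as many layers as fit on the GPU

    // Placeholder path -- any GGUF model the user has downloaded.
    llama_model *model = llama_load_model_from_file("model.gguf", mparams);
    if (model == NULL) {
        fprintf(stderr, "failed to load model\n");
        return 1;
    }

    llama_context_params cparams = llama_context_default_params();
    llama_context *ctx = llama_new_context_with_model(model, cparams);

    // ... run tokenization / decoding here ...

    llama_free(ctx);
    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```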

I'm sure there are many, many steps I am missing here. Just wanted to "kick off" the process.
