[SYCL] Add SYCL Backend Support for Intel GPUs #330
Signed-off-by: zhentaoyu <zhentao.yu@intel.com>
Do we have any reference performance data? For example, what is the latency on an A100 for each model?
I only have an A30 for now. I will update with a table soon.
I happened to be working on this recently.
LGTM! The code change is basically identical to my local version :-P
FP16 performance is bad; it's even worse than FP32. We can address that later.
Test commands are from the text2img section of the PR description.
Can you please add units to the performance data? Unless stated, it could be it/s, s/it, or total seconds.
Sorry, added.
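Since the timings in this PR are reported as total seconds per run, here is a rough sketch for converting them into s/it and it/s. It assumes the CLI default of 20 sampling steps (only the LCM-LoRA command passes --steps; confirm the default with sd --help) and treats the whole txt2img time as sampling time. Because that time also includes text encoding and VAE decoding, the result is only an upper bound on s/it. The helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Rough speed-unit conversion for the timings reported in this PR.
# Assumption: TOTAL_SECONDS is treated as pure sampling time, so the
# results are an upper bound on s/it (and a lower bound on it/s).

# s/it = total seconds / number of steps
s_per_it() {  # usage: s_per_it TOTAL_SECONDS STEPS
  LC_ALL=C awk -v t="$1" -v n="$2" 'BEGIN { printf "%.3f\n", t / n }'
}

# it/s = number of steps / total seconds (the reciprocal of s/it)
it_per_s() {  # usage: it_per_s TOTAL_SECONDS STEPS
  LC_ALL=C awk -v t="$1" -v n="$2" 'BEGIN { printf "%.3f\n", n / t }'
}

# Example: the SD 1.5 run from the PR (8.44 s), assuming 20 steps.
s_per_it 8.44 20   # prints 0.422
it_per_s 8.44 20   # prints 2.370
```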
@leejet, is anything else needed before this can be merged?
This PR is ready for review now. There is still an issue: the SYCL backend cannot generate images with large
@leejet, could you please take a look at this PR when you have time? Thanks a lot.
Excellent work! Thank you for your contribution.
I tried to use my Intel GPU with this. It builds successfully, but when I try to run it I get the following error:
Is there anything I missed? Here's my GPU info:
Hi, @cheeseng, does
@zhentaoyu Sorry, I just noticed this. Here's my
Hi, @cheeseng,
* update ggml and add SYCL CMake option
* hacky CMakeLists.txt for updating ggml in cpu backend
* rebase and clean code
* add sycl in README
* rebase ggml commit
* refine README
* update ggml for supporting sycl tsembd op
---------
Signed-off-by: zhentaoyu <zhentao.yu@intel.com>
What Does This PR Do
#308
Hi, this PR adds initial SYCL backend support for Intel GPUs (Arc A770, etc.). We would be happy to hear feedback from the stable-diffusion.cpp community. If you have any comments or questions, please feel free to leave them in this PR.
cc @airMeng, @luoyu-intel, @hshen14
TODO or Issues
* Update the ggml commit when the tsembd SYCL op is merged ([SYCL] Add TIMESTEP_EMBEDDING OP ggerganov/llama.cpp#8707).
* Large -H and -W values (for example, 1024; this seems to be a SYCL issue) fail with: Provided range is out of integer limits. Pass '-fno-sycl-id-queries-fit-in-int' to disable range check.
  It's caused by the im2col SYCL op: the product of its global_range dimensions exceeds INT_MAX. We need to modify the im2col kernel or implement another, more efficient conv op in the future.

Test Results of SYCL
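The im2col limitation in the TODO list comes from a range check in the SYCL runtime. The sketch below mimics that check so a launch size can be tested before running. It assumes the check rejects a range when any single dimension, or the total number of work-items, exceeds INT_MAX (2^31 - 1). The dimensions in the example are illustrative only and are not the actual launch configuration of ggml's im2col kernel; range_fits_in_int is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch of the "Provided range is out of integer limits" check that SYCL
# applies when code is built with the default -fsycl-id-queries-fit-in-int.
# Assumption: a range fails if any dimension or the product of all
# dimensions exceeds INT_MAX.

SYCL_INT_MAX=2147483647  # 2^31 - 1

# Returns 0 if the range fits in int, 1 otherwise.
range_fits_in_int() {  # usage: range_fits_in_int DIM0 [DIM1 [DIM2]]
  local total=1 d
  for d in "$@"; do
    (( d > SYCL_INT_MAX )) && return 1
    # total <= INT_MAX and d <= INT_MAX here, so the 64-bit product cannot overflow
    total=$(( total * d ))
    (( total > SYCL_INT_MAX )) && return 1
  done
  return 0
}

# Illustrative sizes only: 512*512*2048 = 2^29 fits, 1024*1024*2048 = 2^31 does not.
range_fits_in_int 512 512 2048 && echo "fits"
range_fits_in_int 1024 1024 2048 || echo "out of integer limits"
```

Doubling both image sides multiplies the work-item count by four, which is why sizes that work at 512 can overflow at 1024. Splitting the launch into smaller chunks is one possible way a reworked kernel could stay under the limit; building with -fno-sycl-id-queries-fit-in-int, as the error message suggests, disables the check instead.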
build command
cmake .. -DSD_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx
doc link is here
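A minimal build sketch around the command above. It assumes the oneAPI Base Toolkit is installed at its default prefix /opt/intel/oneapi (override with ONEAPI_ROOT) and that it is run from the repository root; cmake -S . -B build is equivalent to running cmake .. inside a build directory. Setting DRY_RUN=1 prints the commands instead of executing them. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Build stable-diffusion.cpp with the SYCL backend (sketch).
# Assumptions: oneAPI installed under ${ONEAPI_ROOT:-/opt/intel/oneapi};
# run from the repository root. DRY_RUN=1 only prints the commands.

run() {
  if [[ -n "${DRY_RUN:-}" ]]; then
    echo "$*"
  else
    "$@"
  fi
}

build_sd_sycl() {
  local setvars="${ONEAPI_ROOT:-/opt/intel/oneapi}/setvars.sh"
  # setvars.sh puts the icx/icpx compilers and the SYCL runtime into the
  # environment. It is sourced directly, not through run, so that it does
  # not receive run's positional parameters as its own arguments.
  if [[ -n "${DRY_RUN:-}" ]]; then
    echo "source ${setvars}"
  else
    source "${setvars}"
  fi
  run cmake -S . -B build -DSD_SYCL=ON \
    -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx || return 1
  run cmake --build build --config Release
}
```

Run DRY_RUN=1 build_sd_sycl to preview the steps, then build_sd_sycl to build; the binary should end up at ./build/bin/sd, the path used by the commands below.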
test machine
text2img
./build/bin/sd -m ../sd_models/sd-v1-4.ckpt -p "a lovely cat" -o "output_sycl_sd1-4.png"
output:
[INFO ] stable-diffusion.cpp:1328 - txt2img completed in 8.35s
./build/bin/sd -m ../sd_models/v1-5-pruned-emaonly.safetensors -p "a lovely cat" -o "output_sycl_sd1-5.png"
output:
[INFO ] stable-diffusion.cpp:1328 - txt2img completed in 8.44s
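To collect timings like the ones above across several runs, the following sketch pulls the seconds out of the "txt2img completed in" line. The log format is taken from the outputs in this PR and other sd versions may word it differently; since the PR does not say whether sd logs to stdout or stderr, the usage example merges both streams. txt2img_seconds is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Extract the txt2img wall-clock time (in seconds) from sd's log output.
# Assumption: the log contains a line like
#   [INFO ] stable-diffusion.cpp:1328 - txt2img completed in 8.35s

txt2img_seconds() {  # reads a log on stdin, prints e.g. 8.35
  sed -n 's/.*txt2img completed in \([0-9][0-9.]*\)s.*/\1/p'
}

# Usage (not run here): capture both streams and keep a copy of the log.
#   ./build/bin/sd -m ../sd_models/v1-5-pruned-emaonly.safetensors \
#     -p "a lovely cat" -o "output_sycl_sd1-5.png" 2>&1 | tee sd1-5.log | txt2img_seconds
```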
sd v2
./build/bin/sd -m ../sd_models/v2-1_768-nonema-pruned.safetensors -p "a lovely cat" -o "output_sycl_sd2.png"
output:
[INFO ] stable-diffusion.cpp:1328 - txt2img completed in 6.04s
sd xl
./build/bin/sd -m ../sd_models/sdxl/sd_xl_base_1.0.safetensors --vae ../sd_models/sdxl/sdxl_vae.safetensors -H 512 -W 512 -p "a lovely cat" -o "output_sycl_sdxl.png" --seed 16
output:
[INFO ] stable-diffusion.cpp:1328 - txt2img completed in 12.46s
sd v3
./build/bin/sd -m ../sd_models/sd3_medium_incl_clips_t5xxlfp16.safetensors -H 512 -W 512 -p "a lovely cat holding a sign says \"Stable Diffusion CPP\"" --cfg-scale 4.5 --sampling-method euler -v -o "output_sycl_sd3.png"
output:
[INFO ] stable-diffusion.cpp:1328 - txt2img completed in 13.25s
quantization type
command:
./build/bin/sd -m ../sd_models/sd3_medium_incl_clips_t5xxlfp16.safetensors -H 512 -W 512 -p "a lovely cat holding a sign says \"Stable Diffusion CPP\"" --cfg-scale 4.5 --sampling-method euler -v -o "output_sycl_sd3_q8_0.png" --type q8_0
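To compare weight types on the same prompt, the quantization command above can be wrapped in a loop. This sketch assumes every --type value you pass is supported by your sd build (q8_0 is the one used above; check ./build/bin/sd --help for the others) and reads the binary path from SD_BIN, so SD_BIN=echo previews the commands without running them. sweep_types is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Run the SD3 quantization command once per weight type (sketch).
# Assumptions: SD_BIN points at the sd binary; each TYPE is a valid --type.

SD_BIN="${SD_BIN:-./build/bin/sd}"
MODEL="../sd_models/sd3_medium_incl_clips_t5xxlfp16.safetensors"

sweep_types() {  # usage: sweep_types TYPE...
  local t
  for t in "$@"; do
    # Output names follow the PR's pattern, e.g. output_sycl_sd3_q8_0.png
    "$SD_BIN" -m "$MODEL" -H 512 -W 512 \
      -p 'a lovely cat holding a sign says "Stable Diffusion CPP"' \
      --cfg-scale 4.5 --sampling-method euler -v \
      -o "output_sycl_sd3_${t}.png" --type "$t" || return 1
  done
}
```

For example, SD_BIN=echo sweep_types q8_0 q4_0 prints the two commands, and sweep_types q8_0 q4_0 runs them.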
img2img
command:
./build/bin/sd --mode img2img -m ../sd_models/sd3_medium_incl_clips_t5xxlfp16.safetensors -p "cat with blue eyes" -i ./assets/f32.png --strength 0.4 -o "output_sycl_img2img.png"
output:
LCM-LoRA
command:
./build/bin/sd -m ../sd_models/v1-5-pruned-emaonly.safetensors -p "a lovely cat<lora:lcm-lora-sdv1-5:1>" --steps 4 --lora-model-dir ../sd_models/lora/ -v --cfg-scale 1 -o "output_sycl_lcm_lora.png"
output:
[INFO ] stable-diffusion.cpp:1328 - txt2img completed in 4.08s
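The <lora:NAME:MULTIPLIER> tag in the prompt above selects a LoRA from --lora-model-dir. The sketch below lists the tags in a prompt and checks that a matching file exists before launching a run. It assumes a tag name maps to NAME.safetensors or NAME.ckpt in that directory, which you should confirm against your sd version. The helper names are hypothetical.

```bash
#!/usr/bin/env bash
# List <lora:NAME:MULTIPLIER> tags in a prompt and check the files exist.
# Assumption: NAME resolves to NAME.safetensors or NAME.ckpt in --lora-model-dir.

# Prints one "NAME MULTIPLIER" line per tag.
lora_tags() {  # usage: lora_tags PROMPT
  grep -o '<lora:[^:>]*:[^>]*>' <<<"$1" |
    sed 's/^<lora:\([^:]*\):\([^>]*\)>$/\1 \2/'
}

# Returns 1 (and reports on stderr) if any referenced LoRA file is missing.
check_loras() {  # usage: check_loras PROMPT LORA_DIR
  local name mult  # mult is read only so that name holds just the first field
  while read -r name mult; do
    if [[ ! -e "$2/$name.safetensors" && ! -e "$2/$name.ckpt" ]]; then
      echo "missing LoRA: $name (looked in $2)" >&2
      return 1
    fi
  done < <(lora_tags "$1")
  return 0
}
```

For the command above, lora_tags 'a lovely cat<lora:lcm-lora-sdv1-5:1>' prints lcm-lora-sdv1-5 1, and check_loras with ../sd_models/lora/ verifies the file is present.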
PhotoMaker
command:
./build/bin/sd -m ../sd_models/sdxl/sdxlUnstableDiffusers_v11.safetensors --vae ../sd_models/sdxl/sdxl_vae.safetensors --stacked-id-embd-dir ../sd_models/photo_maker/photomaker-v1.safetensors --input-id-images-dir ./assets/photomaker_examples/scarletthead_woman/ -p "a girl img, retro futurism, retro game art style but extremely beautiful, intricate details, masterpiece, best quality, space-themed, cosmic, celestial, stars, galaxies, nebulas, planets, science fiction, highly detailed" -n "realistic, photo-realistic, worst quality, greyscale, bad anatomy, bad hands, error, text" --cfg-scale 5.0 --sampling-method euler -H 640 -W 640 --style-ratio 15 -o "output_sycl_photomaker.png"
output:
[INFO ] stable-diffusion.cpp:1328 - txt2img completed in 17.38s
Upscale
command:
./build/bin/sd -m ../sd_models/sd3_medium_incl_clips_t5xxlfp16.safetensors -H 640 -W 640 -p "a lovely cat holding a sign says \"Stable Diffusion CPP\"" --cfg-scale 4.5 --sampling-method euler --upscale-model ../sd_models/upscale/RealESRGAN_x4plus_anime_6B.pth -v -o "output_sycl_sd3_upscale.png" --seed 100
output:
TAESD
command:
./build/bin/sd -m ../sd_models/v1-5-pruned-emaonly.safetensors -p "a lovely cat" --taesd ../sd_models/taesd/diffusion_pytorch_model.safetensors -o "output_sycl_taesd.png"
output:
[INFO ] stable-diffusion.cpp:1328 - txt2img completed in 7.80s