[SYCL] fix error when set main gpu to non-zero #5901
Conversation
How can we ensure that the main_gpu is always the most powerful one?
    #ifdef GGML_USE_SYCL
        if (split_mode == LLAMA_SPLIT_MODE_NONE) {
            ggml_backend_sycl_set_single_device(main_gpu);
            // SYCL uses a device index (0, 1, 2, ...) instead of a device id.
            main_gpu = ggml_backend_sycl_get_device_index(main_gpu);
        }
    #endif
Backends should not require calling backend-specific functions for normal usage. Is this ggml_backend_sycl_set_single_device function really necessary?
I guess the author wants to distinguish between iGPU and dGPU and offload to the dGPU when no device is specified, so llama.cpp would need to query the SYCL backend for the device list and pick the most powerful one. Do you have any suggestions?
Could the dGPU always be mapped to the device index zero? That way, it would be used by default in single GPU mode.
> Could the dGPU always be mapped to the device index zero? That way, it would be used by default in single GPU mode.
Yes, that should be the default. But we encountered #5513, where the user reported the dGPU mapped to index 3.
Ideally, the device indices that the SYCL backend uses for its buffers and backends would be automatically ordered by the most powerful GPUs available in the system, such that the lowest indices are the most powerful GPUs. If this is not possible or desirable, it would still be ok to add a function that returns the list of available GPUs ordered by the most powerful first. Then, in llama.cpp, we can use that list as a lookup table to convert the device indices used in llama.cpp to the device indices passed to the SYCL backend. This translation between llama.cpp device indices and SYCL backend device indices could be implemented in the llama_default_buffer_type_offload function for the buffers, and during backend instance creation in llama_new_context_with_model.
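A minimal sketch of that lookup-table translation, assuming a hypothetical helper ggml_backend_sycl_get_device_list() that returns SYCL device indices ordered most powerful first (the helper and the stubbed ordering below are illustrative, not the actual API):

    #include <vector>
    #include <cstdio>

    // Assumed helper, not a real ggml API: returns SYCL device indices
    // ordered with the most powerful GPU first. A real implementation would
    // query the SYCL runtime; this stub mirrors the case from #5513, where
    // the dGPU showed up at SYCL index 3.
    static std::vector<int> ggml_backend_sycl_get_device_list() {
        return {3, 0, 1, 2};
    }

    // Lookup table: llama.cpp device index -> SYCL backend device index.
    static const std::vector<int> device_lut = ggml_backend_sycl_get_device_list();

    static int llama_to_sycl_device(int llama_device) {
        return device_lut.at(llama_device);
    }

    int main() {
        // llama.cpp device 0 maps to the most powerful GPU, SYCL index 3.
        std::printf("llama device 0 -> SYCL device %d\n", llama_to_sycl_device(0));
    }

Under this scheme, llama_default_buffer_type_offload and llama_new_context_with_model would call llama_to_sycl_device() on every device index they pass to the SYCL backend, regardless of split mode.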
The SYCL backend will create the GPU list with the most powerful GPUs at initialization.
When llama.cpp provides the split-mode and main-gpu parameters, the SYCL backend will update the GPU list (see the sketch below):
- If split-mode is none, the GPU list is replaced by a new list that includes only the main-gpu device index.
- If split-mode is layer or row, the GPU list is not changed.
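A sketch of that update logic; sycl_gpu_list stands in for the backend's internal device list, and the names here are illustrative rather than the actual SYCL backend code:

    #include <vector>

    enum split_mode_t { SPLIT_MODE_NONE, SPLIT_MODE_LAYER, SPLIT_MODE_ROW };

    // Internal device list, most powerful GPUs first after initialization.
    static std::vector<int> sycl_gpu_list = {3, 0, 1, 2};

    static void sycl_update_gpu_list(split_mode_t split_mode, int main_gpu) {
        if (split_mode == SPLIT_MODE_NONE) {
            sycl_gpu_list = {main_gpu};  // keep only the main-gpu device
        }
        // layer/row: the list is left unchanged
    }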
A better method would be to ask the user to provide a GPU list to llama.cpp as a parameter. ggml would then create the GPU pool from that parameter, which avoids the mistakes that automatic detection can make.
It would also allow supporting more features, like mixing GPUs, or mixing dGPU and iGPU.
If the GPU list includes only one GPU, the split mode is effectively none.
So the parameters would change:
from main-gpu + split-mode
to gpu-list + split-mode
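Under this proposal, an invocation might look like the following (the --gpu-list flag is hypothetical and does not exist in llama.cpp; it is shown only to illustrate the idea):

    ./main -m model.gguf --gpu-list 0,2 -sm layer   # pool devices 0 and 2
    ./main -m model.gguf --gpu-list 1               # a single entry implies split mode none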
I still do not understand why it is necessary to add the function ggml_backend_sycl_set_single_device. I could understand using a function such as ggml_backend_sycl_get_device_index to translate device indices from llama.cpp to the device indices used by SYCL, but that should be done always, regardless of the split mode, in llama_default_buffer_type_offload and llama_new_context_with_model, as I mentioned earlier.
I am also concerned that since this function seems to change the global state of the SYCL backend, it will prevent using multiple devices simultaneously by loading a different model on each device with a different llama_model instance, and doing inference on each device in a different thread simultaneously.
In the future, we will also support using different backends simultaneously, so that, for example, a system with an NVIDIA device and an Intel device will be able to use the CUDA and SYCL backends at the same time with split mode layer. Adding these backend-specific functions will complicate the implementation of this functionality.
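For reference, a sketch of the usage pattern that a backend-global device setting would break, written against the llama.h API (model paths and device indices are placeholders):

    #include "llama.h"
    #include <thread>

    int main() {
        llama_backend_init();

        auto load_on = [](const char * path, int device) {
            llama_model_params mparams = llama_model_default_params();
            mparams.split_mode = LLAMA_SPLIT_MODE_NONE;
            mparams.main_gpu   = device;
            return llama_load_model_from_file(path, mparams);
        };

        // If the backend keeps the selected device as global state, the second
        // load clobbers the device selection made for the first model.
        llama_model * m0 = load_on("model-a.gguf", 0);
        llama_model * m1 = load_on("model-b.gguf", 1);

        std::thread t0([&] { /* create a context on m0 and run inference */ });
        std::thread t1([&] { /* create a context on m1 and run inference */ });
        t0.join();
        t1.join();

        llama_free_model(m0);
        llama_free_model(m1);
        llama_backend_free();
    }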
When using multiple devices with split mode layer or row, it is possible to exclude some devices by using the -ts parameter to set the split of a device to zero. For example, with -ts 1,0,1 only devices 0 and 2 will be used.
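For instance, a run that splits by layer across devices 0 and 2 while skipping device 1 might look like this (model path and layer count are placeholders):

    ./main -m model.gguf -ngl 99 -sm layer -ts 1,0,1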
The SYCL backend has two methods to get the GPU info:
- automatically detect the most powerful GPUs. This is the default behavior in most cases, including the unit tests.
- follow the main-gpu parameter coming from llama.cpp. ggml_backend_sycl_set_single_device is used only in this case.
It only impacts the GPU list in the SYCL backend as global state. This action happens before llama_default_buffer_type_offload, so it won't impact the next model load process.
* fix error when set main gpu to non-zero
* fix delete condition
…" (ggerganov#5918) This reverts commit ceca1ae.
Fix the error when setting the main GPU to a non-zero index.
Add support for single-GPU mode.