Support multiple GPUs (split mode) on SYCL backend #5806
Conversation
@airMeng, @luoyu-intel, @abhilash1910 could you help review this PR? Thank you!
examples/llama-bench/llama-bench.cpp
Outdated
int device_list[GGML_SYCL_MAX_DEVICES];
ggml_sycl_get_gpu_list(device_list, GGML_SYCL_MAX_DEVICES);
I think this can be removed now, device_list does not seem to be used anymore.
yes, rm it.
examples/sycl/run-llama2.sh
Outdated
#ZES_ENABLE_SYSMAN=1, Support to get free memory of GPU by sycl::aspect::ext_intel_free_memory. Recommended to use when --split-mode = layer.

#use all GPUs with same max compute units
ZES_ENABLE_SYSMAN=1 ./build/bin/main -m models/llama-2-7b.Q4_0.gguf -p "${INPUT2}" -n 400 -e -ngl 33 -s 0 -mg $GGML_SYCL_DEVICE
Note that -mg is ignored with -sm layer, which is the default, so passing it here does nothing.
yes, rm -mg
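As a quick illustration of this interaction, here is a minimal sketch based on the run-llama2.sh example above (the model path, "${INPUT2}" prompt, and $GGML_SYCL_DEVICE index are reused from that script): -mg only takes effect once layer splitting is turned off.

# layer split (the default): layers are spread over all visible SYCL GPUs, so -mg is ignored
ZES_ENABLE_SYSMAN=1 ./build/bin/main -m models/llama-2-7b.Q4_0.gguf -p "${INPUT2}" -n 400 -e -ngl 33 -s 0 -sm layer

# no split: everything runs on the single GPU selected by -mg
./build/bin/main -m models/llama-2-7b.Q4_0.gguf -p "${INPUT2}" -n 400 -e -ngl 33 -s 0 -sm none -mg $GGML_SYCL_DEVICE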
common/common.cpp
Outdated
} else if (arg_next == "row") {
#ifdef GGML_USE_SYCL
    fprintf(stderr, "warning: The split mode value:[row] is not supported by llama.cpp with SYCL. It's developing.\nExit!\n");
    exit(1);
#endif // GGML_USE_SYCL
    params.split_mode = LLAMA_SPLIT_MODE_ROW;
yes, accept it.
llama.cpp
Outdated
#if (defined(GGML_USE_CUBLAS) || defined(GGML_USE_SYCL))
#define GGML_USE_CUBLAS_SYCL
#endif
GGML_USE_CUBLAS_SYCL appears to be unused.
yes, rm it.
* support multiple cards: split-mode - layer|row
* rm warning
* rebase with master, support two new OPs, close feature for -sm=row, fix for unit test
* update news
* fix merge error
* update according to review comments
Support multiple GPUs (split mode) on SYCL backend.
Split mode: [none, layer] are supported; [row] is not supported yet, it is still in development.
The GPU settings are unified with the cuBLAS backend: the same options are used in the same way as on the cuBLAS backend.
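Below is a usage sketch of the three split-mode values on the SYCL backend; the command lines are adapted from examples/sycl/run-llama2.sh above, and the row behavior follows the common.cpp check shown earlier in this conversation.

# -sm layer (default): spread the layers across all SYCL GPUs; ZES_ENABLE_SYSMAN=1 lets the backend query free GPU memory
ZES_ENABLE_SYSMAN=1 ./build/bin/main -m models/llama-2-7b.Q4_0.gguf -p "${INPUT2}" -n 400 -e -ngl 33 -sm layer

# -sm none: run entirely on a single GPU, chosen with -mg
./build/bin/main -m models/llama-2-7b.Q4_0.gguf -p "${INPUT2}" -n 400 -e -ngl 33 -sm none -mg 0

# -sm row: not supported on SYCL yet; the check in common.cpp prints a warning and exits
./build/bin/main -m models/llama-2-7b.Q4_0.gguf -p "${INPUT2}" -n 400 -e -ngl 33 -sm row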