Vulkan Mixture of Experts (MoE) support (#7628)
* Finish Vulkan mul_mat_id implementation

* Add Vulkan sum_rows and div ops

* Fix MUL_MAT_ID matrix matrix shader

* Fix MUL_MAT_ID matrix vector shader dispatch size

* Fix MUL_MAT_ID matrix vector shader and dispatch code

* Update Vulkan CPU offload for MUL_MAT_ID

* Fix crash when using split mode none and setting a main GPU

0cc4m authored Jun 3, 2024
1 parent a10cda5 commit 3d7ebf6
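
For context on the bullets above: MUL_MAT_ID is the operator at the heart of MoE inference. Instead of multiplying every token by a single weight matrix, each token is multiplied only by the expert matrices its router selected, with the expert indices passed in as a separate id tensor. The sketch below is a minimal CPU reference of those semantics with made-up names and a row-major layout; it is not the ggml API or the Vulkan shader code from this commit.

#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical CPU reference for a MUL_MAT_ID-style operation (illustrative only).
// experts: n_expert weight matrices, each rows x cols, stored row-major.
// x:       one input vector of length cols per token.
// ids:     per token, the expert indices chosen by the router (top-k).
// Result:  per token, one output vector of length rows per selected expert.
static std::vector<std::vector<std::vector<float>>> mul_mat_id_ref(
        const std::vector<std::vector<float>>   & experts,
        const std::vector<std::vector<float>>   & x,
        const std::vector<std::vector<int32_t>> & ids,
        size_t rows, size_t cols) {
    std::vector<std::vector<std::vector<float>>> out(x.size());
    for (size_t t = 0; t < x.size(); t++) {            // for each token
        for (int32_t e : ids[t]) {                     // for each selected expert
            const std::vector<float> & w = experts[e]; // pick the weights by expert id
            std::vector<float> y(rows, 0.0f);
            for (size_t r = 0; r < rows; r++) {
                for (size_t c = 0; c < cols; c++) {
                    y[r] += w[r*cols + c] * x[t][c];
                }
            }
            out[t].push_back(std::move(y));
        }
    }
    return out;
}

The matrix-vector and matrix-matrix dispatch fixes listed in the commit message presumably concern the GPU equivalent of these loops, where the dispatch size must cover every (token, selected expert) pair rather than a single matrix product.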
Showing 5 changed files with 99,976 additions and 40,426 deletions.
12 changes: 6 additions & 6 deletions common/common.cpp
@@ -1002,9 +1002,9 @@ bool gpt_params_find_arg(int argc, char ** argv, const std::string & arg, gpt_pa
             return true;
         }
         params.main_gpu = std::stoi(argv[i]);
-#ifndef GGML_USE_CUDA_SYCL
-        fprintf(stderr, "warning: llama.cpp was compiled without CUDA/SYCL. Setting the main GPU has no effect.\n");
-#endif // GGML_USE_CUDA_SYCL
+#ifndef GGML_USE_CUDA_SYCL_VULKAN
+        fprintf(stderr, "warning: llama.cpp was compiled without CUDA/SYCL/Vulkan. Setting the main GPU has no effect.\n");
+#endif // GGML_USE_CUDA_SYCL_VULKAN
         return true;
     }
     if (arg == "--split-mode" || arg == "-sm") {
@@ -1030,9 +1030,9 @@ bool gpt_params_find_arg(int argc, char ** argv, const std::string & arg, gpt_pa
             invalid_param = true;
             return true;
         }
-#ifndef GGML_USE_CUDA_SYCL
-        fprintf(stderr, "warning: llama.cpp was compiled without CUDA/SYCL. Setting the split mode has no effect.\n");
-#endif // GGML_USE_CUDA_SYCL
+#ifndef GGML_USE_CUDA_SYCL_VULKAN
+        fprintf(stderr, "warning: llama.cpp was compiled without CUDA/SYCL/Vulkan. Setting the split mode has no effect.\n");
+#endif // GGML_USE_CUDA_SYCL_VULKAN
         return true;
     }
     if (arg == "--tensor-split" || arg == "-ts") {
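
Note that the hunks above only rename the guard: GGML_USE_CUDA_SYCL_VULKAN must be defined elsewhere for these warnings to keep working. A plausible definition, mirroring how the old combined GGML_USE_CUDA_SYCL guard would be composed from the per-backend macros (the defining hunk is not shown in this excerpt, so treat this as an assumption):

// Assumed combined-backend guard (not shown in this diff excerpt):
#if defined(GGML_USE_CUDA) || defined(GGML_USE_SYCL) || defined(GGML_USE_VULKAN)
#define GGML_USE_CUDA_SYCL_VULKAN
#endif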
