Vulkan Implementation #2059

Merged: 156 commits, Jan 28, 2024

Commits
061246f
Vulkan loader code
0cc4m May 7, 2023
4a96d0e
Fix matmul kernel, continue implementation
0cc4m Jun 10, 2023
88d4ec0
Continue implementation
0cc4m Jun 11, 2023
a4004d4
Vulkan memory management
0cc4m Jun 11, 2023
b0e6585
Vulkan development
0cc4m Jun 12, 2023
fc4f207
Matmul call
0cc4m Jun 12, 2023
2471728
Add aligned malloc and free for VMA
0cc4m Jun 13, 2023
8ce84c2
Continue implementation
0cc4m Jun 20, 2023
a42376e
First matmul success
0cc4m Jun 22, 2023
baf9ff5
GEMM Kernel optimization
0cc4m Jun 23, 2023
1b4863c
1D Blocktiling
0cc4m Jun 24, 2023
7c6860b
2D Blocktiling
0cc4m Jun 24, 2023
0c9cca0
Write coalescing
0cc4m Jun 25, 2023
2c70df9
Continue vulkan implementation and optimization
0cc4m Jun 25, 2023
3adc7b1
First FP16 attempt, disabled for now
0cc4m Jun 28, 2023
fc5bb53
Code abstraction, FP16 implementation, fix kernel, add FP16 to FP32 k…
0cc4m Jun 28, 2023
c31e14b
Enable device extensions properly, restore fp16 matmul op
0cc4m Jun 29, 2023
40c8f84
Fix mulmat_f16
0cc4m Jun 29, 2023
df3cdbd
Output FP32 in fp16 matmul shader
0cc4m Jun 29, 2023
cb5cb4d
Fix f16_to_f32 kernel
0cc4m Jun 30, 2023
c8ff09b
dequant_q4_0 kernel
0cc4m Jun 30, 2023
4ea9b2f
Add VMA library
0cc4m Jun 30, 2023
36cd5d8
Avoid requesting dedicated memory, VMA can decide that by itself
0cc4m Jun 30, 2023
24eeb97
Add bounds checking to matmul kernels, improve implementation, fix co…
0cc4m Jul 2, 2023
3d7d8d0
add cmake commands
SlyEcho Jul 4, 2023
ade9555
Add 2d write operation, profiling code
0cc4m Jul 4, 2023
ae7325f
Fix 2d write
0cc4m Jul 4, 2023
e35d28f
Fix queue selection for AMD RADV
0cc4m Jul 4, 2023
80b17e2
Fix trailing whitespace in vk_mem_alloc.h
0cc4m Jul 4, 2023
2449390
Add WIP warp tile mat mul shaders
0cc4m Jul 5, 2023
869ae76
Disable glslc optimization
0cc4m Jul 5, 2023
ea06a2c
Disable glslc optimization for CMake
0cc4m Jul 7, 2023
6d5a0ad
Merge pull request #2 from SlyEcho/vulkan
0cc4m Jul 7, 2023
c3d9475
Optimize warptile matmul shader, replace blocktile with it
0cc4m Jul 7, 2023
c7c761a
Add split-k optimization for small matrix multiplication
0cc4m Jul 8, 2023
0ef62f5
Fix validation errors, improve compatibility with AMD GPUs
0cc4m Jul 8, 2023
3bc7a80
Rework command buffer handling
0cc4m Jul 9, 2023
8dd585e
Variable matmul kernel using specialization constants
0cc4m Jul 9, 2023
0c4d841
Fix synchronization on AMD, add barriers for buffer ownership transfe…
0cc4m Jul 15, 2023
ad3d28e
Reuse semaphores
0cc4m Jul 15, 2023
22a4cb7
Handle stage flags during command buffer submission properly
0cc4m Jul 15, 2023
f58fa51
Increase matmul test runs for consistent results
0cc4m Jul 15, 2023
931a892
Fix F32 matmul
0cc4m Jul 16, 2023
8d351b8
Merge upstream changes, fix conflict
0cc4m Jul 17, 2023
e490395
Add vectorized loading and zeropadding for matrix multiplication
0cc4m Jul 19, 2023
105fd19
Use pinned memory for f16 preprocessing
0cc4m Jul 19, 2023
9e97cb0
Don't force aligned matmul
0cc4m Jul 19, 2023
b5b1337
Don't free before queue done
0cc4m Jul 20, 2023
3432e37
Replace VMA library with native Vulkan buffer management
0cc4m Jul 20, 2023
754ea68
Basic offloading support with mul_f32 and dmmv for q4_0
0cc4m Jul 22, 2023
2859562
Run glslc commands in parallel
0cc4m Jul 22, 2023
3452095
Unroll loops in dmmv shader
0cc4m Jul 22, 2023
f2d4ca3
Reduce usage of waitIdle
0cc4m Jul 22, 2023
67843a3
Reuse pinned allocation for f16 conversion
0cc4m Jul 22, 2023
1ac8ff3
Handle devices with only a single queue
0cc4m Jul 22, 2023
53809c9
Fix trailing whitespace in CMakeLists.txt
0cc4m Jul 23, 2023
4e58028
Allow parallel execution of kernels, parallelize third and fourth dim…
0cc4m Jul 24, 2023
69554ce
Add fallback for devices only supporting one DescriptorSet per Descri…
0cc4m Jul 25, 2023
1b2ec1a
Move to graph function similar to CUDA implementation
0cc4m Jul 25, 2023
d0bd120
Use F16 kernel for most things, replace q_f32 with mul_mat_q_f16 func…
0cc4m Jul 28, 2023
44065df
Add F32 dmmv shaders
0cc4m Jul 28, 2023
f6b241e
Batch submissions
0cc4m Jul 28, 2023
6bd9bd9
Add .spv to gitignore
0cc4m Jul 28, 2023
2231618
Split off matrix vector multiplication for separate optimization
0cc4m Jul 28, 2023
582c825
Use single command buffer for matrix vector multiplication ops
0cc4m Jul 30, 2023
dc6e677
Reduce overhead of mul_f32 calls by using a single command buffer
0cc4m Jul 30, 2023
75788fe
Add submission batching to mul_f32
0cc4m Aug 1, 2023
c638955
Fix tests
0cc4m Aug 1, 2023
44bbc85
Add missing barrier
0cc4m Aug 2, 2023
ccd2592
Add further missing barrier
0cc4m Aug 6, 2023
e660943
Add further ops
0cc4m Aug 7, 2023
a07f603
Replace vk::QueueFamilyIgnored with VK_QUEUE_FAMILY_IGNORED to suppor…
0cc4m Aug 11, 2023
7ac00de
Remove unnecessary cblas link
0cc4m Aug 12, 2023
1132941
Fix descriptor set pre-allocation assert
0cc4m Aug 12, 2023
a47ca7a
Add runtime shader compilation, start transferring shaders to this ap…
0cc4m Aug 13, 2023
592ebb0
Transfer remaining shaders to header and compile on runtime
0cc4m Aug 14, 2023
01d22a4
Merge upstream changes, fix conflict
0cc4m Aug 14, 2023
e9be24f
Fix fp32 fallback if device doesn't support fp16, add force disable e…
0cc4m Aug 14, 2023
7e88677
Add support for q4_1, q5_0, q5_1 and q8_0
0cc4m Aug 15, 2023
5ae5d2b
Remove unnecessary scalar layout extension
0cc4m Aug 19, 2023
7f89e40
Parse graph early to pre-record command buffers
0cc4m Sep 29, 2023
b6591b5
Merge upstream changes, fix conflicts
0cc4m Sep 29, 2023
42bfa88
Add q6_k support
0cc4m Sep 29, 2023
da09a02
Add multi-submit for command buffers
0cc4m Sep 30, 2023
39bd512
Fix q6_k dequant shader for AMD
0cc4m Oct 3, 2023
85c1a63
Fix q6_k for GPUs without fp16 support
0cc4m Oct 8, 2023
dad1cdb
Simplify q6_k fp16 fix
0cc4m Oct 8, 2023
e2962e1
Minor fixes
0cc4m Oct 9, 2023
b447229
Fix wg_denom of m-mulmat shaders
0cc4m Oct 11, 2023
73d01d1
Add Python-based Vulkan shader generator
0cc4m Oct 14, 2023
de4b813
Replace shaderc dependency with precompiled shaders
0cc4m Oct 14, 2023
1e6e13f
Clean up code
0cc4m Oct 14, 2023
7efac61
Fix shader generator script Windows compatibility
0cc4m Oct 14, 2023
bd05447
Close file before deletion
0cc4m Oct 14, 2023
35b10d1
Merge upstream changes, fix conflict
0cc4m Oct 14, 2023
e90a651
Fix vulkan shader fp32 name
0cc4m Oct 14, 2023
a861879
Add q2_k and q3_k support
0cc4m Oct 15, 2023
4a97d2d
Add q4_k support
0cc4m Oct 20, 2023
0ec595f
Add q5_k support
0cc4m Oct 20, 2023
1b66b8b
Bake SPIR-V bytecode into the library instead of loading shaders from…
0cc4m Oct 20, 2023
a0db45f
Switch to signal semaphores for flexibility
0cc4m Oct 21, 2023
3de5ba4
Finish broadcasting mul mat support for GQA
0cc4m Oct 22, 2023
0230981
Clean up unused functions
0cc4m Oct 25, 2023
d130fe6
Merge remote-tracking branch 'origin/master' into vulkan
0cc4m Oct 25, 2023
1cb90e5
Add further ops, not yet enabled. Improve semaphore code
0cc4m Oct 31, 2023
2c7fa8d
Reduce number of used semaphores by utilizing timelines more properly
0cc4m Oct 31, 2023
80bfc59
Remove queue information
0cc4m Oct 31, 2023
2e01682
Reuse timeline semaphores, allow parallel operation with binary semap…
0cc4m Nov 1, 2023
4b7eccc
Add Vulkan to llama-bench
0cc4m Nov 1, 2023
20787d8
Merge upstream changes, fix conflicts
0cc4m Nov 1, 2023
00bea85
Remove cblas dependency
0cc4m Nov 3, 2023
bd7fa3f
Fix matmul k-split bug
0cc4m Nov 5, 2023
7f05c7f
Fix q4_k dmmv K_QUANTS_PER_ITERATION 1 shader
0cc4m Nov 5, 2023
e969445
Add RMS Norm shader, rework op_f32 shader setup, fix matmul bug
0cc4m Nov 16, 2023
39cd277
Fix issues with float16 overflows in shaders
0cc4m Nov 18, 2023
7551889
Merge upstream changes, fix conflicts
0cc4m Nov 18, 2023
471a1b0
Fix issues with older Vulkan headers on Ubuntu 22.04
0cc4m Nov 18, 2023
d9ca456
Allow multi-op partial offloading by parsing the graph to preallocate…
0cc4m Nov 19, 2023
fc63f88
Implement further ops, rework op_f32 calls, fix bugs
0cc4m Nov 26, 2023
ff93769
Finish full offloading support, add last remaining ops, fix bugs, rem…
0cc4m Dec 10, 2023
0c708c1
Upload generated file ggml-vulkan-shaders.hpp, remove redundant shaders
0cc4m Dec 10, 2023
2f5529e
Merge upstream changes, fix conflicts, adapt per-layer kv
0cc4m Dec 10, 2023
2c8a156
Merge upstream changes, fix conflicts, adapt soft_max op
0cc4m Dec 16, 2023
cd34b87
Fix Python and shader header format
0cc4m Dec 16, 2023
c05883f
Free model gpu buffers on exit
0cc4m Dec 16, 2023
5fef0d6
Merge remote-tracking branch 'origin/master' into vulkan
0cc4m Dec 17, 2023
e9e2be3
Use single queue per device to simplify code
0cc4m Dec 18, 2023
7b36cea
Add matmul shader support for running multiple calculations in parallel
0cc4m Dec 30, 2023
918c333
Merge upstream changes, fix staging buffer usage
0cc4m Jan 6, 2024
542ae3b
Merge upstream changes, fix conflicts
0cc4m Jan 8, 2024
c3290d2
Switch from semaphore-synchronized multiple command buffers per op to…
0cc4m Jan 16, 2024
02d2e38
Fix missing event cast
0cc4m Jan 17, 2024
2d14b22
Merge upstream changes, implement basic vulkan backend
0cc4m Jan 18, 2024
1811c4e
Replace uint64_t(-1) with UINT64_MAX, rename function for clarity
0cc4m Jan 18, 2024
f84c54f
Fix warning about empty C function parameters
0cc4m Jan 18, 2024
c0f3474
Fix compiler warnings
0cc4m Jan 18, 2024
6e61742
Properly implement Vulkan backend buffer handling
0cc4m Jan 21, 2024
00f214c
Fix oversized host staging buffers
0cc4m Jan 21, 2024
1f55cd2
Simplify barrier synchronization calls
0cc4m Jan 21, 2024
7fa5ca9
Fix gcc warnings
0cc4m Jan 21, 2024
f652ebf
Implement max_size for backend buffer types to limit the size of a si…
0cc4m Jan 22, 2024
bcf2a44
Use min of maxMemoryAllocationSize and maxBufferSize for device max a…
0cc4m Jan 22, 2024
6b97c71
refactor multi buf
slaren Jan 22, 2024
f2c364a
Disable unsupported ops to fix tests
0cc4m Jan 22, 2024
1c953c1
Check for maintenance4 support before using it
0cc4m Jan 23, 2024
566a178
Handle devices with only a single queue
0cc4m Jan 23, 2024
3742b6c
Fix single queue logic
0cc4m Jan 23, 2024
bc5e64b
propagate buffer usage in multi buffers
slaren Jan 23, 2024
3a15a01
Implement rope_neox op
0cc4m Jan 24, 2024
82ce1c4
Cleanup header and other files
0cc4m Jan 25, 2024
5a8a07e
Simplify gpu_extras by removing events and putting staging memcpys in…
0cc4m Jan 26, 2024
a5cca6c
Move queue into context
0cc4m Jan 26, 2024
48ad459
Simplify context use, optimize matmul shader for warp size 64 (AMD GC…
0cc4m Jan 27, 2024
9c4c15a
Merge branch 'master' into vulkan
ggerganov Jan 28, 2024
e3acca3
Add get_max_size to SYCL backend.
0cc4m Jan 28, 2024
10fbb1f
llama : fix trailing whitespace
ggerganov Jan 28, 2024
17 changes: 17 additions & 0 deletions CMakeLists.txt
@@ -99,6 +99,7 @@ set(LLAMA_CUDA_PEER_MAX_BATCH_SIZE "128" CACHE STRING
option(LLAMA_HIPBLAS "llama: use hipBLAS" OFF)
option(LLAMA_HIP_UMA "llama: use HIP unified memory architecture" OFF)
option(LLAMA_CLBLAST "llama: use CLBlast" OFF)
option(LLAMA_VULKAN "llama: use Vulkan" OFF)
option(LLAMA_METAL "llama: use Metal" ${LLAMA_METAL_DEFAULT})
option(LLAMA_METAL_NDEBUG "llama: disable Metal debugging" OFF)
option(LLAMA_METAL_SHADER_DEBUG "llama: compile Metal with -fno-fast-math" OFF)
@@ -416,6 +417,22 @@ if (LLAMA_CLBLAST)
endif()
endif()

if (LLAMA_VULKAN)
find_package(Vulkan)
if (Vulkan_FOUND)
message(STATUS "Vulkan found")

add_library(ggml-vulkan STATIC ggml-vulkan.cpp ggml-vulkan.h)
target_link_libraries(ggml-vulkan PRIVATE Vulkan::Vulkan)

add_compile_definitions(GGML_USE_VULKAN)

set(LLAMA_EXTRA_LIBS ${LLAMA_EXTRA_LIBS} ggml-vulkan)
else()
message(WARNING "Vulkan not found")
endif()
endif()
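
With this block, the Vulkan backend stays opt-in: configuring with `-DLLAMA_VULKAN=ON` builds the new `ggml-vulkan` library and links it via `Vulkan::Vulkan` when `find_package(Vulkan)` succeeds, and otherwise only prints a warning so the rest of the build continues without the backend.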

if (LLAMA_HIPBLAS)
list(APPEND CMAKE_PREFIX_PATH /opt/rocm)

13 changes: 13 additions & 0 deletions Makefile
@@ -448,6 +448,19 @@ ggml-opencl.o: ggml-opencl.cpp ggml-opencl.h
$(CXX) $(CXXFLAGS) -c $< -o $@
endif # LLAMA_CLBLAST

ifdef LLAMA_VULKAN
MK_CPPFLAGS += -DGGML_USE_VULKAN
MK_LDFLAGS += -lvulkan
OBJS += ggml-vulkan.o

ifdef LLAMA_VULKAN_CHECK_RESULTS
MK_CPPFLAGS += -DGGML_VULKAN_CHECK_RESULTS
endif

ggml-vulkan.o: ggml-vulkan.cpp ggml-vulkan.h
$(CXX) $(CXXFLAGS) -c $< -o $@
endif # LLAMA_VULKAN
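
The Makefile build mirrors the CMake option: running make with `LLAMA_VULKAN=1` defines `GGML_USE_VULKAN`, compiles `ggml-vulkan.o`, and links against `-lvulkan`; setting `LLAMA_VULKAN_CHECK_RESULTS=1` additionally defines `GGML_VULKAN_CHECK_RESULTS`, presumably to cross-check GPU results against the CPU implementation.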

ifdef LLAMA_HIPBLAS

ifeq ($(wildcard /opt/rocm),)
11 changes: 8 additions & 3 deletions examples/llama-bench/llama-bench.cpp
@@ -562,6 +562,7 @@ struct test {
static const int build_number;
static const bool cuda;
static const bool opencl;
static const bool vulkan;
static const bool metal;
static const bool gpu_blas;
static const bool blas;
@@ -643,6 +644,9 @@ struct test {
if (opencl) {
return "OpenCL";
}
if (vulkan) {
return "Vulkan";
}
if (metal) {
return "Metal";
}
@@ -658,7 +662,7 @@
static const std::vector<std::string> & get_fields() {
static const std::vector<std::string> fields = {
"build_commit", "build_number",
"cuda", "opencl", "metal", "gpu_blas", "blas",
"cuda", "opencl", "vulkan", "metal", "gpu_blas", "blas",
"cpu_info", "gpu_info",
"model_filename", "model_type", "model_size", "model_n_params",
"n_batch", "n_threads", "type_k", "type_v",
@@ -682,7 +686,7 @@
field == "avg_ns" || field == "stddev_ns") {
return INT;
}
if (field == "cuda" || field == "opencl" || field == "metal" || field == "gpu_blas" || field == "blas" ||
if (field == "cuda" || field == "opencl" || field == "vulkan" || field == "metal" || field == "gpu_blas" || field == "blas" ||
field == "f16_kv" || field == "no_kv_offload" || field == "mul_mat_q") {
return BOOL;
}
@@ -710,7 +714,7 @@
}
std::vector<std::string> values = {
build_commit, std::to_string(build_number),
std::to_string(cuda), std::to_string(opencl), std::to_string(metal), std::to_string(gpu_blas), std::to_string(blas),
std::to_string(cuda), std::to_string(opencl), std::to_string(vulkan), std::to_string(metal), std::to_string(gpu_blas), std::to_string(blas),
cpu_info, gpu_info,
model_filename, model_type, std::to_string(model_size), std::to_string(model_n_params),
std::to_string(n_batch), std::to_string(n_threads), ggml_type_name(type_k), ggml_type_name(type_v),
@@ -738,6 +742,7 @@ const std::string test::build_commit = LLAMA_COMMIT;
const int test::build_number = LLAMA_BUILD_NUMBER;
const bool test::cuda = !!ggml_cpu_has_cublas();
const bool test::opencl = !!ggml_cpu_has_clblast();
const bool test::vulkan = !!ggml_cpu_has_vulkan();
const bool test::metal = !!ggml_cpu_has_metal();
const bool test::gpu_blas = !!ggml_cpu_has_gpublas();
const bool test::blas = !!ggml_cpu_has_blas();
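
The new `vulkan` field is filled in from `ggml_cpu_has_vulkan()`, in the same way as the existing CUDA/OpenCL/Metal checks. A minimal sketch of querying it outside of llama-bench (assuming the helper is declared in `ggml.h` alongside the other `ggml_cpu_has_*` functions):

```c
// Minimal sketch: report which GPU backends this build of ggml was compiled with.
// Assumes ggml_cpu_has_vulkan() is declared in ggml.h like the other helpers.
#include <stdio.h>
#include "ggml.h"

int main(void) {
    printf("cuda:   %d\n", ggml_cpu_has_cublas());
    printf("opencl: %d\n", ggml_cpu_has_clblast());
    printf("vulkan: %d\n", ggml_cpu_has_vulkan());
    return 0;
}
```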
106 changes: 82 additions & 24 deletions ggml-alloc.c
@@ -778,38 +778,26 @@ size_t ggml_allocr_alloc_graph(ggml_allocr_t alloc, struct ggml_cgraph * graph)
}

// utils
ggml_backend_buffer_t ggml_backend_alloc_ctx_tensors_from_buft(struct ggml_context * ctx, ggml_backend_buffer_type_t buft) {
GGML_ASSERT(ggml_get_no_alloc(ctx) == true);

size_t alignment = ggml_backend_buft_get_alignment(buft);

size_t nbytes = 0;
for (struct ggml_tensor * t = ggml_get_first_tensor(ctx); t != NULL; t = ggml_get_next_tensor(ctx, t)) {
if (t->data == NULL && t->view_src == NULL) {
nbytes += GGML_PAD(ggml_backend_buft_get_alloc_size(buft, t), alignment);
}
}

if (nbytes == 0) {
// all the tensors in the context are already allocated
#ifndef NDEBUG
fprintf(stderr, "%s: all tensors in the context are already allocated\n", __func__);
#endif
return NULL;
}

ggml_backend_buffer_t buffer = ggml_backend_buft_alloc_buffer(buft, nbytes);
static bool alloc_tensor_range(struct ggml_context * ctx,
struct ggml_tensor * first, struct ggml_tensor * last,
ggml_backend_buffer_type_t buft, size_t size,
ggml_backend_buffer_t ** buffers, size_t * n_buffers) {
ggml_backend_buffer_t buffer = ggml_backend_buft_alloc_buffer(buft, size);
if (buffer == NULL) {
// failed to allocate buffer
#ifndef NDEBUG
fprintf(stderr, "%s: failed to allocate buffer\n", __func__);
fprintf(stderr, "%s: failed to allocate %s buffer of size %zu\n", __func__, ggml_backend_buft_name(buft), size);
#endif
return NULL;
for (size_t i = 0; i < *n_buffers; i++) {
ggml_backend_buffer_free((*buffers)[i]);
}
free(*buffers);
return false;
}

ggml_tallocr_t tallocr = ggml_tallocr_new_from_buffer(buffer);

for (struct ggml_tensor * t = ggml_get_first_tensor(ctx); t != NULL; t = ggml_get_next_tensor(ctx, t)) {
for (struct ggml_tensor * t = first; t != last; t = ggml_get_next_tensor(ctx, t)) {
if (t->data == NULL) {
if (t->view_src == NULL) {
ggml_tallocr_alloc(tallocr, t);
@@ -826,6 +814,76 @@ ggml_backend_buffer_t ggml_backend_alloc_ctx_tensors_from_buft(struct ggml_conte

ggml_tallocr_free(tallocr);

*buffers = realloc(*buffers, sizeof(ggml_backend_buffer_t) * (*n_buffers + 1));
(*buffers)[(*n_buffers)++] = buffer;

return true;
}

ggml_backend_buffer_t ggml_backend_alloc_ctx_tensors_from_buft(struct ggml_context * ctx, ggml_backend_buffer_type_t buft) {
GGML_ASSERT(ggml_get_no_alloc(ctx) == true);

size_t alignment = ggml_backend_buft_get_alignment(buft);
size_t max_size = ggml_backend_buft_get_max_size(buft);

ggml_backend_buffer_t * buffers = NULL;
size_t n_buffers = 0;

size_t cur_buf_size = 0;
struct ggml_tensor * first = ggml_get_first_tensor(ctx);
for (struct ggml_tensor * t = first; t != NULL; t = ggml_get_next_tensor(ctx, t)) {
size_t this_size = 0;
if (t->data == NULL && t->view_src == NULL) {
this_size = GGML_PAD(ggml_backend_buft_get_alloc_size(buft, t), alignment);
}

if (this_size > max_size) {
// tensor is too large to fit in a single buffer
fprintf(stderr, "%s: tensor %s is too large to fit in a %s buffer (tensor size: %zu, max buffer size: %zu)\n",
__func__, t->name,
ggml_backend_buft_name(buft),
this_size, max_size);
for (size_t i = 0; i < n_buffers; i++) {
ggml_backend_buffer_free(buffers[i]);
}
free(buffers);
return NULL;
}

if ((cur_buf_size + this_size) > max_size) {
// allocate tensors in the current buffer
if (!alloc_tensor_range(ctx, first, t, buft, cur_buf_size, &buffers, &n_buffers)) {
return NULL;
}
first = t;
cur_buf_size = this_size;
} else {
cur_buf_size += this_size;
}
}

// allocate remaining tensors
if (cur_buf_size > 0) {
if (!alloc_tensor_range(ctx, first, NULL, buft, cur_buf_size, &buffers, &n_buffers)) {
return NULL;
}
}

if (n_buffers == 0) {
// all the tensors in the context are already allocated
#ifndef NDEBUG
fprintf(stderr, "%s: all tensors in the context are already allocated\n", __func__);
#endif
return NULL;
}

ggml_backend_buffer_t buffer;
if (n_buffers == 1) {
buffer = buffers[0];
} else {
buffer = ggml_backend_multi_buffer_alloc_buffer(buffers, n_buffers);
}
free(buffers);
return buffer;
}
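
The allocation strategy above is a greedy split: tensors are packed into the current buffer until adding the next padded tensor would exceed the buffer type's `max_size`, at which point the accumulated range is handed to `alloc_tensor_range` and a new buffer is started; a single tensor larger than `max_size` is rejected outright. A simplified, self-contained sketch of just that packing decision (sizes only, with a hypothetical `pack_into_buffers` helper and none of the ggml bookkeeping):

```c
// Standalone sketch of the greedy packing used by
// ggml_backend_alloc_ctx_tensors_from_buft: start a new buffer whenever the
// current one would exceed max_size. Illustrative only; the real code also
// handles alignment padding, views and allocation failures.
#include <stdio.h>
#include <stddef.h>

static size_t pack_into_buffers(const size_t * sizes, size_t n, size_t max_size) {
    size_t n_buffers = 0;
    size_t cur = 0;
    for (size_t i = 0; i < n; i++) {
        if (sizes[i] > max_size) {
            return 0; // a single tensor larger than max_size cannot be placed at all
        }
        if (cur + sizes[i] > max_size) {
            n_buffers++;    // flush the current buffer ...
            cur = sizes[i]; // ... and start a new one with this tensor
        } else {
            cur += sizes[i];
        }
    }
    if (cur > 0) {
        n_buffers++; // the remaining tensors form the last buffer
    }
    return n_buffers;
}

int main(void) {
    const size_t sizes[] = { 300, 300, 500, 200, 900 };
    // with max_size = 1000 this packs as {300, 300}, {500, 200}, {900}
    printf("%zu buffers\n", pack_into_buffers(sizes, 5, 1000));
    return 0;
}
```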

6 changes: 6 additions & 0 deletions ggml-backend-impl.h
@@ -19,6 +19,7 @@ extern "C" {
const char * (*GGML_CALL get_name) (ggml_backend_buffer_type_t buft);
ggml_backend_buffer_t (*GGML_CALL alloc_buffer) (ggml_backend_buffer_type_t buft, size_t size);
size_t (*GGML_CALL get_alignment) (ggml_backend_buffer_type_t buft); // tensor alignment
size_t (*GGML_CALL get_max_size) (ggml_backend_buffer_type_t buft); // allocation max size
size_t (*GGML_CALL get_alloc_size) (ggml_backend_buffer_type_t buft, const struct ggml_tensor * tensor); // data size needed to allocate the tensor, including padding
bool (*GGML_CALL supports_backend)(ggml_backend_buffer_type_t buft, ggml_backend_t backend); // check if the buffer type is usable by the backend
// check if tensor data is in host memory
@@ -63,6 +64,11 @@ extern "C" {
// do not use directly, use ggml_backend_tensor_copy instead
bool ggml_backend_buffer_copy_tensor(const struct ggml_tensor * src, struct ggml_tensor * dst);

// buffer that contains a collection of buffers
GGML_CALL ggml_backend_buffer_t ggml_backend_multi_buffer_alloc_buffer(ggml_backend_buffer_t * buffers, size_t n_buffers);
GGML_CALL bool ggml_backend_buffer_is_multi_buffer(ggml_backend_buffer_t buffer);
GGML_CALL void ggml_backend_multi_buffer_set_usage(ggml_backend_buffer_t buffer, enum ggml_backend_buffer_usage usage);
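
These declarations back the multi-buffer path used by `ggml-alloc.c` above: several real backend buffers are wrapped into a single logical buffer, and a usage flag set on the wrapper is propagated to the wrapped buffers. A hedged usage sketch follows; `make_combined`, `buf_a` and `buf_b` are illustrative names, and `GGML_BACKEND_BUFFER_USAGE_WEIGHTS` is assumed to be one of the existing `ggml_backend_buffer_usage` values:

```c
// Usage sketch for the multi-buffer helpers, following how
// ggml_backend_alloc_ctx_tensors_from_buft uses them in ggml-alloc.c.
// buf_a / buf_b stand for buffers previously obtained from
// ggml_backend_buft_alloc_buffer; the usage value is assumed to exist.
#include "ggml-backend-impl.h"

ggml_backend_buffer_t make_combined(ggml_backend_buffer_t buf_a, ggml_backend_buffer_t buf_b) {
    ggml_backend_buffer_t parts[2] = { buf_a, buf_b };
    // wrap both backend buffers into one logical buffer
    ggml_backend_buffer_t combined = ggml_backend_multi_buffer_alloc_buffer(parts, 2);
    // mark the wrapped buffers as weight storage (propagated to each part)
    ggml_backend_multi_buffer_set_usage(combined, GGML_BACKEND_BUFFER_USAGE_WEIGHTS);
    return combined;
}
```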

//
// Backend
//