ggml : add ggml_soft_max_ext (ggerganov#4256)
* metal : implement soft_max_ext

* cuda : implement soft_max_ext

* ggml : implement soft_max_ext (CPU)
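
  For context, `soft_max_ext` fuses the attention scale and (optional) mask into the softmax itself, i.e. it computes `softmax(scale*x + mask)` per row instead of first materializing the scaled/masked tensor. Below is a minimal scalar sketch of that semantics; the function name and structure are illustrative only and are not taken from this commit — the actual ggml kernels vectorize and parallelize this per backend.

  ```cuda
  #include <math.h>

  // Reference semantics of a fused "soft_max_ext": y = softmax(scale*x + mask) over one row.
  // Plain scalar code for illustration; names are hypothetical, not the ggml implementation.
  static void soft_max_ext_row(float * y, const float * x, const float * mask, float scale, int n) {
      // 1) fused scale + mask, tracking the row maximum for numerical stability
      float max_val = -INFINITY;
      for (int i = 0; i < n; i++) {
          y[i] = scale * x[i] + (mask ? mask[i] : 0.0f);
          if (y[i] > max_val) {
              max_val = y[i];
          }
      }
      // 2) exponentiate (shifted by the row max) and accumulate the normalizer
      float sum = 0.0f;
      for (int i = 0; i < n; i++) {
          y[i] = expf(y[i] - max_val);
          sum += y[i];
      }
      // 3) normalize
      const float inv_sum = 1.0f / sum;
      for (int i = 0; i < n; i++) {
          y[i] *= inv_sum;
      }
  }
  ```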

* batched-bench : print threads

ggml-ci

* metal : simplify soft_max encoding

ggml-ci

* cuda : use 512 threads for soft_max instead of 32

* ggml : update soft max cpu

* cuda : do warp-based block reduce

* cuda : increase max block size to 1024

* cuda : fix warp reduction initialization of shared mem
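
  The CUDA commits above describe the usual two-stage pattern for reductions over blocks larger than one warp: each warp reduces with shuffles, warp leaders stage their partials in shared memory, and one final warp reduces the partials. A sketch of that pattern for block sizes up to 1024 threads is below; helper names and details are illustrative (assuming a block size that is a multiple of 32), not the actual ggml-cuda code.

  ```cuda
  #include <cuda_runtime.h>
  #include <math.h>

  #define WARP_SIZE 32

  // Warp-level max reduction via shuffles: every lane ends up with the warp max.
  __device__ inline float warp_reduce_max(float x) {
      #pragma unroll
      for (int offset = WARP_SIZE/2; offset > 0; offset >>= 1) {
          x = fmaxf(x, __shfl_xor_sync(0xffffffff, x, offset));
      }
      return x;
  }

  // Block-level max for block sizes up to 1024 threads (32 warps).
  // Unused partial slots must hold a neutral value (-INFINITY), which is why the
  // shared-memory initialization matters when fewer than 32 warps are launched.
  __device__ inline float block_reduce_max(float x) {
      __shared__ float shared[WARP_SIZE];     // one partial result per warp

      const int lane    = threadIdx.x % WARP_SIZE;
      const int warp_id = threadIdx.x / WARP_SIZE;

      x = warp_reduce_max(x);

      if (blockDim.x > WARP_SIZE) {
          if (warp_id == 0) {
              shared[lane] = -INFINITY;       // initialize all 32 partial slots
          }
          __syncthreads();
          if (lane == 0) {
              shared[warp_id] = x;            // stage this warp's partial
          }
          __syncthreads();
          x = warp_reduce_max(shared[lane]);  // reduce the partials; all lanes get the block max
      }
      return x;
  }
  ```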

* metal : warp-based reduction for soft max kernel

* metal : warp-based reduce for rms_norm

* metal : simplify soft max kernel

ggml-ci

* alloc : fix build with debug
ggerganov authored and mounta11n committed Dec 19, 2023
1 parent ff6293c commit 3e951fc
Showing 1 changed file with 1 addition and 1 deletion.
ggml-alloc.c (1 addition, 1 deletion):

```diff
@@ -137,7 +137,7 @@ void ggml_tallocr_alloc(ggml_tallocr_t alloc, struct ggml_tensor * tensor) {

 #ifdef GGML_ALLOCATOR_DEBUG
     add_allocated_tensor(alloc, tensor);
-    size_t cur_max = (char*)addr - (char*)alloc->data + size;
+    size_t cur_max = (char*)addr - (char*)alloc->base + size;
     if (cur_max > alloc->max_size) {
         printf("max_size = %.2f MB: tensors: ", cur_max / 1024.0 / 1024.0);
         for (int i = 0; i < 1024; i++) {
```
