@struct (Contributor) commented on Jul 25, 2025

The RPC operations RPC_CMD_SET_TENSOR, RPC_CMD_SET_TENSOR_HASH, RPC_CMD_GET_TENSOR, and RPC_CMD_COPY_TENSOR all require that a backend buffer was previously allocated and can be located during the operation; otherwise the code subsequently crashes with various NULL pointer dereferences. This change adds a NULL buffer check after deserialize_tensor is called in each of these operations.
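For context, a minimal sketch of the guard pattern described above; the handler shape, the request field name, and the logging call are illustrative assumptions, not the exact diff:

```cpp
// Sketch of the NULL-buffer guard in an affected RPC endpoint handler
// (the set/get/copy tensor paths). Only deserialize_tensor is taken from
// the description above; the surrounding names are illustrative.
ggml_tensor * tensor = deserialize_tensor(ctx, &request.tensor);
if (tensor == nullptr || tensor->buffer == nullptr) {
    // No backend buffer was previously allocated/located for this tensor,
    // so reject the command instead of dereferencing NULL later on.
    GGML_LOG_ERROR("[%s] error deserializing tensor\n", __func__);
    return false;
}
```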

Tested the RPC changes by loading various models over the RPC interface:

$ build/bin/rpc-server -d metal
...
create_backend: using Metal backend
Starting RPC server v2.0.0
  endpoint       : 127.0.0.1:50052
  local cache    : n/a
  backend memory : 49146 MB
Accepted client connection, free_mem=51533365248, total_mem=51539607552

llama-cli on the client side:

$ build/bin/llama-cli -m ~/Downloads/Meta-Llama-3.1-8B-Instruct-abliterated-Q6_K.gguf --rpc localhost:50052 -n 64 -ngl 99
...
> This is a test!
I'm ready to help. What's the purpose of this test?

and llama-bench:

$ build/bin/llama-bench -rpc localhost:50052
warning: asserts enabled, performance may be affected
warning: debug build, performance may be affected
register_backend: registered backend Metal (1 devices)
register_device: registered device Metal (Apple M4 Max)
register_backend: registered backend BLAS (1 devices)
register_device: registered device BLAS (Accelerate)
register_backend: registered backend RPC (0 devices)
register_backend: registered backend CPU (1 devices)
register_device: registered device CPU (Apple M4 Max)
| model                          |       size |     params | backend    | threads |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | --------------: | -------------------: |
| llama 1B Q4_0                  | 606.54 MiB |     1.10 B | Metal,BLAS,RPC |      12 |           pp512 |     3115.99 ± 130.79 |
| llama 1B Q4_0                  | 606.54 MiB |     1.10 B | Metal,BLAS,RPC |      12 |           tg128 |         86.28 ± 2.04 |

build: e40e6f7a (5986)

@github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) on Jul 25, 2025
@CISC merged commit 64bf1c3 into ggml-org:master on Jul 25, 2025
47 checks passed
taronaeo pushed a commit to taronaeo/llama.cpp-s390x that referenced this pull request Jul 25, 2025
gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request Jul 25, 2025
* origin/master:
docs : update HOWTO-add-model.md for ModelBase and new model classes (ggml-org#14874)
ggml : remove invalid portPos specifiers from dot files (ggml-org#14838)
context : restore preemptive sched reset when LLAMA_SET_ROWS=0 (ggml-org#14870)
mtmd : fix 32-bit narrowing issue in export-lora and mtmd clip (ggml-org#14503)
rpc : check for null buffers in get/set/copy tensor endpoints (ggml-org#14868)
sched : fix multiple evaluations of the same graph with pipeline parallelism (ggml-org#14855)
musa: upgrade musa sdk to rc4.2.0 (ggml-org#14498)
sync : ggml
cmake : fix usage issues (ggml/1257)
ggml-cpu : remove stdlib include from repack.cpp (ggml/1276)
context : perform output reorder lazily upon access after sync (ggml-org#14853)
chat : fix kimi-k2 chat template (ggml-org#14852)
sycl: fixed semantics of block offset calculation (ggml-org#14814)
llama : fix MiniCPM inference after Granite Four changes (ggml-org#14850)
docs: add libcurl-dev install hint for Linux distros (ggml-org#14801)
metal : fix fusion across different encoders (ggml-org#14849)
sycl: fix undefined variable in work group size check (ggml-org#14843)
convert : text-only support for GLM-4.1V-9B-Thinking (ggml-org#14823)
CUDA: fix overflow in FA, tune performance (ggml-org#14840)
CUDA: fix compilation with GGML_CUDA_F16 (ggml-org#14837)