[Issue #12458] Temporarily Clamp inf Values in ggml-cpu.c to Prevent Garbled Output(or coredump) on RK3588 #12459

Corsair-cxs · 2025-03-19T02:14:06Z

Overview

This PR introduces two changes:

- A call to check_invalid_values() inside ggml_graph_compute_thread() to detect inf or NaN in tensor data.
- A modification to ggml_compute_forward_soft_max_f32() that converts any detected inf to FLT_MAX.

These changes are a temporary workaround for the RK3588 (ARM64) platform: they prevent the garbled text output caused by inf values in the tensors. They address only the symptom, not the cause, and the extra checks may increase CPU usage.
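
For readers who want a feel for the detection step, here is a minimal sketch of a scan for inf or NaN in a float buffer. The name has_invalid_values() and its signature are hypothetical; the actual check_invalid_values() in this PR operates on ggml tensors and its signature is not shown in this description.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

// Hypothetical stand-in for the PR's check_invalid_values(); it works on a
// plain float buffer rather than a ggml tensor.
// Returns true if any of the n values is +inf, -inf, or NaN.
static bool has_invalid_values(const float *data, size_t n) {
    assert(data != NULL || n == 0);
    for (size_t i = 0; i < n; ++i) {
        if (!isfinite(data[i])) {
            return true;  // a convenient line for a debugger breakpoint
        }
    }
    return false;
}
```

A scan like this touches every element, which is consistent with the CPU overhead mentioned above.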

Changes

ggml-cpu.c:
- Added calls to check_invalid_values() to detect problematic data.
- Modified ggml_compute_forward_soft_max_f32() to clamp inf to FLT_MAX.
debug_check.gdb:
- Provides a GDB script that sets breakpoints in check_invalid_values().
- Automatically prints the src0 structure and its first 128 floats whenever an inf is detected.

How to Reproduce and Debug

Compile the project in Debug mode (on RK3588 or any ARM64 environment):

cmake .. -G Ninja -DCMAKE_BUILD_TYPE=Debug -DBUILD_SHARED_LIBS=OFF -DGGML_OPENCL=ON -DGGML_VULKAN_CHECK_RESULTS=ON
ninja

Edit debug_check.gdb to point to your DeepSeek R1-1.5B model file path.
Run GDB:

cd llama-cpp
gdb -x debug_check.gdb ./build/bin/llama-cli

Once the breakpoint at the return true; statement is hit, inspect the data:

p *src0
p (*src0).data
x/128f (*src0).data

You’ll see inf values in the tensor.

Notes

This patch works around the immediate issue but doesn’t tackle the underlying reason why inf values appear in the first place.
CPU overhead is increased by the additional checks/clamping.

Related Issue

See issue #12458, "Eval bug: RK3588 Unexpected inf values cause garbled output(or core dump) in llama-cli", for more details and discussion.

check invalid values, add debug_hook function and gdb shell

3f27066

github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) Mar 19, 2025

Corsair-cxs changed the title ~~fix rk3588 inf issue~~ fix rk3588 inf issue #12458 Mar 19, 2025

Corsair-cxs changed the title ~~fix rk3588 inf issue #12458~~ [Issue #12458] Temporarily Clamp inf Values in ggml-cpu.c to Prevent Garbled Output(or coredump) on RK3588 Mar 19, 2025

Djip007 mentioned this pull request Mar 28, 2025

change cpu_buft_list order: ACCEL -> GPU host -> CPU extra -> CPU #12632

Merged
