
JSON: [key] -> .at(key), assert() -> GGML_ASSERT #7143

Merged 1 commit on May 8, 2024

Conversation

JohannesGaessler (Collaborator)

Fixes #7133.

The issue on master is that using the subscript operator on JSON can cause a segfault. The JSON library has built-in checks that avoid segfaults and instead cause failed assertions, but those use assert() and are optimized out by default. This PR configures the JSON library to use GGML_ASSERT instead, which is not optimized out, in order to at least avoid segfaults. It also replaces all instances of the subscript operator that I could find with .at(key) in order to enable error handling. The error returned by the server for an empty JSON body is somewhat cryptic, but it is much better than a segfault:

{"error":{"code":500,"message":"[json.exception.type_error.304] cannot use at() with null","type":"server_error"}}

I consider writing better error handling to be out of scope for this PR, since the immediate priority is to avoid denial-of-service attacks.
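As an illustration of the .at(key) pattern, here is a minimal, self-contained sketch; the function handle_request and the include path are hypothetical and not the server's actual code:

#include <iostream>
#include <string>
#include <nlohmann/json.hpp> // upstream include path; the repo vendors the header under a different path

using json = nlohmann::json;

// Hypothetical handler: unlike operator[], .at(key) never touches invalid memory.
// On a missing key or wrong type it throws a json::exception, which can be turned
// into an HTTP 500 response instead of crashing the process.
static std::string handle_request(const json & body) {
    try {
        return body.at("prompt").get<std::string>();
    } catch (const std::exception & e) {
        return std::string("{\"error\":{\"code\":500,\"message\":\"") + e.what() + "\"}}";
    }
}

int main() {
    // A null body reproduces the error quoted above:
    // [json.exception.type_error.304] cannot use at() with null
    std::cout << handle_request(json::parse("null")) << "\n";
}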

Resolved review comments on examples/server/server.cpp (outdated) and common/common.cpp.
ngxson (Collaborator) commented May 8, 2024

FYI, CI failed because test-json-schema-to-grammar.cpp uses assert() but no longer includes <cassert>. A simple fix would be to add #include <cassert>, since all the other test files already include it.

ngxson (Collaborator) commented May 8, 2024

Also, regarding the use of GGML_ASSERT in this situation: I think it's a bit overkill, since it calls ggml_print_backtrace() internally. I'm wondering if we can simply do:

#define JSON_ASSERT(x) if(!(x)){std::abort();}

I'd like to also ask @slaren for advice. Ref: https://json.nlohmann.me/api/macros/json_assert

slaren (Collaborator) commented May 8, 2024

GGML_ASSERT may make issues easier to debug than simply calling abort, which just ends the process silently. The overhead of parsing the JSON is probably low enough that it doesn't really matter for the overall performance of the server either way.
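For reference, a sketch of how the two options under discussion plug into the single-header library; the exact define used in the PR is not quoted here, and the behavior of GGML_ASSERT is taken from the comments above:

// Option in this PR: route the library's internal assertions through GGML_ASSERT,
// which is not compiled out in release builds and prints a backtrace before aborting.
#define JSON_ASSERT(x) GGML_ASSERT(x)

// Alternative suggested above: abort without diagnostics.
// #define JSON_ASSERT(x) if (!(x)) { std::abort(); }

// Either define must precede the first include of json.hpp, otherwise the header
// falls back to assert() from <cassert>.
#include "ggml.h"
#include "json.hpp"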

github-actions bot (Contributor) commented May 8, 2024

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2-q4_0: 545 iterations 🚀

  • Concurrent users: 8, duration: 10m
  • HTTP request : avg=8649.16ms p(95)=20685.91ms fails=, finish reason: stop=474 truncated=71
  • Prompt processing (pp): avg=98.85tk/s p(95)=423.91tk/s
  • Token generation (tg): avg=32.69tk/s p(95)=47.19tk/s
  • ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=json-fix-segfault commit=1aa591a96f71afb31c7293a9d2e60b458d91cf65

[Chart: llamacpp:prompt_tokens_seconds — llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 545 iterations]
[Chart: llamacpp:predicted_tokens_seconds — llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 545 iterations]

[Chart: llamacpp:kv_cache_usage_ratio — llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 545 iterations]
[Chart: llamacpp:requests_processing — llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 545 iterations]

JohannesGaessler (Collaborator, Author)

Does anyone know why the CI builds are failing? I don't understand why adding #include "ggml.h" causes assert() to be undefined.

slaren (Collaborator) commented May 8, 2024

It's not ggml.h. When JSON_ASSERT is not defined, the JSON header includes <cassert> itself. The correct fix is to include <cassert> in the sources that require it.

// allow overriding assert
#if !defined(JSON_ASSERT)
    #include <cassert> // assert
    #define JSON_ASSERT(x) assert(x)
#endif
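
A sketch of the corresponding fix, with the file name taken from the earlier comment:

// In test-json-schema-to-grammar.cpp: include <cassert> directly instead of relying
// on json.hpp to pull it in, since overriding JSON_ASSERT removes that include.
#include <cassert>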

JohannesGaessler (Collaborator, Author)

Thank you.

ggerganov requested a review from ngxson on May 8, 2024 at 18:48
ngxson (Collaborator) left a comment

LGTM. Thank you.

ngxson merged commit c12452c into ggerganov:master on May 8, 2024
64 checks passed

Successfully merging this pull request may close these issues.

Segmentation fault in example server (/v1/chat/completions route) given incorrect JSON payload