gguf : enforce that tensor names are unique #6905

Merged
merged 4 commits into ggml-org:master from xsn/duplicated_tensor_name on Apr 28, 2024

Conversation

ngxson (Collaborator) commented Apr 25, 2024

Resolves #6836

For now, only adding a duplicated tensor is prevented:

  • ggml.c uses gguf_find_tensor to check for duplicated tensors. This is a linear search, so it may not be a suitable implementation when loading a gguf (though it is acceptable when writing one, since writing does not happen very often). What do you think about this point, @slaren?
  • GGUFWriter from gguf-py uses a set to keep track of tensor names when writing a gguf. We can use the same logic for GGUFReader (see the sketch below).
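
To make the set-based idea concrete, here is a minimal, self-contained sketch of the duplicate check; the class and method names are illustrative, not gguf-py's actual API:

```python
class TensorWriterSketch:
    """Illustrative writer that rejects duplicated tensor names."""

    def __init__(self) -> None:
        # A set gives O(1) membership tests, unlike a linear scan.
        self.tensor_names: set[str] = set()

    def add_tensor(self, name: str) -> None:
        if name in self.tensor_names:
            raise ValueError(f"Duplicated tensor name: {name!r}")
        self.tensor_names.add(name)


w = TensorWriterSketch()
w.add_tensor("blk.0.attn_norm.weight")
w.add_tensor("blk.0.attn_norm.weight")  # raises ValueError
```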

github-actions bot commented Apr 25, 2024

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2-q4_0: 211 iterations 🚀

Details:
  • Concurrent users: 8, duration: 10m
  • HTTP request: avg=22836.94ms p(95)=42547.38ms fails=, finish reason: stop=99 truncated=112
  • Prompt processing (pp): avg=261.91tk/s p(95)=737.96tk/s
  • Token generation (tg): avg=19.17tk/s p(95)=27.0tk/s
  • ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=xsn/duplicated_tensor_name commit=b1df2f212e4ab03b6cffbf66a1c26a718bc6677c

[Benchmark charts: prompt_tokens_seconds, predicted_tokens_seconds, kv_cache_usage_ratio, requests_processing — llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 211 iterations]

slaren (Member) commented Apr 25, 2024

A linear search is definitely not optimal, but we don't really have a good alternative at the moment, since we don't have implementations of the data structures that would be necessary to prevent this in ggml. Currently the models are small enough that it is not likely to cause a significant performance issue, and fixing the immediate problem is more important. So I think it is ok to use gguf_find_tensor.
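
For illustration only, a Python rendering of the trade-off discussed here — the actual check lives in C in ggml.c, but the shape of the cost is the same: rejecting duplicates via linear search is O(n) per tensor, O(n²) per file:

```python
def find_tensor(loaded: list[str], name: str) -> int:
    """Linear search over already-loaded names, like ggml.c's gguf_find_tensor."""
    for i, existing in enumerate(loaded):
        if existing == name:
            return i
    return -1


def load_tensor_names(names_in_file: list[str]) -> list[str]:
    loaded: list[str] = []
    for name in names_in_file:
        # O(n) scan per tensor; fine while models have only a few
        # hundred tensors, as noted above.
        if find_tensor(loaded, name) != -1:
            raise ValueError(f"duplicated tensor name: {name}")
        loaded.append(name)
    return loaded
```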

@ngxson ngxson marked this pull request as ready for review April 27, 2024 13:49
@ngxson ngxson requested a review from slaren April 27, 2024 13:49
ngxson (Collaborator, Author) commented Apr 27, 2024

@slaren FYI, here is the model that I used for testing: https://huggingface.co/ngxson/test_gguf_models/blob/main/stories260K_duplicated_tensors.gguf

I edited it with a hex editor so that blk.0.attn_norm.weight appears twice.
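
For reference, one hypothetical way to spot such a duplicate from Python, assuming gguf-py's GGUFReader (and a reader version that still tolerates duplicated names rather than rejecting the file outright):

```python
from collections import Counter

from gguf import GGUFReader  # gguf-py, i.e. `pip install gguf`

reader = GGUFReader("stories260K_duplicated_tensors.gguf")
counts = Counter(t.name for t in reader.tensors)
print({name: n for name, n in counts.items() if n > 1})
# expected for the test file above: {'blk.0.attn_norm.weight': 2}
```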

slaren (Member) left a comment

Looks good, I verified that this prevents both loading existing models with duplicated tensors, and creating models with duplicated tensors. The GPT-2 arch conversion in convert-hf-to-gguf.py will need to be updated to avoid duplicating output.weight.
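
A hedged sketch of the kind of follow-up described here (not the actual convert-hf-to-gguf.py change): when an architecture ties its output projection to the token embedding, the conversion loop can skip names it has already emitted instead of writing output.weight a second time:

```python
def emit_tensors(tensors: list[tuple[str, object]], writer) -> None:
    """Write converted tensors, emitting each gguf name at most once (sketch)."""
    seen: set[str] = set()
    for name, data in tensors:
        if name in seen:
            # e.g. GPT-2's tied lm_head maps to "output.weight" a second time
            continue
        seen.add(name)
        writer.add_tensor(name, data)
```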

ngxson (Collaborator, Author) commented Apr 28, 2024

@slaren Thanks for the confirmation. I'm not very familiar with convert-hf-to-gguf.py, so unfortunately I have no clue how to do that. Without that fix, do you think this PR is still OK to merge?

slaren (Member) commented Apr 28, 2024

Yes, that's a different issue; I was just pointing out that this is the reason why some models have duplicated tensors. Merge at will.

@ngxson ngxson merged commit 7bb36cc into ggml-org:master Apr 28, 2024
nopperl pushed a commit to nopperl/llama.cpp that referenced this pull request May 5, 2024
* not allow adding duplicated tensor name

* no duplicated tensor while reading gguf

* typo

* throw exception inside llama_model_loader

Co-authored-by: slaren <slarengh@gmail.com>

---------

Co-authored-by: slaren <slarengh@gmail.com>