
ggml : fix llamafile sgemm wdata offsets #6710

Merged: ggerganov merged 1 commit into master from gg/fix-sgemm on Apr 16, 2024

ggerganov (Owner)

ref #6414

test-backend-ops was failing the MUL_MAT tests for Q4_0 and Q8_0 because of incorrect wdata reads. To reproduce:

make tests && ./tests/test-backend-ops -o MUL_MAT -b CPU
  • Fix wdata offset when ggml_blck_size(vec_dot_type) > 1
  • GGML_USE_LLAMAFILE is defined by the build system (Make + CMake)
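
The first bullet concerns byte offsets into wdata, the scratch buffer that holds src1 after conversion to vec_dot_type (Q8_0 for both Q4_0 and Q8_0 weights). For a block-quantized type, a row of ne elements occupies (ne / blck_size) * type_size bytes, not ne * type_size, so an offset formula that happens to work for F32/F16 (block size 1) overshoots once blocks are involved. The C sketch below is illustrative and is not the code changed in this PR: qtype_layout, row_bytes, wdata_offset and naive_offset are hypothetical names. It assumes the converted rows are packed contiguously with i11 varying fastest, then i12, then i13, and it hard-codes the Q8_0 layout as 32 elements per 34-byte block (an fp16 scale plus 32 int8 values).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of a ggml-style type (not the real ggml API). */
typedef struct {
    size_t blck_size; /* elements per block: 1 for F32/F16, 32 for Q8_0 */
    size_t type_size; /* bytes per block: 4 for F32, 2 for F16, 34 for Q8_0 */
} qtype_layout;

/* Bytes taken by one row of ne elements. */
static size_t row_bytes(qtype_layout t, size_t ne) {
    assert(ne % t.blck_size == 0); /* quantized rows must consist of whole blocks */
    return (ne / t.blck_size) * t.type_size;
}

/* Block-aware byte offset of row (i11, i12, i13) in a buffer of packed rows,
 * assuming i11 varies fastest, then i12, then i13. */
static size_t wdata_offset(qtype_layout t, size_t ne10, size_t ne11, size_t ne12,
                           size_t i11, size_t i12, size_t i13) {
    return ((i13 * ne12 + i12) * ne11 + i11) * row_bytes(t, ne10);
}

/* Naive offset that ignores the block size: only correct when blck_size == 1. */
static size_t naive_offset(qtype_layout t, size_t ne10, size_t ne11, size_t ne12,
                           size_t i11, size_t i12, size_t i13) {
    return ((i13 * ne12 + i12) * ne11 + i11) * ne10 * t.type_size;
}
```

With ne10 = 4096, row_bytes gives 4352 bytes for a Q8_0 row (128 blocks of 34 bytes), while the naive formula steps 4096 * 34 = 139264 bytes per row, 32 times too far. For F32 the two formulas agree, which illustrates how a block-size bug can go unnoticed in F32/F16 tests.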

ggerganov merged commit 666867b into master on Apr 16, 2024
63 of 65 checks passed
ggerganov deleted the gg/fix-sgemm branch on April 16, 2024 20:50
Contributor

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2-q4_0: 447 iterations 🚀

  • Concurrent users: 8, duration: 10m
  • HTTP request: avg=10600.89ms p(95)=27375.91ms fails=, finish reason: stop=389 truncated=58
  • Prompt processing (pp): avg=118.7tk/s p(95)=553.68tk/s
  • Token generation (tg): avg=23.72tk/s p(95)=36.95tk/s
  • ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=gg/fix-sgemm commit=42b5d17c32a49bdeae0b97608b67884c85019c36
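
The configuration line above can be sanity-checked with a little arithmetic. This is a minimal sketch, assuming the server splits ctx-size evenly across the parallel slots and reading pp+tg=2048 as the per-request budget of prompt plus generated tokens: 16384 / 8 = 2048 tokens per slot, exactly the pp+tg budget. The slot_ctx and fits_in_slot helpers are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: context tokens available to each slot when
 * ctx_size is split evenly across n_parallel slots (assumption). */
static int slot_ctx(int ctx_size, int n_parallel) {
    assert(n_parallel > 0);
    return ctx_size / n_parallel;
}

/* Hypothetical helper: does a request with n_prompt prompt tokens and
 * n_gen generated tokens fit in one slot's context? */
static bool fits_in_slot(int ctx_size, int n_parallel, int n_prompt, int n_gen) {
    return n_prompt + n_gen <= slot_ctx(ctx_size, n_parallel);
}
```

Under these assumptions, a request with a 1024-token prompt can generate at most 1024 tokens before its slot is full.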

prompt_tokens_seconds: chart of llamacpp:prompt_tokens_seconds, "llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 447 iterations" (timestamps 1713301053 to 1713301683). After the initial zero samples the rate ranges from 474.61 to 652.26 tk/s and ends at 645.04 tk/s.
predicted_tokens_seconds: chart of llamacpp:predicted_tokens_seconds, same run and time range. After the initial zero samples the rate starts at 38.91 tk/s, stays mostly between 21.06 and 25.42 tk/s, and ends at 21.37 tk/s.


kv_cache_usage_ratio: chart of llamacpp:kv_cache_usage_ratio, same run and time range. After the initial zero samples the ratio varies between 0.09 and 0.48 and ends at 0.23.
requests_processing: chart of llamacpp:requests_processing, same run and time range. After the initial zero samples the number of requests in flight varies between 0 and 8 and ends at 1.

tybalex pushed a commit to rubra-ai/tools.cpp that referenced this pull request Apr 17, 2024
2 participants