[SYCL] Disable iqx on windows as WA #6435

Merged: airMeng merged 4 commits into master from sycl-win-crash on Apr 3, 2024

airMeng (Collaborator) commented Apr 2, 2024

workaround for ollama/ollama#3278
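
A platform-gated workaround of this kind could be sketched roughly as below. This is only an illustration, not the code from this PR: the enum `quant_type` and the helpers `is_windows_build`, `is_iq_type` and `backend_supports_type` are hypothetical names, and it assumes "iqx" refers to the IQ family of quantization types.

```cpp
// Hypothetical sketch: hide IQ quantization types from a backend on Windows.
// None of these names come from the PR; they only illustrate the idea.

enum class quant_type { q4_0, q8_0, iq2_xxs, iq2_xs };

// True when the translation unit is compiled for Windows.
constexpr bool is_windows_build() {
#if defined(_WIN32)
    return true;
#else
    return false;
#endif
}

// Assumption for this sketch: only these two enumerators are IQ types.
constexpr bool is_iq_type(quant_type t) {
    return t == quant_type::iq2_xxs || t == quant_type::iq2_xs;
}

// Report IQ types as unsupported on Windows; everything else is unaffected.
constexpr bool backend_supports_type(quant_type t) {
    return !(is_windows_build() && is_iq_type(t));
}
```

With a check like this, a caller can fall back to another backend for IQ models on Windows while other platforms keep the existing behavior.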

github-actions bot (Contributor) commented Apr 2, 2024

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3: 500 iterations 🚀

  • Concurrent users: 8, duration: 10m
  • HTTP request : avg=9364.1ms p(90)=25949.07ms fails=0, finish reason: stop=500 truncated=0
  • Prompt processing (pp): avg=242.74tk/s p(90)=733.9tk/s total=198.88tk/s
  • Token generation (tg): avg=100.05tk/s p(90)=279.74tk/s total=129.83tk/s
  • ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=sycl-win-crash commit=d100b7511cc7a92faca3c4e86db5806afb8da747
Time series (charts omitted; each is titled "llama.cpp bench-server-baseline on Standard_NC4as_T4_v3 duration=10m 500 iterations", x-axis 1712047439 --> 1712048067):

  • llamacpp:prompt_tokens_seconds
  • llamacpp:predicted_tokens_seconds
  • llamacpp:kv_cache_usage_ratio
  • llamacpp:requests_processing

auto kmask_iq2xs_ptr_ct1 = kmask_iq2xs.get_ptr();
auto iq2xxs_grid_ptr_ct1 = &iq2xxs_grid[0];
auto ksigns_iq2xs_ptr_ct1 = &ksigns_iq2xs[0];
auto kmask_iq2xs_ptr_ct1 = &kmask_iq2xs[0];
Review comment (Collaborator):

Remove kmask_iq2xs_ptr_ct1 and use kmask_iq2xs directly in the code that follows.

           auto kmask_iq2xs_ptr_ct1 = &kmask_iq2xs[0];

Review comment (Collaborator):

The same applies to the other similar lines, such as:
auto kmask_iq2xs_ptr_ct1 = &kmask_iq2xs[0];

Remove them as well.
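
The reviewer's suggestion can be illustrated with a small standalone sketch. It is plain C++ rather than SYCL, the table contents are placeholder values, and the two function names are hypothetical; only `kmask_iq2xs` and `kmask_iq2xs_ptr_ct1` come from the diff above.

```cpp
#include <cstdint>

// Stand-in for the lookup table named in the diff; the values are
// placeholders chosen for this sketch, not taken from the PR.
static constexpr uint8_t kmask_iq2xs[8] = {1, 2, 4, 8, 16, 32, 64, 128};

// Before: a local pointer alias to the first element is created and indexed.
constexpr int count_set_with_alias(uint8_t signs) {
    auto kmask_iq2xs_ptr_ct1 = &kmask_iq2xs[0];
    int n = 0;
    for (int j = 0; j < 8; ++j) {
        if (signs & kmask_iq2xs_ptr_ct1[j]) {
            ++n;
        }
    }
    return n;
}

// After: the array is indexed directly, so the alias line can be deleted.
constexpr int count_set_direct(uint8_t signs) {
    int n = 0;
    for (int j = 0; j < 8; ++j) {
        if (signs & kmask_iq2xs[j]) {
            ++n;
        }
    }
    return n;
}
```

Both functions return the same result for any input, which is why the alias is redundant once the table is an ordinary array.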

@airMeng merged commit 5260486 into master Apr 3, 2024
59 of 60 checks passed
@airMeng deleted the sycl-win-crash branch April 3, 2024 07:34
tybalex pushed a commit to rubra-ai/tools.cpp that referenced this pull request Apr 17, 2024
* disable iqx on windows as WA

* array instead of global_memory
2 participants