Inability to use capabilities of dGPU, CLBlast(Old CPU) + other suggestions. #1272

Luro223 · 2024-12-18T01:51:31Z

Hello dear developer of KoboldAI CPP.

I've been using 1.79.1 since release on an i3-3120M with HD Graphics 4000 and an HD 8700M. Since 1.79.1 I can finally use Vulkan with the HD 8700M, and it is a bit faster now, but I still can't use the dGPU's capabilities: KoboldAI CPP still runs on the CPU only and uses just a small amount of the GPU's memory. It also crashes if I use more than 0 GPU layers on Vulkan.

CLBlast worked earlier with HD Graphics 4000, but it stopped working with 1.7X. AMD Accelerated Parallel Processing never actually worked with CLBlast.

Using --sdvaeauto slightly increases performance, I'll show the results later.
Edit: --sdvaeauto slightly increases performance but in rare cases, I need to test more.

Also, is it possible for you to make DRY, XTC, Temperature, Top-P and Top-K toggleable, either from the command-line interface or the GUI? I don't use any of these with some models, because I think some of them might affect performance even when unused (0 value), especially XTC.

Luro223 · 2024-12-18T05:18:04Z

Comparison of Vulkan and Old CPU with 1K context.
Some performance boost, yet no dGPU utilization.
.....Vulkan, HD8700M.....
Processing Prompt (924 / 924 tokens)
Generating (100 / 100 tokens)
[00:00:00] CtxLimit:1024/1024, Amt:100/100, Init:0.10s, Process:378.21s (409.3ms/T = 2.44T/s), Generate:50.80s (508.0ms/T = 1.97T/s), Total:429.01s (0.23T/s)
Benchmark Completed - v1.79.1 Results:
.....
Flags: NoAVX2=True Threads=4 HighPriority=False Cublas_Args=None Tensor_Split=None BlasThreads=4 BlasBatchSize=-1 FlashAttention=True KvCache=2
Timestamp: 2024-12-18 00:00:00.000000+00:00
Backend: koboldcpp_vulkan_noavx2.dll
Layers: 0
Model: mistral7b-erebus-v3.Q4_K_M
MaxCtx: 1024
GenAmount: 100
.....
ProcessingTime: 378.215s
ProcessingSpeed: 2.44T/s
GenerationTime: 50.795s
GenerationSpeed: 1.97T/s
TotalTime: 429.010s
Output: 1 1 1 1
.....
.....CPU (Old CPU).....
Processing Prompt (924 / 924 tokens)
Generating (100 / 100 tokens)
[00:00:00] CtxLimit:1024/1024, Amt:100/100, Init:0.09s, Process:387.69s (419.6ms/T = 2.38T/s), Generate:53.23s (532.3ms/T = 1.88T/s), Total:440.93s (0.23T/s)
Benchmark Completed - v1.79.1 Results:
.....
Flags: NoAVX2=True Threads=4 HighPriority=False Cublas_Args=None Tensor_Split=None BlasThreads=4 BlasBatchSize=-1 FlashAttention=True KvCache=2
Timestamp: 2024-12-18 00:00:00.000000+00:00
Backend: koboldcpp_noavx2.dll
Layers: 0
Model: mistral7b-erebus-v3.Q4_K_M
MaxCtx: 1024
GenAmount: 100
.....
ProcessingTime: 387.695s
ProcessingSpeed: 2.38T/s
GenerationTime: 53.235s
GenerationSpeed: 1.88T/s
TotalTime: 440.930s
Output: 1 1 1 1
.....
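
As a rough cross-check of the two runs above, the reported rates can be recomputed from the logged times. This is only a sketch: the token counts and timings are copied from the benchmark output, while the variable and function names are made up for illustration. It also shows that the "Total" rate in the log corresponds to generated tokens divided by total time.

```python
# Figures copied from the two v1.79.1 benchmark runs in this comment.
PROMPT_TOKENS = 924
GEN_TOKENS = 100

# Illustrative labels, not official backend names.
runs = {
    "vulkan_noavx2": {"process_s": 378.215, "generate_s": 50.795},
    "cpu_noavx2": {"process_s": 387.695, "generate_s": 53.235},
}


def tokens_per_second(tokens, seconds):
    return tokens / seconds


def summarize(run):
    total_s = run["process_s"] + run["generate_s"]
    return {
        "process_tps": tokens_per_second(PROMPT_TOKENS, run["process_s"]),
        "generate_tps": tokens_per_second(GEN_TOKENS, run["generate_s"]),
        # The log's "Total" rate matches generated tokens over total time.
        "total_tps": tokens_per_second(GEN_TOKENS, total_s),
        "total_s": total_s,
    }


summary = {name: summarize(run) for name, run in runs.items()}
speedup = summary["cpu_noavx2"]["total_s"] / summary["vulkan_noavx2"]["total_s"]

for name, s in summary.items():
    print(
        f"{name}: process {s['process_tps']:.2f} T/s, "
        f"generate {s['generate_tps']:.2f} T/s, total {s['total_tps']:.2f} T/s"
    )
print(f"Vulkan vs CPU total-time ratio: {speedup:.3f}x")
```

With 0 layers offloaded, the Vulkan run comes out roughly 3% faster overall, which fits the "some performance boost, yet no dGPU utilization" observation.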

LostRuins · 2024-12-18T08:25:24Z

CLBlast with 0 layers doesn't work at all?

Luro223 · 2024-12-18T12:13:41Z

> CLBlast with 0 layers doesn't work at all?

Yeah, absolutely, neither with HD Graphics 4000 nor with AMD Accelerated Parallel Processing (Oland). It did work with versions earlier than 1.7X, but only with HD Graphics 4000.

Luro223 · 2024-12-18T12:30:04Z

CLBlast(Old CPU) with 0 layers(1.79.1)
Loading model: L:\AI-Models\mistral7b-erebus-v3.Q4_K_M.gguf
Traceback (most recent call last):
  File "koboldcpp.py", line 5009, in <module>
    main(parser.parse_args(),start_server=True)
  File "koboldcpp.py", line 4630, in main
    loadok = load_model(modelname)
  File "koboldcpp.py", line 930, in load_model
    ret = handle.load_model(inputs)
OSError: [WinError -1073741795] Windows Error 0xc000001d
[776] Failed to execute script 'koboldcpp' due to unhandled exception!
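
For reference, the negative number in the OSError is the same code as the hex value, just printed as a signed 32-bit integer. 0xC000001D is the NTSTATUS value STATUS_ILLEGAL_INSTRUCTION, which would be consistent with a library built for AVX2 being loaded on a CPU without it (the i3-3120M is an Ivy Bridge part without AVX2). That link is an inference, not something the traceback states. A minimal sketch of the conversion, with a helper name invented for illustration:

```python
def to_ntstatus(signed_code: int) -> int:
    """Reinterpret a signed 32-bit value as the unsigned NTSTATUS code."""
    return signed_code & 0xFFFFFFFF


# Small lookup of two well-known NTSTATUS values; not exhaustive.
KNOWN_NTSTATUS = {
    0xC000001D: "STATUS_ILLEGAL_INSTRUCTION",
    0xC0000005: "STATUS_ACCESS_VIOLATION",
}

code = to_ntstatus(-1073741795)  # value taken from the WinError above
print(hex(code), KNOWN_NTSTATUS.get(code, "unknown"))
```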

Luro223 · 2024-12-18T12:30:34Z

Same error with Oland.
If WinDbg or System Informer is enough to capture a dump of the error, I could use one of them and send the .dmp file to you.

Luro223 · 2024-12-18T13:20:25Z

Also, whether it's true or not, I noticed 1.79.1 is less creative compared to 1.69.1. I don't know if it's because of DRY or XTC, but the outputs are always different compared to 1.69.1, even if I disable DRY and XTC.
For example, 1.69.1 gives more creative responses, while 1.79.1 gives more logical responses, even with DRY/XTC at 0 and with different models. I can reproduce this and show you if you're interested.

LostRuins · 2024-12-20T06:01:41Z

The 2 versions shouldn't have any difference in creativity. 1.80 was just released, you can try that.

Luro223 · 2024-12-20T17:25:58Z

After comprehensive testing of 1.80: the same errors, but a noticeable performance boost (especially with longer contexts).
1.79.1 VS 1.80 (Vulkan-HD8700M)
Vulkan-HD8700M-1.79.1.txt
Vulkan-HD8700M-1.80.0.txt
CLBlast 1.69.1 VS 1.80.0 (to prove that 1.80.0 still doesn't work). (Also, I've noticed that 1.7X-1.80 use koboldcpp_clblast.dll, while 1.69.1 uses koboldcpp_clblast_noavx2.dll. 1.80.0 ships this library too, yet I didn't see it being used.)
CLBlast (Old CPU)-I3-3120M.txt
CLBlast (Old CPU)-Oland.txt
CLBlast NoAVX2 (Old CPU)-I3-3120M(1.69.1).txt

Luro223 · 2024-12-20T17:39:58Z

Also, to clear up any confusion about creativity, I'll provide all my fine-tuned custom UI settings and the saved Character Card, so you'll be able to reproduce all the errors related to creativity.
UI Settings :
set-005.zip
Test Character Cards (GENERATED RESULTS) :
1.69.1,1.80.zip
Test Character Card (CLEAN TO GENERATE) :
Kirby-clean.zip
Main LLM I used for all this :
L3-8B-Stheno-v3.2-NEO-V1-D_AU-Q4_K_M-imat13 Link

Luro223 · 2024-12-20T17:42:57Z

All the errors happen when you try to change DRY, even if Mult./Base/A.Len are 0. DRY cannot be disabled either from the UI or from command-line arguments, and the same goes for others like XTC, Top-K, etc.
For example, if all the settings related to DRY are 0, everything generated will be weird.

Luro223 · 2024-12-20T18:19:03Z

Also, GPU utilization is the same 0% with 1.80: only about 40MB of GPU memory is used, along with 100% of the CPU.

Also, other presets perform very well too (Old CPU), and I already showed the comparison for Vulkan:
1.79.1.txt
1.80.0.txt
(Only 256 context, but with bigger contexts works better).

Luro223 · 2024-12-20T19:59:48Z

Also Vulkan with more than 0 layers crashes :
Vulkan-1Layers.txt

LostRuins · 2024-12-21T00:57:24Z

It will only use noavx2.dll if you selected "old cpu" option. If you have avx2 support you should not use that!

Luro223 · 2024-12-21T11:25:58Z

> It will only use noavx2.dll if you selected "old cpu" option. If you have avx2 support you should not use that!

Then what other option do I have? CPU (Old CPU) works, but CLBlast (Old CPU) doesn't: it loads koboldcpp_clblast.dll even with the --noavx2 --nommap --usecpu flags, so the koboldcpp_clblast_noavx2.dll library is never actually used.

Luro223 · 2024-12-21T11:54:11Z

If I delete koboldcpp_clblast.dll and rename koboldcpp_clblast_noavx2.dll to koboldcpp_clblast.dll, it works surprisingly well (I only used 256 context for testing):
1.80_CLBlast_noavx2-FORCED.txt
1.80_CLBlast_noavx2-FORCED-BENCHMARK.txt (Comparison between CLBlast (Intel OpenCL-HD Graphics 4000) and CPU (Old CPU))
1.80_CLBlast_noavx2-FORCED-BENCHMARK-Intel(R) HD Graphics 4000.txt(Intel(R) HD Graphics 4000 only)
1.80_CLBlast_noavx2-FORCED-BENCHMARK-Intel(R) Core(TM) i3-3120M CPU @ 2.50GHz.txt(i3-3120M only)
(Mostly because I have 2 Intel OpenCL devices: one that is used directly, called Intel(R) Core(TM) i3-3120M CPU @ 2.50GHz, and one from the driver, Intel(R) HD Graphics 4000.)
Also, Use CPU (Old CPU) is a bit faster than Intel's OpenCL, but OpenCL is faster with longer contexts.
Also, Oland with CLBlast still doesn't work; it gives exactly the same error as 1.69.1 (cl_khr_f16 (not supported)):
1.80_CLBlast_noavx2-FORCED-Oland.txt

Luro223 · 2024-12-21T13:06:57Z

I've tested CLBlast with the forced library (koboldcpp_clblast_noavx2.dll), and it's a bit faster than Vulkan. That supports the 0% GPU utilization I reported, and shows how much faster direct usage of OpenCL is when Intel(R) Core(TM) i3-3120M CPU @ 2.50GHz is selected:
CLblast-HD Graphics 4000-1.80.0.txt
CLblast-Intel(R) Core(TM) i3-3120M CPU @ 2.50GHz-1.80.0.txt

Luro223 · 2024-12-21T21:43:30Z

1.80.1 - same errors, and CLblast still works, but only with library replacement:
CLblast-Intel(R) Core(TM) i3-3120M CPU @ 2.50GHz-1.80.1.txt

Luro223 · 2024-12-21T21:51:31Z

@LostRuins Please tell me about any useful tools I can use to debug these errors so you can see the exact problem, because the 'Debug Mode' option in KoboldAI's own UI doesn't provide enough detail.
Also, 1.80.1 still gives completely different output, with different creativity but somewhat better logic, and if I change DRY to 0 it outputs complete nonsense.

LostRuins · 2024-12-22T05:07:14Z

Looking at your logs, I can see that the noavx2 flag is not being set at all (hence why it's not being used).

Also it looks like you set blasbatchsize to -1, which disables batch processing.

You might want to check your launch parameters to make sure --noavx2 has been set, either in CLI, or by selecting it in the launcher.

Luro223 · 2024-12-22T21:57:02Z

@LostRuins So, you've completely ignored all the messages that already answer exactly what I'm about to type again. I launched even with --showgui --noavx2, and it still uses koboldcpp_clblast.dll instead of koboldcpp_clblast_noavx2.dll.
noavx2 is still False even with --showgui --noavx2:
CLblast-Intel(R) Core(TM) i3-3120M CPU @ 2.50GHz-1.80.1-NOAVX2.txt
Even with --showgui --noavx2 --nommap --usecpu :
CLblast-Intel(R) Core(TM) i3-3120M CPU @ 2.50GHz-1.80.1-NOAVX2-NOMMAP-USECPU.txt
Even if I launch without my predefined config for UI.

Luro223 · 2024-12-22T22:08:34Z

If KoboldCPP GUI uses noavx2=false, even with flags, then it's the issue from GUI itself. Still works if I replace library.
But it works from terminal without any library replacement :
CLblast-Intel(R) Core(TM) i3-3120M CPU @ 2.50GHz-1.80.1-NOAVX2-Terminal.txt

LostRuins · 2024-12-23T02:34:32Z

No, I am not ignoring your messages. I'm saying that running with --noavx2 will indeed load the noavx2 library.

The txt file you are sending me does not seem to match the command lines you have sent. Somehow, you seem to be running a benchmark? Are you loading another config file by mistake? That will override the flags you set.

Luro223 · 2024-12-23T13:22:08Z

@LostRuins No. Even with noavx2=false Vulkan (Old CPU) will use koboldcpp_vulkan_noavx2.dll, but with CLBlast (Old CPU), it will use koboldcpp_clblast.dll, even if I use --noavx2. It will ONLY use koboldcpp_clblast_noavx2.dll if I run directly from terminal without actually using any UI.
And NO, even with --noavx2 flag, the UI will still use koboldcpp_clblast.dll, EVEN if I run without my config. But if you've checked my config earlier, my config contains "noavx2": true, but with/without this same config, it still uses koboldcpp_clblast.dll.

Luro223 · 2024-12-23T13:24:28Z

@LostRuins You can directly reproduce this error if you run with these flags : --showgui --noavx2
It'll use koboldcpp_clblast.dll in any way if you try to run from UI, even with --noavx2 flag.
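
A small way to check attached logs for this mismatch is to compare the `Backend:` line against the library one would expect for the selected options. The sketch below is hypothetical: the table only lists library names that appear in this thread, and the helper names are invented, so it says nothing about how KoboldCpp itself picks a library.

```python
# Library names as they appear in this thread, keyed by (backend, noavx2).
# Combinations not mentioned in the thread are deliberately left out.
SEEN_LIBRARIES = {
    ("cpu", True): "koboldcpp_noavx2.dll",
    ("vulkan", True): "koboldcpp_vulkan_noavx2.dll",
    ("clblast", True): "koboldcpp_clblast_noavx2.dll",
    ("clblast", False): "koboldcpp_clblast.dll",
}


def loaded_library(log_text):
    """Return the value of the first 'Backend:' line in a log, or None."""
    for line in log_text.splitlines():
        if line.startswith("Backend:"):
            return line.split(":", 1)[1].strip()
    return None


def matches_request(log_text, backend, noavx2):
    """True if the logged library is the one expected for the request."""
    return loaded_library(log_text) == SEEN_LIBRARIES[(backend, noavx2)]


# Example log fragment in the same shape as the benchmark output.
sample_log = "Backend: koboldcpp_clblast.dll\nLayers: 0\n"
print(matches_request(sample_log, "clblast", True))  # False: wrong library loaded
```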

Luro223 · 2024-12-23T13:37:55Z

@LostRuins WITHOUT config.
Here we go again.txt
Used --usecl 1 0 --show --noavx2(Only loaded model and started, no config) :

Luro223 · 2024-12-23T13:38:56Z

Still koboldcpp_clblast.dll, ALWAYS koboldcpp_clblast.dll with UI.

Luro223 · 2024-12-23T13:45:09Z

@LostRuins Also can you add "Break" button to forcefully stop prompt from browser's frontend?
Sometimes it will keep generating, even if I press stop, ignoring everything until "Processing Prompt" is fully completed.

LostRuins · 2024-12-23T13:58:09Z

Okay I think I see the bug. I will do a new build

Luro223 · 2024-12-23T15:49:01Z

@LostRuins Awesome, what do you think about adding toggle-able functions like DRY, XTC and etc. for UI and/or flags, because with some models I don't use such things like DRY, XTC, Top-K, Top-P, Temperature, etc., and disabling some of them might increase performance, especially on low-end machines.

Also what can I do to provide you with enough info to fix the inability of GPU utilization for my Vulkan device?

Luro223 · 2024-12-23T16:09:15Z

As I still can't use my Vulkan device, even with 1.80.1, it uses only about ~20MB. of GPU memory, but still uses CPU only as I mentioned earlier.

LostRuins · 2024-12-23T16:33:38Z

@Luro223 fix is up, please try latest version 1.80.3

LostRuins · 2024-12-23T17:14:14Z

Meanwhile, what error does Vulkan give you when you try to use it with offloaded layers?

Luro223 · 2024-12-23T17:20:59Z

@LostRuins Thanks, CLBlast now works from the terminal as well as with custom settings:
1.80.3-Fix-CLblast.txt
1.80.3-Fix-CLblast-Terminal.txt
And Oland still gives same errors:
1.80.3-CLblast-Oland.txt

Luro223 · 2024-12-23T17:23:33Z

> Meanwhile, what error does vulkan give you when you try to use it with offloaded layers

Edit: Sorry, the errors are the same as in the attachment from 1.80.1:
1.80.1-Vulkan-1Layers.txt
access violation writing 0x0000000000001000

Luro223 · 2024-12-23T17:34:04Z

@LostRuins More layers - same errors.

Luro223 · 2024-12-24T18:58:13Z

@LostRuins Any news? Or maybe additional tools for me to debug more info from these errors?

LostRuins · 2024-12-25T03:17:49Z

You need to disable quantized KV cache. It's not supported with Vulkan.

Luro223 · 2024-12-25T23:11:43Z

@LostRuins After some tests I noticed that the GPU kinda works with KV off, but max GPU utilization was 64%, and with KV2 it's significantly faster than Vulkan, even with 5 layers (it crashes with more layers).
Vulkan, 5 layers, FlashAttention off, ContextShift off:
1.80.3-Vulkan-5Layers-NoKV.txt
CLblast (Intel(R) Core(TM) i3-3120M CPU @ 2.50GHz) with 0 layers, FlashAttention on, KvCache=2, ContextShift off:
1.80.3-CLblast-0Layers-KV2.txt
So, is there any way to make KoboldAI use 100% GPU utilization instead of 64%?
Also ~1055MB out of 2048MB is being used from dedicated graphics memory.
Please tell me if I missed out something important.

LostRuins · 2024-12-26T03:49:00Z

Why are you using BlasBatchSize = -1? That basically negates the prompt processing speedup of the GPU.
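
Both pieces of advice in this thread (no quantized KV cache with Vulkan, and no BlasBatchSize of -1) can be checked against the `Flags:` line that the benchmark prints. The following is only a sketch with invented helper names; it assumes that `KvCache=0` means the KV cache is not quantized, which the thread does not state explicitly.

```python
def parse_flags(line):
    """Parse 'Flags: Key=Value Key=Value ...' into a dict of strings."""
    body = line.split(":", 1)[1]
    return dict(item.split("=", 1) for item in body.split())


def warnings_for(flags, backend):
    """Collect notes for the two settings called out in this thread."""
    notes = []
    # Assumption: KvCache=0 means an unquantized KV cache.
    if "vulkan" in backend and flags.get("KvCache", "0") != "0":
        notes.append("quantized KV cache is enabled on a Vulkan backend")
    if flags.get("BlasBatchSize") == "-1":
        notes.append("BlasBatchSize=-1 disables batch processing")
    return notes


# Flags and Backend lines copied from the first benchmark in this issue.
flags_line = (
    "Flags: NoAVX2=True Threads=4 HighPriority=False Cublas_Args=None "
    "Tensor_Split=None BlasThreads=4 BlasBatchSize=-1 FlashAttention=True KvCache=2"
)
for note in warnings_for(parse_flags(flags_line), "koboldcpp_vulkan_noavx2.dll"):
    print("warning:", note)
```

Run against the first Vulkan benchmark in this issue, it reports both conditions.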

Luro223 · 2024-12-27T02:46:12Z

> Why are you using BlasBatchSize = -1? That basically negates the prompt processing speedup of the GPU.

Yeah, it worked: with a test context size of 256 I got under 60 sec. instead of 100. But after experimenting with BLAS, it crashed midway at BLAS 512, and now I can't run with any BLAS setting; only no BLAS with 0 layers works (will show later):
1.80.3-Vulkan0-blas512.txt

Luro223 · 2024-12-27T03:32:42Z

Another error, similar to the first one, but this time it crashed even with BLAS 256. As with the previous one, the Vulkan device becomes completely unusable afterwards, even after a full dGPU driver reload; only a full reboot helps:
1.80.3-Vulkan0-blas256.txt

Luro223 · 2024-12-27T03:56:27Z

Also tested with 1 layer and no BLAS, and with 0 layers and BLAS 32. After the error (midway crash), only 0 layers with no BLAS works:
1.80.3-Vulkan0-0layers-Blas32.txt
1.80.3-Vulkan0-1layer-noBlas.txt
