Unhandled Exception trying to use Vulkan #644
Does it work if you select 0 layers? Right now there are a few known issues with Vulkan:
No, it doesn't work; I just tested it with 0 layers as requested. CMD output for 0 layers
So, something odd is going on. The above four tests were done by changing the inference back-end variable in the GUI launcher of a previously saved settings file from before 1.56, one that was set up for OpenCL with a custom rope config, a rope config that I'm now realising might not have been set up correctly (I don't think it was actually extending the context at all...). Anyway, I just unchecked the custom rope config setting in the GUI while using that same settings file, and it's now launching properly. Maybe there is something wrong with my rope config that Vulkan didn't like and didn't handle gracefully, unlike OpenCL, hence line 328 in koboldcpp.py being pointed to in the traceback.
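As an aside, here is a rough, hedged sketch (in Python, with assumed training and target context values; the names and defaults are not taken from koboldcpp's GUI) of how the two usual RoPE context-extension settings relate to context length. If a custom rope config leaves the scale at 1.0 and the frequency base at the model default, it would not actually be extending the context, which would match the suspicion above.

```python
# Hedged sketch of the two common RoPE context-extension schemes; parameter
# names and defaults here are assumptions, not koboldcpp's GUI fields.

def linear_rope_scale(train_ctx: int, target_ctx: int) -> float:
    """Linear (position-interpolation) scaling: compress positions by
    train_ctx / target_ctx.  A value of 1.0 means no extension."""
    return min(1.0, train_ctx / target_ctx)

def ntk_rope_freq_base(train_ctx: int, target_ctx: int,
                       base: float = 10000.0, head_dim: int = 128) -> float:
    """NTK-aware scaling: keep positions as-is but raise the frequency
    base by alpha ** (d / (d - 2)).  Returning `base` unchanged means
    no extension."""
    alpha = max(1.0, target_ctx / train_ctx)
    return base * alpha ** (head_dim / (head_dim - 2))

print(linear_rope_scale(4096, 16384))   # 0.25
print(ntk_rope_freq_base(4096, 16384))  # ~40889, vs. the default 10000
```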
Okay, just to clarify, I'm not sure if it's specific to the NeuralChat model. Could you try a known-good model with Vulkan, like this one: https://huggingface.co/TheBloke/airoboros-mistral2.2-7B-GGUF/blob/main/airoboros-mistral2.2-7b.Q2_K.gguf Edit: Yeah, or it could be the rope config.
Airoboros-mistral2.2-7b.Q4_K_M works fine, using the same settings with the custom rope config disabled.
Scratch that, something's still very much not right. I shut down Kobold and restarted it, and it's gone back to failing as above, with both of the models I tried earlier reporting the same error. I think either that report of it working was a fluke, or I dreamed I had it set for Vulkan. Regardless, I've tested using Vulkan to load the smallest GGUF I have, 'TinyMistral-248m_Q8', which did successfully load, considering it's only 258,000 KB, and it is outputting precisely what I'd expect of it (absolute garbage, considering it's not really that well trained), but it did load. I think this might be that segfault issue. Edit: I will add that the OSError that is thrown is apparently a known open issue with the llama-cpp-python bindings?
Nah, the OSError is just a generic message; it could be anything. If it works with the other backends, then it could be some issue relating to the Vulkan implementation. Since it's still an early work in progress, it's probably best to just try it again next version.
Just tested with the Phi-2 model you recommended; it's repeating itself badly under Vulkan but outputs coherently under OpenCL. Looks like I'll probably have to wait for the next revision, as you concluded. C'est la vie, as they say.
But it doesn't crash? That's the important part. So whatever issues you were facing were model-specific rather than device-specific. Phi-2 incoherence is a known issue on Vulkan - ggerganov#2059 (comment)
It didn't crash five hours ago and was incoherent, as I said, when I ran it. I just repeated some of the tests with Phi-2 Q4_K_M, and each one failed just now in full-offload, partial-offload and zero-offload states. The cause of this crashing is that the automatic rope config was engaging, as I had thoughtlessly set the context to 16k. When I turned the context length down to 2k, Phi-2 Q4_K_M loaded as expected and was incoherent as expected with all GPU offload strategies tested. Failed Phi-2 launch: this was at my normal requested context length of 16k (derived from my normal settings for NeuralBeagle14 with OpenCL)
Successful but incoherent Phi-2 launch: this was after I adjusted the context length in the quick start settings to 2k
After doing the above two tests, I decided to attempt the same with NeuralBeagle14-7b and reduced the context to 4k. This resulted in NeuralBeagle14-7b working as expected and coherently. NeuralBeagle14-7B failure
NeuralBeagle14-7b success
I should add that I tried with a 6k context, and KoboldCPP crashed in the same fashion as every other failure before it. I suspect this might be a manifestation of that silent segfault you mentioned much earlier, so this might help you and Occ4m?
I am inclined to think that it's crashing simply because it is unable to allocate enough memory, i.e. running OOM, and there's just no indication that that is the reason. Regardless, if it works fine with smaller contexts but not larger ones, you can try offloading fewer layers first.
It seems it is silently OOMing as you hypothesised. I'm sitting at 7.2GB of VRAM usage with Vulkan at 33 layers and 4k context, whereas with OpenCL I'm sitting at 5.9GB of VRAM usage with 33 layers and 16k context. Reducing the number of layers allows the context size to be increased, up to a maximum of 12k with 0 layers on the GPU. When trying for a 16k context with 33 layers in VRAM under Vulkan and watching the VRAM usage chart in Task Manager, it shows a sudden spike to roughly 6GB of VRAM before crashing, but no doubt it's spiking higher for a moment, causing the OOM. I think this is rather open and shut now. So, thank you for your help, LostRuins... I kind of feel this has been an unhelpful wild goose chase for an already known issue :/
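For anyone hitting the same wall, here is a back-of-the-envelope sketch of why context length dominates VRAM usage here. The model dimensions below are assumptions for a typical 7B Mistral-derived model (32 layers, grouped-query attention with 8 KV heads of dimension 128), and the estimate deliberately ignores the weights, compute buffers and per-backend overhead, which is where Vulkan and OpenCL can legitimately differ.

```python
# Hedged estimate of fp16 KV-cache size for a llama-style model; the cache
# grows linearly with context, which is why 2k-4k fits alongside the weights
# on an 8 GB card while 6k-16k can tip it into an OOM.

def kv_cache_bytes(n_layers: int, n_ctx: int, n_kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    """One K and one V tensor per layer, fp16 by default."""
    return 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elem

# Assumed 7B Mistral-style dimensions: 32 layers, 8 KV heads of dim 128.
for ctx in (2048, 4096, 6144, 16384):
    gib = kv_cache_bytes(32, ctx, 8, 128) / 2**30
    print(f"ctx={ctx:6d}: ~{gib:.2f} GiB of KV cache")
# Models without grouped-query attention (32 KV heads) need roughly 4x this.
```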
On my phone it said exactly that, but on my Windows machine it just says "Windows Error 0xc00000ff", lol. It works with:
Should be fixed now in v1.57; if it fails because of OOM, it should now announce it instead of crashing silently.
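Purely as an illustration of what "announce it instead of crashing silently" might look like in a launcher (this is not koboldcpp's actual code; the backend export and its signature are hypothetical): the native loader returns a failure status for allocation errors, and the Python side turns that into a readable message.

```python
import ctypes

def load_model_or_explain(backend: ctypes.CDLL, model_path: str,
                          ctx_size: int, gpu_layers: int) -> None:
    """Hypothetical wrapper: assumes the backend exposes a load_model()
    export that returns 0/False on allocation failure instead of aborting."""
    ok = backend.load_model(model_path.encode("utf-8"), ctx_size, gpu_layers)
    if not ok:
        raise RuntimeError(
            f"Could not load {model_path} at ctx={ctx_size} with "
            f"{gpu_layers} offloaded layers. This usually means the device "
            "ran out of memory; try fewer GPU layers or a smaller context."
        )
```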
I'm not sure if this is the correct place to post this issue, as it could be an upstream issue, but here's hoping.
Hardware used
CPU: Ryzen 5 5600G
GPU: RX6600XT (Driver Version: 23.30.13.01-231128a-398226C-AMD-Software-Adrenalin-Edition)
RAM: 47.9GB of DDR4 at 2133MHz
Motherboard: Gigabyte B450M Aorus Elite
Hopefully the above is useful, but the below should absolutely be useful. I tried with a full offload as below using my normal specification of 41 layers, tried a second time with only 10 layers specified, then tried a third time with 33 layers specified, i.e. equal to the actual model's number of layers. Something odd I did notice is that
CMD output