Few questions / issues #27
Thanks! The fix seems to be increasing batch_size from the default to 256. I was able to set t=1 after that and get more performance. At 7-8 or 10-12 threads, having more threads seems to improve performance rather than degrade it. I'm also not getting an out-of-memory error; after a bit more testing and playing with the numbers, it just "ooms" silently or stops trying to use the GPU and starts slowing down. In that case it still takes up the VRAM, but it seems like it just doesn't do anything with the GPU at that point.
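In case it helps anyone reproducing this, a minimal sketch of those settings using llama-cpp-python. The library and its n_batch/n_threads/n_gpu_layers parameter names are my assumption about the backend here; map them onto whatever knobs your frontend exposes:

```python
from llama_cpp import Llama

# Values below mirror the numbers discussed in this thread; tune for your hardware.
llm = Llama(
    model_path="13B-HyperMantis.ggmlv3.q5_1.bin",
    n_batch=256,      # raising this from the default is what fixed the slowdown above
    n_threads=1,      # t=1 performed best once n_batch was raised
    n_gpu_layers=15,  # number of layers to offload to the GPU
)

out = llm("Q: What is 2+2? A:", max_tokens=8)
print(out["choices"][0]["text"])
```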
A) Windows 11, 32 GB of RAM, Ryzen 5800X, 13B-HyperMantis.ggmlv3.q5_1.bin, model type set to LLAMA, 12 threads (give or take, depending on what I'm testing), RTX 2080
B) Installed the model the other day. Testing on the CPU, I was able to get results back in 20 to 25 seconds. Saw there was GPU support, so I uninstalled and reinstalled with CUDA support. Tested GPU offloading and it didn't seem to do much in my first round of testing; I had set it to 25 layers at the time. Didn't see any improvement in speed, but I could see the GPU was being used, with higher memory usage and GPU utilization spiking, though never capping out at max. Lowered the count to 15 layers and tested again. This time I was able to hit 5 to 10 seconds. Went crazy and tested it as much as I could, getting really good results. Today I rebooted my machine and it's acting like it did the other day at 25 layers. Tried lowering it from 15 to 10 or below, but it doesn't seem to be using the GPU, even though it's "acting" like it's "setting up" GPU usage: I can see the memory fill and a flicker of activity, but it never fully tops out.
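One way to confirm the GPU is actually doing work, rather than just holding VRAM, is to poll utilization while a prompt is generating. A minimal sketch using nvidia-smi (ships with the NVIDIA driver; the 30-second window is arbitrary):

```python
import subprocess
import time

# Poll GPU utilization and memory once per second while a prompt runs.
# If layers are really offloaded, utilization.gpu should spike well above 0%
# during generation, not just memory.used sitting high.
for _ in range(30):
    out = subprocess.check_output([
        "nvidia-smi",
        "--query-gpu=utilization.gpu,memory.used,memory.total",
        "--format=csv,noheader",
    ]).decode().strip()
    print(out)
    time.sleep(1)
```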
I could be totally using it wrong, but the fact that it was working the other day and stopped today tells me something changed on my computer, though I honestly couldn't tell you what. I didn't perform any updates, but it's also weird that it didn't work at first and then all at once it did. Not sure if there is some type of supporting module it needs or not. CUDA is supported per the torch check. Any help or information is welcome :) I understand this is not a common issue. Any places I can check, or values I can read, to see if it's really working would be great. It just seems odd.
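For reference, the "torch check" mentioned above is presumably something like the sketch below; these are standard torch.cuda calls, nothing project-specific:

```python
import torch

# Basic visibility check: confirms the CUDA runtime and driver are usable
# from Python, and which device would be picked up.
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
    print("Capability:", torch.cuda.get_device_capability(0))
    free, total = torch.cuda.mem_get_info()
    print(f"VRAM free/total: {free / 2**30:.1f} / {total / 2**30:.1f} GiB")
```

Note that torch seeing CUDA only proves the driver and runtime work; it doesn't verify that the llama.cpp/ggml build itself was compiled with CUDA support, so checking the backend's own startup log for a CUDA/cuBLAS line is worth doing as well.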