Just updated my GPU from a 2080 to a 3090 and man does it make things go brrr lol.
Anyway, I noticed some strange new behavior after the upgrade. Instead of the model + GPU taking close to what the model took in system RAM on its own, it now takes almost double the system RAM. When I raise the offload from 8 to 100 layers with wizardLM-13B-Uncensored.ggmlv3.q4_0.bin, system RAM usage jumps from 6-7 GB to almost 12-14 GB, and it keeps climbing as I increase the number of GPU layers. I was under the impression that more gpu_layers should mean less system memory, not more?
I have the RAM for it, but it seems very strange that it now uses more system RAM than it ever did before.
This may not be 100% related, and I could simply be doing something wrong with the settings, but I had another issue back on the 2080: offloading to the GPU was slow, and the fix was to increase batch_size, which improved performance even with just 8 layers. In this case, changing the batch size doesn't seem to affect memory usage much; only gpu_layers does (see the load sketch below). As noted in #27, I don't seem to get "out of memory" errors when I increase the GPU layers - it will just OOM if I go past too many layers for my GPU VRAM, rather than falling back on the system threads.
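For reference, the load call looks roughly like this (a sketch; the batch_size value here is just a placeholder, and gpu_layers/threads are the knobs I've been varying between tests):

```python
from ctransformers import AutoModelForCausalLM

# Sketch of the load call. gpu_layers / threads / batch_size are the knobs
# varied between tests; the values below are just one configuration.
llm = AutoModelForCausalLM.from_pretrained(
    "wizardLM-13B-Uncensored.ggmlv3.q4_0.bin",  # local GGML file
    model_type="llama",
    gpu_layers=50,   # increasing this is what drives system RAM usage up
    threads=1,
    batch_size=256,  # placeholder; raising batch_size was the earlier fix for slow offload on the 2080
)

print(llm("Hello, how are you?"))
```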
ctransformers 0.2.10
Windows 11
3090
CUDA supported
Python 3.10
32GB of RAM
RAM usage: (after load + message) - (system RAM before loading) = difference

threads = 8, CPU only: 14.0 - 7.3 = 7 GB
threads = 1, gpu_layers = 50, 1T + GPU: 20.9 - 7.3 = 13 GB
With a little more testing, I see it scales up by about an extra 5 GB of system RAM before it caps out, increasing a little per layer between 1 and 50 layers. It almost seems like it's not releasing the "work load" it was planning on sending to the GPU.
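If anyone wants to reproduce the numbers above, here is a minimal sketch of one way to capture the same before/after difference (assuming psutil is installed; this reads whole-system memory, not just the Python process):

```python
import psutil  # assumed available; the numbers above are whole-system RAM readings
from ctransformers import AutoModelForCausalLM

GB = 1024 ** 3

# System-wide RAM in use before loading the model.
before = psutil.virtual_memory().used / GB

llm = AutoModelForCausalLM.from_pretrained(
    "wizardLM-13B-Uncensored.ggmlv3.q4_0.bin",
    model_type="llama",
    gpu_layers=50,
    threads=1,
)
llm("Hello")  # one short message, to match the "after load + message" column

after = psutil.virtual_memory().used / GB
print(f"before: {before:.1f} GB  after: {after:.1f} GB  difference: {after - before:.1f} GB")
```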
There have been many changes to llama.cpp in the past few weeks, so hopefully this issue is resolved now.
Please try with the latest version, and if you are still facing the issue, feel free to re-open.