Controlling Contiguous GPU Memory Allocation via Environment Variable #13350
Closed · thomasbergersen started this conversation in Ideas
Replies: 0 comments
Can an environment variable (e.g., `export LLAMA_CUDA_FORCE_MAX_MEM=8192`) be added to enforce a cap on contiguous GPU memory allocation? On an RTX 6000 Ada, if llama-server is launched first, vLLM throws an OOM error, but if vLLM is initialized first, the OOM does not occur.
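To make the request concrete, here is a minimal sketch of how such a cap could be read inside the CUDA backend. Note that `LLAMA_CUDA_FORCE_MAX_MEM` is the variable proposed above and does not exist in llama.cpp today; both helper functions here are hypothetical, assuming the value is given in MiB:

```cpp
#include <cstdlib>
#include <cstddef>

// Hypothetical helper (not part of llama.cpp today): reads the proposed
// LLAMA_CUDA_FORCE_MAX_MEM variable (value in MiB) and returns the
// allocation cap in bytes, or 0 if no cap was requested.
static size_t cuda_max_mem_bytes() {
    const char * s = std::getenv("LLAMA_CUDA_FORCE_MAX_MEM");
    if (s == nullptr || *s == '\0') {
        return 0; // variable unset: no cap
    }
    return (size_t) std::strtoull(s, nullptr, 10) * 1024u * 1024u;
}

// Sketch of how a CUDA buffer allocation path could consult the cap
// before committing device memory.
static bool allocation_allowed(size_t requested, size_t already_allocated) {
    const size_t cap = cuda_max_mem_bytes();
    return cap == 0 || already_allocated + requested <= cap;
}
```

For comparison, vLLM already exposes a similar knob via its `--gpu-memory-utilization` flag, which is why starting vLLM first avoids the OOM: it reserves its budgeted fraction of the device up front.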