Closed
Labels
installation (Installation problems)
Description
Your current environment
I am using the latest vLLM image on Kubernetes. Every time I restart the pod, the "Loading safetensors checkpoint shards" step runs again, even though the model is already cached locally. Each restart is significantly delayed because the safetensors files are loaded from the cache again, although I would expect them to be available for immediate use.
Steps to Reproduce:
1. Deploy the vLLM image on Kubernetes.
2. Specify the following command arguments:
   args:
   - --model
   - /root/.cache/huggingface/hub/model70B/snapshot/28/
   - --gpu-memory-utilization
   - "0.9"
   - --tensor-parallel-size
   - "8"
   - --enforce-eager
   - --load-format
   - safetensors
   - --max-parallel-loading-workers
   - "8"
   - --max-num-batched-tokens
   - "1024"
   - --block-size
   - "64"
   - --max-seq-len-to-capture
   - "2048"
3. Restart the pod.
4. Observe that the model checkpoint shards are reloaded, even though the model is already cached at /root/.cache/huggingface/hub/model70B/snapshot/28/.
Expected Behavior:
Once the model has been downloaded and cached, a pod restart should not repeat the "Loading safetensors checkpoint shards" step, because the files are already stored in the cache. Ideally, the system would recognize the cached files and use them directly.
Actual Behavior:
Even after the model is cached, the system loads the safetensors checkpoint shards again, resulting in significant delays.
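To put a number on the delay, one option is to time a plain read of the shard files from the cache directory, independent of vLLM. The sketch below is only an illustration: `time_raw_read` is a hypothetical helper name, the chunk size is arbitrary, and it assumes the shards sit directly in the directory passed to --model. A second run may be faster because the operating system can keep the files in its page cache.

```python
import time
from pathlib import Path


def time_raw_read(model_dir, chunk_size=64 * 1024 * 1024):
    """Read every *.safetensors file in model_dir once.

    Returns (bytes_read, elapsed_seconds). Hypothetical helper, not part of vLLM.
    """
    total = 0
    start = time.perf_counter()
    for path in sorted(Path(model_dir).glob("*.safetensors")):
        with open(path, "rb") as f:
            # Read in fixed-size chunks so memory use stays bounded.
            while chunk := f.read(chunk_size):
                total += len(chunk)
    return total, time.perf_counter() - start


if __name__ == "__main__":
    # Path taken from the --model argument in this report.
    n_bytes, seconds = time_raw_read("/root/.cache/huggingface/hub/model70B/snapshot/28/")
    print(f"read {n_bytes / 2**30:.2f} GiB in {seconds:.1f} s")
```

If the raw read alone accounts for most of the delay, the time is going into disk or volume I/O rather than into re-downloading the model.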
Additional Information:
I have verified that the model files are indeed present in the specified cache directory.
The issue appears only when the pod is restarted, and it may be related to how the system detects or accesses the cached files after a restart.
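For anyone who wants to repeat the file check from inside the pod, a minimal sketch follows. `list_shards` is a hypothetical helper name, and the code assumes the shards are stored as *.safetensors files directly in the directory passed to --model.

```python
from pathlib import Path


def list_shards(model_dir):
    """Return a sorted list of (file name, size in bytes) for each *.safetensors file.

    Hypothetical helper, not part of vLLM.
    """
    root = Path(model_dir)
    return sorted((p.name, p.stat().st_size) for p in root.glob("*.safetensors"))


if __name__ == "__main__":
    # Path taken from the --model argument in this report.
    shards = list_shards("/root/.cache/huggingface/hub/model70B/snapshot/28/")
    for name, size in shards:
        print(f"{name}: {size / 2**30:.2f} GiB")
    total_gib = sum(size for _, size in shards) / 2**30
    print(f"{len(shards)} shard(s), {total_gib:.2f} GiB total")
```

Running it before and after a pod restart shows whether the same files, with the same sizes, are still visible at that path.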
How you are installing vllm
Environment:
Kubernetes with the latest vLLM image.
Using the /root/.cache/huggingface/hub/model70B/snapshot/28/ path for the cached model.