[Installation]: Model checkpoint shards reloading every time on Kubernetes with vLLM image (even if already downloaded) #15862

@Prashantsaini25

Your current environment

I am using the latest vLLM image on Kubernetes. Every time the pod restarts, the "Loading safetensors checkpoint shards" step runs again, even though the model is already cached locally. Re-reading the safetensors files from the cache on each restart adds a significant delay to startup.

Steps to Reproduce:
Deploy the vLLM image on Kubernetes.

Specify the following command arguments:

args:
  - --model
  - /root/.cache/huggingface/hub/model70B/snapshot/28/
  - --gpu-memory-utilization
  - "0.9"
  - --tensor-parallel-size
  - "8"
  - --enforce-eager
  - --load-format
  - safetensors
  - --max-parallel-loading-workers
  - "8"
  - --max-num-batched-tokens
  - "1024"
  - --block-size
  - "64"
  - --max-seq-len-to-capture
  - "2048"

Restart the pod.

Observe that the model checkpoint shards are reloaded, even though the model is already cached at /root/.cache/huggingface/hub/model70B/snapshot/28/.

Expected Behavior:
Once the model has been downloaded and cached, restarting the pod should not repeat the checkpoint shard loading step. Ideally, the system would recognize the cached files and use them directly.

Actual Behavior:
Even with the model cached, the safetensors checkpoint shards are loaded again on every pod restart, which significantly delays startup.

Additional Information:
I have verified that the model files are indeed present in the specified cache directory.
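For anyone reproducing this, the presence check described above can be sketched in Python. This is a minimal illustration, not part of vLLM: the helper name `summarize_shards` is hypothetical, and it assumes the shards sit directly in the model directory as `*.safetensors` files.

```python
from pathlib import Path

# Path taken from the --model argument in this issue.
MODEL_DIR = "/root/.cache/huggingface/hub/model70B/snapshot/28/"


def summarize_shards(model_dir):
    """Return (shard_count, total_bytes) for *.safetensors files in model_dir.

    Hypothetical helper for illustration. Path.stat() follows symlinks,
    so symlinked shard files report the size of their targets.
    """
    shards = sorted(Path(model_dir).glob("*.safetensors"))
    total_bytes = sum(p.stat().st_size for p in shards)
    return len(shards), total_bytes


if __name__ == "__main__":
    count, total = summarize_shards(MODEL_DIR)
    print(f"{count} shard(s), {total / 1e9:.2f} GB in {MODEL_DIR}")
```

Running this inside the pod before and after a restart would show whether the same files are still visible at that path.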

The issue seems to occur only when restarting the pod, and it may be related to how the system detects or accesses the cached files after a restart.
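One way to narrow down whether file access after a restart is the slow part is to time a plain sequential read of the cached shards. The sketch below is an assumption-laden diagnostic, not how vLLM itself loads weights: `time_shard_reads` is a hypothetical helper, and it only measures raw read time from the mounted path.

```python
import time
from pathlib import Path

# Path taken from the --model argument in this issue.
MODEL_DIR = "/root/.cache/huggingface/hub/model70B/snapshot/28/"


def time_shard_reads(model_dir, chunk_size=64 * 1024 * 1024):
    """Read every *.safetensors file once; return (total_bytes, elapsed_seconds).

    Hypothetical helper: a plain sequential read in fixed-size chunks,
    used only to gauge how long the storage takes to deliver the bytes.
    """
    total_bytes = 0
    start = time.perf_counter()
    for shard in sorted(Path(model_dir).glob("*.safetensors")):
        with open(shard, "rb") as f:
            while chunk := f.read(chunk_size):
                total_bytes += len(chunk)
    return total_bytes, time.perf_counter() - start


if __name__ == "__main__":
    nbytes, seconds = time_shard_reads(MODEL_DIR)
    print(f"read {nbytes / 1e9:.2f} GB in {seconds:.1f} s")
```

A second run in the same pod may be faster because the operating system can serve the files from its page cache, so the first run after a restart is the more informative number.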

How you are installing vllm

Environment:
Kubernetes with the latest vLLM image.

Using the /root/.cache/huggingface/hub/model70B/snapshot/28/ path for the cached model.
