[Installation]: Model checkpoint shards reloading every time on Kubernetes with vLLM image (even if already downloaded) #15862

### Your current environment

```text
I am running the latest vLLM image on Kubernetes. Every time the pod restarts, the "Loading safetensors checkpoint shards" step runs again, even though the model is already cached locally. Reading the safetensors files from the cache again on each restart adds a significant startup delay, although the files should already be available for immediate use.

Steps to Reproduce:

1. Deploy the vLLM image on Kubernetes.

2. Specify the following command arguments:

args:
  - --model
  - /root/.cache/huggingface/hub/model70B/snapshot/28/
  - --gpu-memory-utilization
  - "0.9"
  - --tensor-parallel-size
  - "8"
  - --enforce-eager
  - --load-format
  - safetensors
  - --max-parallel-loading-workers
  - "8"
  - --max-num-batched-tokens
  - "1024"
  - --block-size
  - "64"
  - --max-seq-len-to-capture
  - "2048"

3. Restart the pod.

4. Observe that the model checkpoint shards are loaded again, even though the model is already cached at /root/.cache/huggingface/hub/model70B/snapshot/28/.

Expected Behavior:
Once the model has been downloaded and cached, a pod restart should not trigger the checkpoint shard loading step again. The system should recognize the cached files and use them directly.

Actual Behavior:
Even after the model is cached, the system loads the safetensors checkpoint shards again on every restart, resulting in significant delays.

Additional Information:
I have verified that the model files are indeed present in the specified cache directory.

The issue seems to only occur when restarting the pod, and it may be related to how the system detects or accesses the cached files after a pod restart.
```
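One way to double-check the report above from inside the pod is to list the shard files at the `--model` path and time a plain sequential read of each one. This is only a sketch and not part of the original report: the helper names `list_shards` and `time_sequential_read` are hypothetical, and it assumes the shards sit directly in that directory as `*.safetensors` files. If the files are present and the read throughput is low, the delay is more likely disk or volume read time than a repeated download, since the weights have to be read from disk into GPU memory each time the server process starts.

```python
import sys
import time
from pathlib import Path


def list_shards(model_dir):
    """Return (path, size_in_bytes) for every *.safetensors file in model_dir, sorted by name."""
    root = Path(model_dir)
    # stat() follows symlinks, so symlinked cache entries report the real file size.
    return [(p, p.stat().st_size) for p in sorted(root.glob("*.safetensors"))]


def time_sequential_read(path, chunk_bytes=16 * 1024 * 1024):
    """Read a file from start to finish and return (bytes_read, seconds_elapsed)."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_bytes)
            if not chunk:
                break
            total += len(chunk)
    return total, time.perf_counter() - start


if __name__ == "__main__":
    # Default path is the --model argument from this report; pass another path as argv[1].
    default_dir = "/root/.cache/huggingface/hub/model70B/snapshot/28/"
    model_dir = sys.argv[1] if len(sys.argv) > 1 else default_dir
    shards = list_shards(model_dir)
    total_gb = sum(size for _, size in shards) / 1e9
    print(f"{len(shards)} shard(s), {total_gb:.1f} GB total")
    for path, _ in shards:
        n, secs = time_sequential_read(path)
        print(f"{path.name}: {n / 1e6 / max(secs, 1e-9):.0f} MB/s")
```

Running it twice in a row gives a rough comparison between a cold read and a read served from the operating system's page cache, which is lost when the pod is rescheduled onto another node.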


### How you are installing vllm

```sh
Environment:
Kubernetes with the latest vLLM image.

Using the /root/.cache/huggingface/hub/model70B/snapshot/28/ path for the cached model.
```
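Because the behaviour shows up after pod restarts, it may also help to confirm which filesystem actually backs the model path inside the container. The sketch below is an illustration only, with a hypothetical helper name `find_mount`; it assumes a Linux container where `/proc/mounts` is readable. It picks the longest mount point that is a prefix of the path, so you can see whether the path sits on a mounted volume or on the container's root filesystem.

```python
import os


def find_mount(path, mounts_text):
    """Return (mount_point, fs_type) for the longest mount point containing path, or None."""
    path = os.path.normpath(path)
    best = None
    for line in mounts_text.splitlines():
        # /proc/mounts fields: device, mount point, filesystem type, options, dump, pass
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        contains = (
            mount_point == "/"
            or path == mount_point
            or path.startswith(mount_point.rstrip("/") + "/")
        )
        if contains and (best is None or len(mount_point) > len(best[0])):
            best = (mount_point, fs_type)
    return best


if __name__ == "__main__":
    # Path taken from this report; /proc/mounts is Linux-specific.
    with open("/proc/mounts") as f:
        print(find_mount("/root/.cache/huggingface/hub/model70B/snapshot/28/", f.read()))
```

If the result is the root mount rather than a dedicated volume, the files may live in the container image or its writable layer; what that implies for load time depends on the cluster's storage setup, which this report does not describe.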


### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.
