Proposal to improve performance
Hi team,
I've been conducting performance tests on vLLM PD disaggregation using mooncake_store_connector, and found that the most time-consuming parts are not the actual put() operations, but rather the tensor hashing (tensorhash()) and serialization (safetensor_save()) steps.
Based on profiling traces, these two steps dominate the runtime during PD disaggregation, more than the actual storage or network transmission.
Observations:
tensorhash() seems to recompute SHA-256 hashes over possibly large tensors on every call.
safetensor_save() is invoked once per tensor and serializes each one individually, which is expensive when called frequently (a simplified sketch of this pattern follows below).
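For context, here is a minimal sketch of the per-tensor pattern described above. This is not the actual mooncake_store_connector code: the tensor_hash helper, the chunks dict, and the in-memory store are stand-ins for illustration only.

```python
import hashlib

import torch
from safetensors.torch import save as safetensor_save

def tensor_hash(t: torch.Tensor) -> str:
    # Full SHA-256 over the raw tensor bytes; the cost scales with tensor
    # size and is paid again every time the same tensor is hashed.
    return hashlib.sha256(t.contiguous().cpu().numpy().tobytes()).hexdigest()

# Stand-ins for the KV-cache chunks and the Mooncake store.
chunks = {f"layer_{i}": torch.randn(2, 128, 64) for i in range(4)}
store: dict[str, bytes] = {}

for name, t in chunks.items():
    key = tensor_hash(t)                  # hotspot 1: hashing
    payload = safetensor_save({name: t})  # hotspot 2: per-tensor serialization
    store[key] = payload                  # the put() itself is comparatively cheap
```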
Questions:
Could we parallelize the hash computation using multithreading? (A rough sketch is below.)
Are there any alternatives to safetensor_save()? (A sketch of one option is below.)
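To make question 1 concrete, here is a rough sketch of hashing tensors with a thread pool. CPython's hashlib releases the GIL while hashing large buffers, so threads can overlap SHA-256 work on large tensors. The sha256_of_tensor helper and the dummy tensor list are illustrative only, not anything from the connector.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

import torch

def sha256_of_tensor(t: torch.Tensor) -> str:
    # hashlib releases the GIL while hashing large buffers, so several
    # tensors can be hashed concurrently even in pure Python threads.
    return hashlib.sha256(t.contiguous().cpu().numpy().tobytes()).hexdigest()

tensors = [torch.randn(2, 128, 64) for _ in range(8)]  # dummy KV chunks

with ThreadPoolExecutor(max_workers=4) as pool:
    hashes = list(pool.map(sha256_of_tensor, tensors))
```

For question 2, one possible direction (assuming the connector's key/value layout allows it) is to amortize the serialization overhead by saving a batch of tensors in a single safetensors call instead of one call per tensor:

```python
import torch
from safetensors.torch import load, save

tensors = {f"layer_{i}": torch.randn(2, 128, 64) for i in range(8)}

# One serialization call for the whole batch instead of one per tensor;
# the fixed per-call overhead is amortized across all tensors in the payload.
payload: bytes = save(tensors)

restored = load(payload)  # name -> tensor mapping, recovered on the decode side
```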
Thanks!
Report of performance regression
No response
Misc discussion on performance
No response
Your current environment (if you think it is necessary)
The output of `python collect_env.py`