Status: Closed
Labels: performance (Performance-related issues), stale (Over 90 days of inactivity)
Description
Proposal to improve performance
Currently, `hash_request_tokens` executes in the engine core to compute the hashes of KV-cache blocks from the request token IDs (plus LoRA IDs, multimodal tokens, etc.). Because this work sits on the critical path, the current design makes it a hard blocker for inference.
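For context, here is a minimal sketch of what prefix-chained block hashing looks like. The names `hash_block`, `hash_request_tokens_sketch`, and the `BLOCK_SIZE` value are illustrative stand-ins, not vLLM's actual implementation:

```python
import hashlib
from typing import Any, Optional

BLOCK_SIZE = 16  # tokens per KV-cache block; illustrative, not vLLM's setting

def hash_block(parent_hash: Optional[bytes], token_ids: list[int],
               extra_keys: tuple[Any, ...] = ()) -> bytes:
    # Chain in the parent block's hash so a block is only reusable
    # when its entire prefix matches as well.
    h = hashlib.sha256()
    h.update(parent_hash or b"")
    h.update(repr((tuple(token_ids), extra_keys)).encode())
    return h.digest()

def hash_request_tokens_sketch(token_ids: list[int],
                               extra_keys: tuple[Any, ...] = ()) -> list[bytes]:
    # Only full blocks get a hash; a trailing partial block is skipped.
    hashes: list[bytes] = []
    parent: Optional[bytes] = None
    full = len(token_ids) - len(token_ids) % BLOCK_SIZE
    for start in range(0, full, BLOCK_SIZE):
        parent = hash_block(parent, token_ids[start:start + BLOCK_SIZE], extra_keys)
        hashes.append(parent)
    return hashes
```

The `extra_keys` tuple is where per-request metadata such as LoRA IDs or multimodal tokens would factor into the hash, which is why all of that metadata must be available wherever the hashing runs.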
As shown in the following charts, even for a small model (opt128m) at QPS 200 (input=700, output=1), a noticeable amount of engine-core time is spent computing these hashes.

Ideally, all the metadata needed to compute the hashes is already available when the data is received on the input_socket processing thread, which runs in parallel with the engine core thread. Computing the hashes there would move them off the critical path entirely, as shown in the following chart.
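A minimal sketch of the proposed split, reusing `hash_request_tokens_sketch` from above; `Request`, `input_thread_loop`, and `engine_core_loop` are hypothetical stand-ins for vLLM's actual structures, not its API:

```python
import queue
import threading
from dataclasses import dataclass, field

@dataclass
class Request:
    token_ids: list[int]
    block_hashes: list[bytes] = field(default_factory=list)

request_queue: "queue.Queue[Request | None]" = queue.Queue()

def input_thread_loop(raw_requests: list[list[int]]) -> None:
    # Runs on the input_socket processing thread: token IDs (and LoRA IDs,
    # MM tokens, ...) are already in hand here, so the hashes can be
    # computed before the engine core ever sees the request.
    for token_ids in raw_requests:
        req = Request(token_ids=token_ids)
        req.block_hashes = hash_request_tokens_sketch(req.token_ids)
        request_queue.put(req)
    request_queue.put(None)  # sentinel: no more requests

def engine_core_loop() -> None:
    # The engine core thread only consumes precomputed hashes; no hashing
    # remains on the scheduling/inference critical path.
    while (req := request_queue.get()) is not None:
        # A real scheduler would use req.block_hashes for prefix-cache lookup.
        print(f"scheduled request: {len(req.block_hashes)} precomputed block hashes")

core = threading.Thread(target=engine_core_loop)
core.start()
input_thread_loop([[1] * 700, [2] * 700])
core.join()
```

The queue hand-off mirrors how the input thread already feeds the engine core; under this assumption the only change is which side of the queue pays for the hashing.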

Report of performance regression
N/A
Misc discussion on performance
N/A
Your current environment (if you think it is necessary)
The output of `python collect_env.py`
N/A
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
Metadata
Assignees: yeqcharlotte