[Bug]: MambaCacheManager Can Possibly Run Out of Free Slots #10693

Description

@fabianlim

Your current environment

No response

Model Input Dumps

No response

🐛 Describe the bug

In the current implementation of MambaCacheManager._assign_seq_id_to_cache_index, if cur_id is not among the finished requests, the method tries to pop an index from free_cache_indices.

  • However, there seems to be an edge case in which _assign_seq_id_to_cache_index aggressively pops free indices before _release_finished_requests has had a chance to return them (see the sketch below).
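
To illustrate, here is a minimal, self-contained sketch of the suspected ordering problem. This is not the actual vLLM code: the class and method names match vllm/model_executor/models/mamba_cache.py, but the per-seq_id bookkeeping is stripped out and the driver at the bottom is hypothetical.

    class MambaCacheManager:
        def __init__(self, max_batch_size: int):
            # One cache slot per concurrently running sequence.
            self.free_cache_indices = list(range(max_batch_size))
            self.mamba_cache_indices_mapping: dict[str, int] = {}

        def _assign_seq_id_to_cache_index(self, request_id: str) -> int:
            if request_id not in self.mamba_cache_indices_mapping:
                # Raises IndexError when every slot is still held by a
                # finished-but-not-yet-released request.
                self.mamba_cache_indices_mapping[request_id] = (
                    self.free_cache_indices.pop())
            return self.mamba_cache_indices_mapping[request_id]

        def _release_finished_requests(self, finished_request_ids) -> None:
            for request_id in finished_request_ids:
                index = self.mamba_cache_indices_mapping.pop(request_id, None)
                if index is not None:
                    self.free_cache_indices.append(index)

    # Hypothetical driver: "req-0" has finished, but its id has not yet been
    # passed to _release_finished_requests when the next request arrives.
    manager = MambaCacheManager(max_batch_size=1)
    manager._assign_seq_id_to_cache_index("req-0")  # slot 0 now in use
    manager._assign_seq_id_to_cache_index("req-1")  # IndexError: pop from empty list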

We have some private experiments involving Mamba that reuse the above MambaCacheManager implementation, and we have observed errors like the one below:

  File "/net/storage149/mnt/md0/nmg/miniconda3/envs/vllm-mamba/lib/python3.10/site-packages/vllm/model_executor/models/jamba.py", line 441, in forward
    ) = self.mamba_cache.current_run_tensors(input_ids, attn_metadata,
  File "/net/storage149/mnt/md0/nmg/miniconda3/envs/vllm-mamba/lib/python3.10/site-packages/vllm/model_executor/models/mamba_cache.py", line 54, in current_run_tensors
    state_indices = self._prepare_current_run_mamba_cache(
  File "/net/storage149/mnt/md0/nmg/miniconda3/envs/vllm-mamba/lib/python3.10/site-packages/vllm/model_executor/models/mamba_cache.py", line 144, in _prepare_current_run_mamba_cache
    return [
  File "/net/storage149/mnt/md0/nmg/miniconda3/envs/vllm-mamba/lib/python3.10/site-packages/vllm/model_executor/models/mamba_cache.py", line 145, in <listcomp>
    self._assign_seq_id_to_cache_index(req_id, seq_id,
  File "/net/storage149/mnt/md0/nmg/miniconda3/envs/vllm-mamba/lib/python3.10/site-packages/vllm/model_executor/models/mamba_cache.py", line 119, in _assign_seq_id_to_cache_index
    destination_index = self.free_cache_indices.pop()
IndexError: pop from empty list

which is consistent with the issue diagnosed above.

We have made sure that MambaCacheManager is initialized with max_batch_size equal to scheduler_config.max_num_seqs, which we set to 10 times our batch size. We use around 8 scheduler steps.
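
For reference, a sketch of the configuration just described, assuming the offline LLM entry point (the model name and batch size here are placeholders; max_num_seqs and num_scheduler_steps are existing vLLM engine arguments):

    from vllm import LLM

    BATCH_SIZE = 8  # placeholder for our client-side batch size

    llm = LLM(
        model="state-spaces/mamba-2.8b-hf",  # placeholder Mamba-style model
        max_num_seqs=10 * BATCH_SIZE,  # feeds MambaCacheManager's max_batch_size
        num_scheduler_steps=8,         # "around 8 scheduler steps"
    )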

Question: how can we be sure that the cache occupancy will never exceed max_batch_size?
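
As a hypothetical mitigation while this is being diagnosed (not vLLM code, building on the simplified sketch above), the pop could be wrapped so that slot exhaustion fails fast with a descriptive error rather than a bare IndexError:

    # Hypothetical helper method on the simplified MambaCacheManager above.
    def _pop_free_index(self) -> int:
        if not self.free_cache_indices:
            raise RuntimeError(
                f"MambaCacheManager out of free slots: "
                f"{len(self.mamba_cache_indices_mapping)} request(s) still hold "
                f"cache indices; finished requests may not have been released yet.")
        return self.free_cache_indices.pop()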

CC: @nelsonspbr

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
