[Bug]: MambaCacheManager Can Possibly Run Out of Free Slots #10693

Description

@fabianlim

Your current environment

No response

Model Input Dumps

No response

🐛 Describe the bug

In the current implementation of MambaCacheManager._assign_seq_id_to_cache_index, if cur_id is not among the finished requests, the method tries to pop an index from free_cache_indices.

  • However, there seems to be an edge case in which _assign_seq_id_to_cache_index aggressively pops free indices before _release_finished_requests has had a chance to return them (see the sketch below).
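
To illustrate, here is a minimal, self-contained sketch of the suspected ordering problem. This is not the actual vLLM code: the class and method names match vllm/model_executor/models/mamba_cache.py, but the per-seq_id bookkeeping is stripped out and the driver at the bottom is hypothetical.

    class MambaCacheManager:
        def __init__(self, max_batch_size: int):
            # One cache slot per concurrently running sequence.
            self.free_cache_indices = list(range(max_batch_size))
            self.mamba_cache_indices_mapping: dict[str, int] = {}

        def _assign_seq_id_to_cache_index(self, request_id: str) -> int:
            if request_id not in self.mamba_cache_indices_mapping:
                # Raises IndexError when every slot is still held by a
                # finished-but-not-yet-released request.
                self.mamba_cache_indices_mapping[request_id] = (
                    self.free_cache_indices.pop())
            return self.mamba_cache_indices_mapping[request_id]

        def _release_finished_requests(self, finished_request_ids) -> None:
            for request_id in finished_request_ids:
                index = self.mamba_cache_indices_mapping.pop(request_id, None)
                if index is not None:
                    self.free_cache_indices.append(index)

    # Hypothetical driver: "req-0" has finished, but its id has not yet been
    # passed to _release_finished_requests when the next request arrives.
    manager = MambaCacheManager(max_batch_size=1)
    manager._assign_seq_id_to_cache_index("req-0")  # slot 0 now in use
    manager._assign_seq_id_to_cache_index("req-1")  # IndexError: pop from empty list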

We have some private experiments involving Mamba that reuse the above MambaCacheManager implementation, and we have observed errors like the one below:

  File "/net/storage149/mnt/md0/nmg/miniconda3/envs/vllm-mamba/lib/python3.10/site-packages/vllm/model_executor/models/jamba.py", line 441, in forward
    ) = self.mamba_cache.current_run_tensors(input_ids, attn_metadata,
  File "/net/storage149/mnt/md0/nmg/miniconda3/envs/vllm-mamba/lib/python3.10/site-packages/vllm/model_executor/models/mamba_cache.py", line 54, in current_run_tensors
    state_indices = self._prepare_current_run_mamba_cache(
  File "/net/storage149/mnt/md0/nmg/miniconda3/envs/vllm-mamba/lib/python3.10/site-packages/vllm/model_executor/models/mamba_cache.py", line 144, in _prepare_current_run_mamba_cache
    return [
  File "/net/storage149/mnt/md0/nmg/miniconda3/envs/vllm-mamba/lib/python3.10/site-packages/vllm/model_executor/models/mamba_cache.py", line 145, in <listcomp>
    self._assign_seq_id_to_cache_index(req_id, seq_id,
  File "/net/storage149/mnt/md0/nmg/miniconda3/envs/vllm-mamba/lib/python3.10/site-packages/vllm/model_executor/models/mamba_cache.py", line 119, in _assign_seq_id_to_cache_index
    destination_index = self.free_cache_indices.pop()
IndexError: pop from empty list

which is consistent with the issue diagnosed above.

We have made sure that MambaCacheManager is initialized with max_batch_size equal to scheduler_config.max_num_seqs, which we set to 10 times our batch size. We use around 8 scheduler steps.
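
For reference, a sketch of the configuration just described, assuming the offline LLM entry point (the model name and batch size here are placeholders; max_num_seqs and num_scheduler_steps are existing vLLM engine arguments):

    from vllm import LLM

    BATCH_SIZE = 8  # placeholder for our client-side batch size

    llm = LLM(
        model="state-spaces/mamba-2.8b-hf",  # placeholder Mamba-style model
        max_num_seqs=10 * BATCH_SIZE,  # feeds MambaCacheManager's max_batch_size
        num_scheduler_steps=8,         # "around 8 scheduler steps"
    )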

Question: how can we be sure that the cache occupancy will never exceed max_batch_size?
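
As a hypothetical mitigation while this is being diagnosed (not vLLM code, building on the simplified sketch above), the pop could be wrapped so that slot exhaustion fails fast with a descriptive error rather than a bare IndexError:

    # Hypothetical helper method on the simplified MambaCacheManager above.
    def _pop_free_index(self) -> int:
        if not self.free_cache_indices:
            raise RuntimeError(
                f"MambaCacheManager out of free slots: "
                f"{len(self.mamba_cache_indices_mapping)} request(s) still hold "
                f"cache indices; finished requests may not have been released yet.")
        return self.free_cache_indices.pop()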

CC: @nelsonspbr

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
