Description
Feature Request
Currently, CI skips Mamba1 and tests only the prefill (forward) and backward passes of Mamba2.
There is no dedicated, model-level test for stateful, autoregressive generation using the cache mechanism (cache_params). As a result, regressions in the generation-specific logic could go unnoticed in the current CI setup.
A dedicated generation test is needed to verify:
Correct Cache Initialization: The Mamba2Cache object is created and populated correctly during the prefill stage.
Stateful Updates: The convolutional states (conv_states) and SSM states (ssm_states) are updated correctly after each token is generated.
Numerical Consistency: The output logits produced by the model in step-by-step generation mode (using the cache) are numerically consistent with the logits produced by a full, non-cached forward pass on the same sequence.
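The three checks above can be sketched with a toy model. This is only an illustration under stated assumptions, not the repository's code: `ToyCache`, `full_forward`, `prefill`, `step`, and `check_generation_consistency` are hypothetical names, and the scalar causal convolution plus linear recurrence stands in for the real Mamba2 layer and its Mamba2Cache. A real test would presumably build the actual model, run prefill plus per-token decoding with cache_params, and compare logits with something like `torch.testing.assert_close` at a dtype-appropriate tolerance.

```python
from dataclasses import dataclass

# Toy stand-in for a Mamba2-style layer. All names and constants here are
# hypothetical and chosen only for illustration.
KERNEL = [0.5, -0.25, 0.75]  # causal conv weights, oldest input first
A, B, C = 0.9, 0.3, 1.7      # scalar SSM parameters


@dataclass
class ToyCache:
    conv_state: list   # last len(KERNEL) inputs, oldest first
    ssm_state: float   # recurrent hidden state


def full_forward(xs):
    """Non-cached reference: recompute every output from the whole sequence."""
    k = len(KERNEL)
    padded = [0.0] * (k - 1) + list(xs)
    us = [sum(w * v for w, v in zip(KERNEL, padded[t:t + k]))
          for t in range(len(xs))]
    # Closed form of h_t = A * h_{t-1} + B * u_t, written without a recurrence.
    return [C * sum(A ** (t - s) * B * us[s] for s in range(t + 1))
            for t in range(len(xs))]


def step(x, cache):
    """Cached single-token update: shift the conv window, advance the SSM state."""
    cache.conv_state = cache.conv_state[1:] + [x]
    u = sum(w * v for w, v in zip(KERNEL, cache.conv_state))
    cache.ssm_state = A * cache.ssm_state + B * u
    return C * cache.ssm_state


def prefill(xs):
    """Create a fresh cache and populate it from the prompt."""
    cache = ToyCache(conv_state=[0.0] * len(KERNEL), ssm_state=0.0)
    ys = [step(x, cache) for x in xs]
    return ys, cache


def check_generation_consistency(xs, prompt_len):
    """Return the max abs difference between cached and non-cached outputs."""
    reference = full_forward(xs)
    outputs, cache = prefill(xs[:prompt_len])
    # Cache initialization: conv state must hold the last prompt inputs.
    k = len(KERNEL)
    assert cache.conv_state == ([0.0] * k + list(xs[:prompt_len]))[-k:]
    for x in xs[prompt_len:]:
        outputs.append(step(x, cache))  # stateful update per generated token
        assert cache.conv_state[-1] == x
    return max(abs(a - b) for a, b in zip(outputs, reference))


if __name__ == "__main__":
    seq = [1.0, -2.0, 0.5, 3.0, -1.5, 2.25]
    print("max abs diff:", check_generation_consistency(seq, prompt_len=3))
```

The reference path deliberately avoids the recurrence, so the comparison is not trivially true; in a real test the same role is played by the full, non-cached forward pass of the model.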
Motivation
N/A
Your Contribution
N/A