
Commit f30c225

Fix logic error in prepare_inputs_for_generation cache slicing condition (#41764)
Fix logic error in cache slicing condition

Co-authored-by: Raushan Turganbay <raushan@huggingface.co>
1 parent 496c283 commit f30c225

File tree

1 file changed: +1 addition, -1 deletion


src/transformers/generation/utils.py

Lines changed: 1 addition & 1 deletion
@@ -608,7 +608,7 @@ def prepare_inputs_for_generation(
         use_cache = kwargs.get("use_cache")
         if use_cache is None:
             use_cache = getattr(self.config, "use_cache", False)
-        if past_key_values is None or use_cache:
+        if past_key_values is not None or use_cache:
             # TODO (joao): handle the case where cache length == input_ids length. The function below results in an
             # exception because we get empty input_ids after slicing. In essence, we need to roll back the cache 1
             # token to recompute the logits for the first token to be generated (but not all caches support roll backs)
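Below is a minimal, self-contained sketch (not part of the commit) that tabulates the old and new guard conditions side by side. The helpers should_slice_old / should_slice_new are hypothetical names introduced only to isolate the condition in front of the cache-slicing block.

# Hypothetical helpers (not from transformers) isolating the guard condition
# around the cache-slicing code in prepare_inputs_for_generation.

def should_slice_old(past_key_values, use_cache):
    # Condition before the fix: true when there is NO cache, or when use_cache is set.
    return past_key_values is None or use_cache

def should_slice_new(past_key_values, use_cache):
    # Condition after the fix: true when a cache IS present, or when use_cache is set.
    return past_key_values is not None or use_cache

cache = object()  # stand-in for a real cache instance

for pkv, uc in [(None, False), (None, True), (cache, False), (cache, True)]:
    has_cache = pkv is not None
    print(f"has_cache={has_cache!s:<5} use_cache={uc!s:<5} "
          f"old={should_slice_old(pkv, uc)!s:<5} new={should_slice_new(pkv, uc)}")

The two rows where the results differ are the cases the commit title describes: with the old condition the slicing branch fired when no cache was present (use_cache False), and was skipped when a populated cache was passed with use_cache False; the one-character fix reverses both.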
