Commit ff3a7c8
Optimization (decode): treat KV slot exhaustion (code 1) as a recoverable return value
- Updated the `decode` wrapper to explicitly return `1` instead of raising a `RuntimeError` when `llama_decode` indicates no KV slots are available.
- Aligned the Python API's behavior with the underlying C++ contract, which treats code 1 as a recoverable signal rather than a fatal error.
- Enabled higher-level caller loops (such as `eval`) to handle VRAM fragmentation gracefully by halving the batch size dynamically, instead of catching the exception and parsing its message string.
- Retained strict `RuntimeError` exceptions for truly fatal backend failures (e.g., codes -1, -2, -3).
- Added comprehensive docstrings detailing return codes and exception scenarios.
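
The return-code contract described above can be illustrated with a minimal, self-contained sketch. It is not the code from this commit: `backend_decode`, `fake_backend`, and `eval_tokens` are hypothetical stand-ins, and raising on every code other than 0 and 1 is an assumption made for illustration.

```python
from typing import Callable, List

KV_SLOTS_EXHAUSTED = 1  # recoverable: no free KV slot for this batch


def decode(backend_decode: Callable[[List[int]], int], batch: List[int]) -> int:
    """Return 0 on success or 1 when no KV slot is available.

    Raises RuntimeError for any other return code (e.g. -1, -2, -3).
    `backend_decode` is a hypothetical stand-in for the native call.
    """
    code = backend_decode(batch)
    if code in (0, KV_SLOTS_EXHAUSTED):
        return code
    raise RuntimeError(f"llama_decode returned {code}")


def eval_tokens(
    backend_decode: Callable[[List[int]], int], tokens: List[int], n_batch: int
) -> List[int]:
    """Decode all tokens, halving the batch size whenever decode() returns 1.

    Returns the sizes of the batches that decoded successfully, in order.
    """
    done: List[int] = []
    i = 0
    while i < len(tokens):
        chunk = tokens[i : i + n_batch]
        if decode(backend_decode, chunk) == KV_SLOTS_EXHAUSTED:
            if n_batch == 1:
                raise RuntimeError("no KV slot available even for a single token")
            n_batch = max(1, n_batch // 2)  # retry the same position with a smaller batch
            continue
        done.append(len(chunk))
        i += len(chunk)
    return done


def fake_backend(batch: List[int]) -> int:
    """Hypothetical backend: only batches of at most 2 tokens fit."""
    if not batch:
        return -1  # treated as a fatal error by decode()
    return 0 if len(batch) <= 2 else KV_SLOTS_EXHAUSTED


if __name__ == "__main__":
    print(eval_tokens(fake_backend, list(range(7)), n_batch=8))
```

Because the recoverable case is an ordinary return value, the caller loop branches on an integer and reserves exception handling for failures it cannot retry.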
Signed-off-by: JamePeng <jame_peng@sina.com>
1 parent 1a5b3d6 commit ff3a7c8
1 file changed: +30 -6 lines changed (original lines 529 to 547, new lines 529 to 571)