Commit a26e777
Update deepseek modeling file for PatchedVLLMKVCache (#1009)
Previously, when using INC to convert the deepseek FP8 model, we needed this
[commit](intel/neural-compressor@7c0a3e2)
to remove extra converts in KVCache, but in theory GC can remove them
during graph optimization.
Furthermore, the change in that commit is not aligned with the design of the INC
patched module, which keeps the returned tensor in BF16 because we
cannot know which operation the user will apply next.
So this change updates the modeling file so that GC can handle the patched KVCache
pattern of the deepseek model.
Since the next release is very close and GC currently does not work as
expected during the decode stage, this is still a workaround. We will
root-cause the issue and fix it at the source in the next release.
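To illustrate the "extra converts" discussed above, here is a minimal toy sketch of the kind of peephole pass a graph optimizer could apply. It is not the GC or INC implementation. The op names (`fetch_kv`, `fp8_matmul`), the tuple-based graph encoding, and the function `remove_redundant_casts` are all hypothetical, and the sketch assumes plain dtype casts with no scaling, so that FP8 -> BF16 -> FP8 is a lossless round trip.

```python
# Toy model only: a linear chain of ops, each either ("cast", target_dtype)
# or (name, None) for an op that preserves the incoming dtype.
# Assumption: widening fp8 -> bf16 is exact, so casting straight back is a no-op.
LOSSLESS_WIDENING = {("fp8", "bf16")}


def remove_redundant_casts(ops, input_dtype):
    """Drop adjacent cast pairs A -> B -> A when A -> B is lossless."""
    kept = []                     # ops that survive the pass
    dtype_stack = [input_dtype]   # dtype_stack[i] is the dtype feeding kept[i]
    for op, target in ops:
        current = dtype_stack[-1]
        if op == "cast" and kept and kept[-1][0] == "cast":
            before_prev = dtype_stack[-2]
            if target == before_prev and (before_prev, current) in LOSSLESS_WIDENING:
                kept.pop()         # remove the earlier widening cast
                dtype_stack.pop()  # and skip the narrowing cast entirely
                continue
        kept.append((op, target))
        dtype_stack.append(target if op == "cast" else current)
    return kept


# Hypothetical patched-KVCache pattern: the cache holds FP8, the patched
# module returns BF16, and the next op immediately converts back to FP8.
decode_chain = [
    ("fetch_kv", None),
    ("cast", "bf16"),
    ("cast", "fp8"),
    ("fp8_matmul", None),
]
optimized = remove_redundant_casts(decode_chain, "fp8")
```

In this toy model the two casts between `fetch_kv` and `fp8_matmul` are dropped, while a BF16 -> FP8 -> BF16 pair would be kept because narrowing loses precision.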
This PR should work together with this PR:
intel/neural-compressor#2165
Signed-off-by: Mengni Wang <mengni.wang@intel.com>
1 parent 109ac5d commit a26e777
1 file changed: 4 additions, 4 deletions. Removed original lines 433, 434, 440, and 446; added new lines 433-435 and 446.