Commit 386817b
[Model Runner][Performance] Cache the jugement result of is_encoder_decoder to decrease framework overhead (vllm-project#138)
In Model Runner, is_encoder_decoder is exacted from model_config to
determin whether vllm is running for enc-dec models. Obtaining this
status requires a long call stack, and the CPU overhead is high. So this
PR cache this status in __init__ of ModelInputForNPUBuilder.
Signed-off-by: hw_whx <wanghexiang7@huawei.com>
Co-authored-by: hw_whx <wanghexiang7@huawei.com>1 parent d21b3be commit 386817b
1 file changed
+3
-2
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
353 | 353 | | |
354 | 354 | | |
355 | 355 | | |
| 356 | + | |
356 | 357 | | |
357 | 358 | | |
358 | 359 | | |
| |||
423 | 424 | | |
424 | 425 | | |
425 | 426 | | |
426 | | - | |
| 427 | + | |
427 | 428 | | |
428 | 429 | | |
429 | 430 | | |
| |||
560 | 561 | | |
561 | 562 | | |
562 | 563 | | |
563 | | - | |
| 564 | + | |
564 | 565 | | |
565 | 566 | | |
566 | 567 | | |
| |||
0 commit comments