[Iluvatar] Support V1_KVCACHE_SCHEDULER and paddleocr-vl rope mode #5555
Thanks for your contribution!
Codecov Report
❌ Patch coverage is
@@ Coverage Diff @@
## develop #5555 +/- ##
==========================================
Coverage ? 63.80%
==========================================
Files ? 329
Lines ? 41743
Branches ? 6386
==========================================
Hits ? 26636
Misses ? 13081
Partials ? 2026
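Does Iluvatar support multi-batch for multimodal requests? In the current V1 scheduler this is enabled everywhere, so it may need some attention.
@kevincheng2 It should be supported. Is there a multi-batch script? I can test it.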
Pull request overview
This pull request adds support for V1_KVCACHE_SCHEDULER and PaddleOCR-VL rope mode on Iluvatar hardware. The changes enable the V1 KV cache scheduler on Iluvatar devices and implement a new rope mode specifically for PaddleOCR-VL models while maintaining backward compatibility with ERNIE text and VL model series.
Key Changes:
- Refactored timeout mechanism in tests using signal-based approach
- Updated dependency versions (paddleformers 0.4.0, paddle packages to dev20251103/20251107)
- Extended V1_KVCACHE_SCHEDULER support to Iluvatar platform
- Modified rope embedding handling to support both interleaved and non-interleaved modes
- Added new custom operators for V1 scheduler support (update_inputs_v1, recover_decode_task, get_img_boundaries)
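The overview does not spell out what the two rope layouts look like. As a rough illustration of how "interleaved" and "non-interleaved" rotary embeddings are commonly distinguished, the sketch below rotates a single head vector in pure Python. The function names, the `base` value, and the pairing conventions are assumptions made for illustration; they are not taken from the Iluvatar kernels or from `paged_attention.py`.

```python
import math


def rope_angles(position, head_dim, base=10000.0):
    """Return one rotation angle per feature pair (hypothetical helper)."""
    half = head_dim // 2
    return [position * base ** (-2.0 * i / head_dim) for i in range(half)]


def apply_rope(vec, position, interleaved):
    """Rotate `vec` (list of floats, even length) according to `position`.

    interleaved=True  pairs adjacent elements: (x0, x1), (x2, x3), ...
    interleaved=False pairs the two halves:    (x0, x_half), (x1, x_half+1), ...
    """
    head_dim = len(vec)
    half = head_dim // 2
    out = [0.0] * head_dim
    for i, theta in enumerate(rope_angles(position, head_dim)):
        # Pick which two features form the 2-D plane that gets rotated.
        a, b = (2 * i, 2 * i + 1) if interleaved else (i, i + half)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        out[a] = vec[a] * cos_t - vec[b] * sin_t
        out[b] = vec[a] * sin_t + vec[b] * cos_t
    return out
```

Both layouts apply the same set of angles and preserve the vector norm; they differ only in which features are paired, so weights trained with one layout generally cannot be served with the other without a permutation.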
Reviewed changes
Copilot reviewed 24 out of 24 changed files in this pull request and generated 12 comments.
| File | Description |
|---|---|
| tests/ci_use/iluvatar_UT/utils.py | New utility module with signal-based timeout decorator |
| tests/ci_use/iluvatar_UT/*.py | Refactored tests to use centralized timeout utility, updated expected outputs |
| tests/ci_use/iluvatar_UT/bench_gsm8k.py | New benchmark script for GSM8K dataset evaluation |
| scripts/run_ci_iluvatar.sh | Improved CI script with better error logging |
| requirements_iluvatar.txt | Updated paddleformers version to 0.4.0 |
| fastdeploy/worker/*.py | Enabled V1 scheduler for Iluvatar, adjusted rope embedding logic |
| fastdeploy/model_executor/layers/attention/iluvatar_attn_backend.py | Major refactoring of rope embedding handling for batch processing |
| fastdeploy/model_executor/ops/iluvatar/paged_attention.py | Added rope_batch_stride and is_interleaved_rope_mode parameters |
| fastdeploy/model_executor/models//.py | Added transpose operations for mixed attention mode |
| custom_ops/setup_ops.py | Added new operator source files to build |
| custom_ops/iluvatar_ops/*.cu | Updated attention kernels with rope mode support and batch stride |
| custom_ops/gpu_ops/get_padding_offset.cu | Fixed warp size for Iluvatar (64 vs 32) |
| docs/**/*.md | Extensive documentation updates for Iluvatar setup and model deployment |
| .github/workflows/ci_iluvatar.yml | Updated Docker image and runner configuration |
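The signal-based timeout decorator listed for `tests/ci_use/iluvatar_UT/utils.py` is not shown in this conversation. A minimal sketch of how such a decorator is commonly written follows, assuming a Unix host and that the decorated function runs in the main thread (a requirement of `signal.signal`). The name `timeout` and the use of the built-in `TimeoutError` are illustrative choices, not the PR's actual implementation.

```python
import functools
import signal


def timeout(seconds):
    """Raise TimeoutError if the wrapped call runs longer than `seconds`.

    Hypothetical helper: relies on SIGALRM, so it only works on Unix
    and only when called from the main thread.
    """

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            def on_alarm(signum, frame):
                raise TimeoutError(f"{func.__name__} exceeded {seconds}s")

            # Install our handler and remember the previous one.
            previous = signal.signal(signal.SIGALRM, on_alarm)
            # ITIMER_REAL delivers SIGALRM and accepts fractional seconds.
            signal.setitimer(signal.ITIMER_REAL, seconds)
            try:
                return func(*args, **kwargs)
            finally:
                # Always cancel the timer and restore the old handler.
                signal.setitimer(signal.ITIMER_REAL, 0)
                signal.signal(signal.SIGALRM, previous)

        return wrapper

    return decorator
```

A test would then be wrapped as `@timeout(60)` above its definition. Centralizing the helper in one module keeps the alarm set-up and tear-down identical across tests, which is presumably the motivation for the refactor described in the table.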
Motivation
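To adapt the PaddleOCR-VL model, this PR adds support for V1_KVCACHE_SCHEDULER and the PaddleOCR-VL rope mode on Iluvatar hardware. In addition, we verified that with V1_KVCACHE_SCHEDULER enabled, the accuracy of the previously adapted ERNIE text-only models and ERNIE VL model series remains normal.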
Modifications
Pass
Usage or Command
Pass
Accuracy Tests
Pass
Checklist
- PR title tag list: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
- Run `pre-commit` before commit.
- If the PR targets the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.