
[Iluvatar] Support V1_KVCACHE_SCHEDULER and paddleocr-vl rope mode#5555

Merged: yuanlehome merged 1 commit into PaddlePaddle:develop from wuyujiji:yuzhe_dev on Dec 18, 2025
Conversation

@wuyujiji (Contributor) commented Dec 15, 2025

Motivation

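To support the PaddleOCR-VL model, this PR adds V1_KVCACHE_SCHEDULER support and the PaddleOCR-VL rope mode on Iluvatar hardware. It also verifies that, with V1_KVCACHE_SCHEDULER enabled, accuracy remains normal for the previously adapted ERNIE text-only models and the ERNIE VL model series.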

Modifications

Pass

Usage or Command

Pass

Accuracy Tests

Pass

Checklist

  • Add at least one tag to the PR title.
    • Tag list: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
    • You can add new tags based on the PR content, but their meaning must be clear.
  • Format your code and run pre-commit before committing.
  • Add unit tests. If no unit tests are added, explain why in this PR.
  • Provide accuracy results.
  • If this PR targets a release branch, first submit it to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

paddle-bot commented Dec 15, 2025

Thanks for your contribution!

paddle-bot added the contributor (External developers) label Dec 15, 2025
@wuyujiji force-pushed the yuzhe_dev branch 3 times, most recently from 718baaa to 1ffcbda on December 15, 2025 08:34
codecov-commenter commented Dec 15, 2025

Codecov Report

❌ Patch coverage is 10.52632% with 34 lines in your changes missing coverage. Please review.
⚠️ Please upload a report for BASE (develop@404cf0e).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...executor/layers/attention/iluvatar_attn_backend.py | 8.33% | 22 Missing ⚠️ |
| ...del_executor/models/ernie4_5_vl/ernie4_5_vl_moe.py | 20.00% | 2 Missing and 2 partials ⚠️ |
| ...model_executor/models/paddleocr_vl/paddleocr_vl.py | 20.00% | 2 Missing and 2 partials ⚠️ |
| fastdeploy/engine/sched/resource_manager_v1.py | 0.00% | 2 Missing ⚠️ |
| fastdeploy/engine/args_utils.py | 0.00% | 0 Missing and 1 partial ⚠️ |
| fastdeploy/worker/worker_process.py | 0.00% | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #5555   +/-   ##
==========================================
  Coverage           ?   63.80%           
==========================================
  Files              ?      329           
  Lines              ?    41743           
  Branches           ?     6386           
==========================================
  Hits               ?    26636           
  Misses             ?    13081           
  Partials           ?     2026           
| Flag | Coverage Δ |
|---|---|
| GPU | 63.80% <10.52%> (?) |

Flags with carried-forward coverage are not shown.


@kevincheng2 (Collaborator) commented:

Does Iluvatar support multi-batch multimodal requests? This is fully enabled in V1 now, so it may need some attention.

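@wuyujiji (Contributor, Author) commented:

> Does Iluvatar support multi-batch multimodal requests? This is fully enabled in V1 now, so it may need some attention.

@kevincheng2 It should be supported. Do you have a multi-batch script? I can test it.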

@Jiang-Jia-Jun (Collaborator) left a comment

LGTM

yuanlehome merged commit ac01380 into PaddlePaddle:develop on Dec 18, 2025
21 of 24 checks passed
Copilot AI left a comment

Pull request overview

This pull request adds support for V1_KVCACHE_SCHEDULER and PaddleOCR-VL rope mode on Iluvatar hardware. The changes enable the V1 KV cache scheduler on Iluvatar devices and implement a new rope mode specifically for PaddleOCR-VL models while maintaining backward compatibility with ERNIE text and VL model series.

Key Changes:

  • Refactored timeout mechanism in tests using signal-based approach
  • Updated dependency versions (paddleformers 0.4.0, paddle packages to dev20251103/20251107)
  • Extended V1_KVCACHE_SCHEDULER support to Iluvatar platform
  • Modified rope embedding handling to support both interleaved and non-interleaved modes
  • Added new custom operators for V1 scheduler support (update_inputs_v1, recover_decode_task, get_img_boundaries)
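The rope change above, like the `is_interleaved_rope_mode` parameter added to paged_attention.py, distinguishes two channel layouts. Below is a minimal pure-Python sketch of the difference. It assumes the standard RoPE formulation with frequency base 10000. `apply_rope` and `to_interleaved` are hypothetical helpers, not FastDeploy's kernels, and this PR does not say which layout PaddleOCR-VL or ERNIE uses. In interleaved mode, adjacent channels (2i, 2i+1) are rotated together. In non-interleaved mode, channel i is paired with channel i + d/2. Both rotate each pair by the same angle, so the two modes differ only in memory layout.

```python
import math
from typing import List


def rope_angles(head_dim: int, position: int, base: float = 10000.0) -> List[float]:
    # One rotation angle per channel pair: position * base^(-2i / head_dim).
    # Standard RoPE frequencies (an assumption; not taken from this PR).
    return [position * base ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]


def apply_rope(x: List[float], position: int, interleaved: bool,
               base: float = 10000.0) -> List[float]:
    # Hypothetical helper: rotate one head vector `x` for a single token position.
    d = len(x)
    if d % 2:
        raise ValueError("head_dim must be even")
    half = d // 2
    out = list(x)
    for i, theta in enumerate(rope_angles(d, position, base)):
        c, s = math.cos(theta), math.sin(theta)
        if interleaved:
            a, b = 2 * i, 2 * i + 1  # adjacent channels form a pair
        else:
            a, b = i, i + half       # channel i pairs with channel i + d/2
        # 2-D rotation of the pair (reads from x, writes to out: no aliasing)
        out[a] = x[a] * c - x[b] * s
        out[b] = x[a] * s + x[b] * c
    return out


def to_interleaved(x: List[float]) -> List[float]:
    # Hypothetical helper: reorder a non-interleaved vector
    # [x0..x_{h-1}, y0..y_{h-1}] into the interleaved layout [x0, y0, x1, y1, ...].
    half = len(x) // 2
    out: List[float] = []
    for i in range(half):
        out.extend((x[i], x[i + half]))
    return out
```

For example, `apply_rope(to_interleaved(v), p, interleaved=True)` matches `to_interleaved(apply_rope(v, p, interleaved=False))` up to floating-point error. This is why a kernel can support both modes with a single flag that only changes how channel pairs are indexed.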

Reviewed changes

Copilot reviewed 24 out of 24 changed files in this pull request and generated 12 comments.

| File | Description |
|---|---|
| `tests/ci_use/iluvatar_UT/utils.py` | New utility module with a signal-based timeout decorator |
| `tests/ci_use/iluvatar_UT/*.py` | Refactored tests to use the centralized timeout utility; updated expected outputs |
| `tests/ci_use/iluvatar_UT/bench_gsm8k.py` | New benchmark script for evaluation on the GSM8K dataset |
| `scripts/run_ci_iluvatar.sh` | Improved CI script with better error logging |
| `requirements_iluvatar.txt` | Updated paddleformers version to 0.4.0 |
| `fastdeploy/worker/*.py` | Enabled the V1 scheduler for Iluvatar; adjusted rope embedding logic |
| `fastdeploy/model_executor/layers/attention/iluvatar_attn_backend.py` | Major refactoring of rope embedding handling for batch processing |
| `fastdeploy/model_executor/ops/iluvatar/paged_attention.py` | Added `rope_batch_stride` and `is_interleaved_rope_mode` parameters |
| `fastdeploy/model_executor/models//.py` | Added transpose operations for mixed attention mode |
| `custom_ops/setup_ops.py` | Added new operator source files to the build |
| `custom_ops/iluvatar_ops/*.cu` | Updated attention kernels with rope mode support and batch stride |
| `custom_ops/gpu_ops/get_padding_offset.cu` | Fixed warp size for Iluvatar (64 instead of 32) |
| `docs/**/*.md` | Extensive documentation updates for Iluvatar setup and model deployment |
| `.github/workflows/ci_iluvatar.yml` | Updated Docker image and runner configuration |
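The summary says `tests/ci_use/iluvatar_UT/utils.py` adds a signal-based timeout decorator, but the PR text does not show its code. Below is a minimal sketch of that technique, assuming a POSIX system (SIGALRM is not available on Windows) and a call from the main thread, which is the only thread where `signal.signal` may install handlers. The name `timeout` and its signature are hypothetical and are not the actual utility in this PR. The decorator arms an alarm before the call and raises the built-in `TimeoutError` if the alarm fires. It always cancels the alarm and restores the previous handler afterward.

```python
import functools
import signal


def timeout(seconds: int):
    """Hypothetical decorator: raise TimeoutError if the call exceeds `seconds`.

    POSIX only (uses SIGALRM); must be used from the main thread.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            def _handler(signum, frame):
                # Runs in the main thread when the alarm fires; the exception
                # propagates out of whatever the wrapped function was doing.
                raise TimeoutError(f"{func.__name__} timed out after {seconds}s")

            previous = signal.signal(signal.SIGALRM, _handler)
            signal.alarm(seconds)  # schedule SIGALRM in `seconds` seconds
            try:
                return func(*args, **kwargs)
            finally:
                signal.alarm(0)                          # cancel any pending alarm
                signal.signal(signal.SIGALRM, previous)  # restore the old handler
        return wrapper
    return decorator


@timeout(2)
def quick_check() -> str:
    # Finishes well within the limit, so the alarm is cancelled.
    return "ok"
```

Using a signal rather than a worker thread means a hung test is interrupted in place, without leaving a background thread behind. The trade-off is that only one SIGALRM-based timeout can be active at a time.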

Comment thread fastdeploy/model_executor/layers/attention/iluvatar_attn_backend.py
Comment thread fastdeploy/model_executor/layers/attention/iluvatar_attn_backend.py
Comment thread custom_ops/iluvatar_ops/prefill_fused_attn.cu
Comment thread custom_ops/iluvatar_ops/prefill_fused_attn.cu
Comment thread custom_ops/iluvatar_ops/mixed_fused_attn.cu
Comment thread tests/ci_use/iluvatar_UT/utils.py
Comment thread tests/ci_use/iluvatar_UT/bench_gsm8k.py
Comment thread tests/ci_use/iluvatar_UT/bench_gsm8k.py
Comment thread custom_ops/gpu_ops/get_padding_offset.cu
Comment thread fastdeploy/worker/iluvatar_model_runner.py
chang-wenbin pushed a commit to chang-wenbin/FastDeploy that referenced this pull request Mar 2, 2026
xiaoguoguo626807 pushed a commit to xiaoguoguo626807/FastDeploy that referenced this pull request May 7, 2026

Labels: contributor (External developers)


6 participants