[BugFix] fix speculate_limit_thinking_content_length#5590
Conversation
|
Thanks for your contribution! |
There was a problem hiding this comment.
Pull request overview
This PR removes an unused seq_lens_decoder parameter from the speculate_limit_thinking_content_length family of functions, which are used to limit thinking content length during speculative decoding.
Key Changes:
- Removed
seq_lens_decoderparameter from v1 and v2 CUDA kernel functions and their wrappers - Updated all test cases to remove the unused parameter
- Removed redundant sequence length adjustment logic (now only
step_idxis adjusted)
Critical Issue Found: The function call in post_process_specualate is missing required parameters after the removal.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
tests/operators/test_speculate_limit_thinking_content_length.py |
Removed seq_lens_decoder parameter from all test cases for both v1 and v2 functions; updated comments to reflect removal |
fastdeploy/model_executor/pre_and_post_process.py |
Removed seq_lens_decoder parameter from function signatures and internal calls; bug introduced in call at line 444 |
custom_ops/gpu_ops/speculate_decoding/speculate_limit_thinking_content_length_v1.cu |
Removed seq_lens_decoder from kernel, wrapper function, and operator input declaration; removed redundant adjustment logic |
custom_ops/gpu_ops/speculate_decoding/speculate_limit_thinking_content_length_v2.cu |
Removed seq_lens_decoder from kernel, wrapper function, and operator input declaration; removed redundant adjustment logic |
Note on PR Description: The PR description is incomplete. The "Motivation" and "Modifications" sections are empty. Please add:
- Motivation: Explain that
seq_lens_decoderwas redundant because it was being updated in the same way asstep_idx, and the actual sequence length update happens in thespeculate_updatefunction that is called after this function. - Modifications: Detail that this PR removes the unused
seq_lens_decoderparameter from thespeculate_limit_thinking_content_length_v1andv2functions across CUDA kernels, Python wrappers, and test files.
| accept_num=share_inputs["accept_num"], | ||
| seq_lens_decoder=share_inputs["seq_lens_decoder"], | ||
| think_end_id=think_end_id, | ||
| line_break_id=line_break_id, |
There was a problem hiding this comment.
Missing required parameters in function call. The function speculate_limit_thinking_content_length requires stop_flags and eos_token_ids parameters (as seen in the function signature on line 147-148), but they are not provided in this call. After removing seq_lens_decoder, you should add:
stop_flags=share_inputs["stop_flags"],after line 450eos_token_ids=share_inputs["eos_token_ids"],after stop_flags (only needed for the "" strategy, but the function signature requires it)
This will cause a runtime error when this function is called.
| line_break_id=line_break_id, | |
| line_break_id=line_break_id, | |
| stop_flags=share_inputs["stop_flags"], | |
| eos_token_ids=share_inputs["eos_token_ids"], |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## develop #5590 +/- ##
==========================================
Coverage ? 61.93%
==========================================
Files ? 328
Lines ? 41134
Branches ? 6270
==========================================
Hits ? 25476
Misses ? 13743
Partials ? 1915
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
… (#5591) * fix bug * fix bug --------- Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
* fix speculate_limit_thinking_content_length * update
* fix speculate_limit_thinking_content_length * update
Motivation
Modifications
Usage or Command
Accuracy Tests
Checklist
[FDConfig],[APIServer],[Engine],[Scheduler],[PD Disaggregation],[Executor],[Graph Optimization],[Speculative Decoding],[RL],[Models],[Quantization],[Loader],[OP],[KVCache],[DataProcessor],[BugFix],[Docs],[CI],[Optimization],[Feature],[Benchmark],[Others],[XPU],[HPU],[GCU],[DCU],[Iluvatar],[Metax]]pre-commitbefore commit.releasebranch, make sure the PR has been submitted to thedevelopbranch, then cherry-pick it to thereleasebranch with the[Cherry-Pick]PR tag.