Pull requests: InternLM/lmdeploy
Pull requests list
[WIP]: Support reuse routed experts on eviction
#4599
opened May 19, 2026 by
RunningLeon
Collaborator
Extend v1/messages by introducing token-in/out and returning routed experts
improvement
#4597
opened May 19, 2026 by
lvhan028
Collaborator
Extend chat completions by introducing token-in/out and returning routed experts
improvement
#4593
opened May 18, 2026 by
lvhan028
Collaborator
fix: enable FA3 for SM80+ GPUs and fix CUDA version comparison
Bug:P1
#4591
opened May 18, 2026 by
windreamer
Collaborator
1 of 4 tasks
fix(pytorch): offload guided decoding CPU ops to thread pool to prevent event loop blocking
improvement
#4590
opened May 18, 2026 by
windreamer
Collaborator
3 of 4 tasks
docs(advance): add Add a New Speculative Decoding Method guide
documentation
Improvements or additions to documentation
#4589
opened May 17, 2026 by
SuperMarioYL
4 tasks done
tool calling alignment with openai's spec
improvement
#4585
opened May 13, 2026 by
lvhan028
Collaborator
Add OpenAI Responses-compatible endpoint
enhancement
New feature or request
#4582
opened May 13, 2026 by
CUHKSZzxy
Collaborator
[security] fix(proxy): require auth for node management
#4579
opened May 11, 2026 by
Hinotoi-agent
5 of 9 tasks
Fix health latency under concurrent VL request preparation
Bug:P0
#4570
opened May 7, 2026 by
CUHKSZzxy
Collaborator
FP8 kv cache quantization
enhancement
New feature or request
#4563
opened Apr 29, 2026 by
CUHKSZzxy
Collaborator
[Feature] Add guided decoding support for speculative decoding
enhancement
New feature or request
#4559
opened Apr 28, 2026 by
windreamer
Collaborator
4 tasks done
Test: update sleep/wakeup and abort scenarios
#4528
opened Apr 15, 2026 by
littlegy
Contributor
style: add autopep8 pre-commit hook and apply PEP 8 formatting fixes
#4524
opened Apr 14, 2026 by
windreamer
Collaborator
make fp8 model quantized by llm-compressor can be inferenced in turbomind
enhancement
New feature or request
#4509
opened Apr 8, 2026 by
43758726
Collaborator
Integrate deep-ep nccl backend
enhancement
New feature or request
#4477
opened Mar 27, 2026 by
irexyc
Collaborator