Insights: ModelTC/LightLLM
Overview
1 Release published by 1 person
- v1.1.0: LightLLM v1.1.0 Release! (published Sep 3, 2025)
34 Pull requests merged by 9 people
- vit fa3 api fix (#1047, merged Sep 8, 2025)
- [fix]fix fp8 bug when load moe model (#1045, merged Sep 8, 2025)
- add stream_options for openai api (#1046, merged Sep 8, 2025; see the sketch after this list)
- fix mtp mem alloc in overlap manner (#1044, merged Sep 8, 2025)
- force to warmup triton autotune configs in start. (#1043, merged Sep 5, 2025)
- fix tl.where warning (#1041, merged Sep 4, 2025)
- v100 triton kernel fix (#1040, merged Sep 3, 2025)
- LightLLM v1.1.0 release! (#1039, merged Sep 3, 2025)
- add qwen235b autotune config (#1038, merged Sep 3, 2025)
- fix autotune and benchmark (#1037, merged Sep 3, 2025)
- group deepgemm update api (#1035, merged Sep 3, 2025)
- fix benchmark (#1033, merged Sep 3, 2025)
- tuning optimization (#1032, merged Sep 2, 2025)
- Add setproctitle (#1024, merged Sep 2, 2025)
- add AutotuneLevel for more detailed autotune (#1031, merged Sep 1, 2025)
- fix autotuning warmup length (#1028, merged Aug 29, 2025)
- Autotuner (#1020, merged Aug 28, 2025)
- fix input_penalty token_id async update bug. (#1022, merged Aug 25, 2025)
- Dp balancer (#991, merged Aug 25, 2025)
- fix check_recommended_shm_size (#1021, merged Aug 22, 2025)
- add greedy_sample (#1019, merged Aug 22, 2025)
- fix mtp static bench (#1009, merged Aug 21, 2025)
- support more PD node select func. such as random or roundrobin. (#1018, merged Aug 21, 2025)
- feat: support more PD node select func (#970, merged Aug 21, 2025)
- Add multimodal token usage (#1016, merged Aug 21, 2025)
- Add multimodal token usage (#1011, merged Aug 21, 2025)
- feat: add stop string matching (#969, merged Aug 20, 2025)
- Fix dynamic_prompt_cache for chunked prefill (#1010, merged Aug 20, 2025)
- deepseek && qwen tp performance tuning (#934, merged Aug 20, 2025)
- Fix the overflow issue caused by the mem index type being int32 in the decode att operator. (#1013, merged Aug 20, 2025)
- [Misc] Add a progress bar when loading the model (#1008, merged Aug 20, 2025)
- Add shm size check (#978, merged Aug 18, 2025)
- [opt]opti-qwen2-vl-vit (#1004, merged Aug 14, 2025)
- Fix error illegal memory access when max_total_token_num is too large (#998, merged Aug 12, 2025)
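As a pointer for the stream_options change in #1046: a minimal sketch of a streaming request against a LightLLM OpenAI-compatible endpoint is shown below. The base URL, API key, and model name are assumptions, and the stream_options/include_usage field follows the standard OpenAI chat-completions convention rather than anything specific confirmed here.

```python
# Minimal sketch, assuming a LightLLM server exposing an OpenAI-compatible API
# at http://localhost:8000/v1 (host, port, and model name are assumptions).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

stream = client.chat.completions.create(
    model="default",  # hypothetical model name
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
    # stream_options is the standard OpenAI field; include_usage asks the
    # server to append a final chunk containing token usage statistics.
    stream_options={"include_usage": True},
)

for chunk in stream:
    # Print incremental text deltas as they arrive.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
    # The final chunk carries usage when include_usage is enabled.
    if chunk.usage is not None:
        print("\nusage:", chunk.usage)
```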
7 Pull requests opened by 6 people
- add fa3_mtp (#1005, opened Aug 11, 2025)
- [support] vit and llm disaggregation (#1014, opened Aug 20, 2025)
- Optimize multimodal resource allocation with concurrency and improved batch RPC (#1017, opened Aug 21, 2025)
- Add Support For GPT-OSS Model (#1023, opened Aug 27, 2025)
- Use environment variable for RMSNORM_WARPS (#1027, opened Aug 29, 2025)
- Mineru adapt (#1034, opened Sep 3, 2025)
- pd with nixl backend (rebase main) (#1042, opened Sep 4, 2025)
4 Issues closed by 4 people
- [BUG]LLVM ERROR: Failed to compute parent layout for slice layout. (#1030, closed Sep 3, 2025)
- V100 has compute capability sm70 while Feature 'cvt.bf16.f32' requires .target sm_80 or higher (#1029, closed Aug 30, 2025)
- Question about tp (#1006, closed Aug 22, 2025)
- [Feature] Openai GPT OSS Support (#1012, closed Aug 20, 2025)
1 Issue opened by 1 person
- where can I find `lightllm_constraint_decode_kernel`? (#1007, opened Aug 15, 2025)
5 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- Multimodal improve (#951, commented on Aug 13, 2025, 0 new comments)
- Fp8 deepseek (#975, commented on Sep 4, 2025, 0 new comments)
- Asynchicache (#977, commented on Aug 22, 2025, 0 new comments)
- Disk cache and cpu Cache feature (#997, commented on Aug 13, 2025, 0 new comments)
- Support Qwen models' dp>1 in PD (#999, commented on Aug 28, 2025, 0 new comments)