LMDeploy Release v0.6.4 · InternLM/lmdeploy

What's Changed

Optimize update_step_ctx on Ascend by @jinminxi104 in #2804
Add Ascend installation adapter by @zhabuye in #2817
Refactor turbomind (2/N) by @lzhangzz in #2818
add openssh-server installation in dockerfile by @lvhan028 in #2830
Add version restrictions in runtime_ascend.txt to ensure functionality by @zhabuye in #2836
better kv allocate by @grimoire in #2814
Update internvl chat template by @AllentDan in #2832
profile throughput without new threads by @grimoire in #2826
[dlinfer] change dlinfer kv_cache layout and adjust paged_prefill_attention api. by @Reinerzhou in #2847
[maca] add env to support different mm layout on maca. by @Reinerzhou in #2835
Supports W8A8 quantization for more models by @AllentDan in #2850

disable prefix-caching for vl model by @grimoire in #2825
Fix gemma2 accuracy through the correct softcapping logic by @AllentDan in #2842
fix accessing before initialization by @lvhan028 in #2845
fix the logic to verify whether AutoAWQ has been successfully installed by @grimoire in #2844
check whether backend_config is None or not before accessing its attr by @lvhan028 in #2848
[ascend] convert kv cache to nd format in ascend graph mode by @tangzhiyi11 in #2853

[CI] Split vl testcases into turbomind and pytorch backend by @zhulinJulia24 in #2751
[dlinfer] Fix qwenvl rope error for dlinfer backend by @JackWeiw in #2795
[CI] add more testcase for mllm models by @zhulinJulia24 in #2791
Update dlinfer-ascend version in runtime_ascend.txt by @jinminxi104 in #2865
bump version to v0.6.4 by @lvhan028 in #2864

Full Changelog: v0.6.3...v0.6.4