What's Changed
🚀 Features
- feature: support qwen2.5 function_call by @akai-shuuichi in #2737
- [Feature] support minicpm-v_2_6 for pytorch engine. by @Reinerzhou in #2767
- Support qwen2-vl AWQ quantization by @AllentDan in #2787
- Add DeepSeek-V2 support by @lzhangzz in #2763
- [ascend]feat: support kv int8 by @yao-fengchen in #2736
💥 Improvements
- Optimize update_step_ctx on Ascend by @jinminxi104 in #2804
- Add Ascend installation adapter by @zhabuye in #2817
- Refactor turbomind (2/N) by @lzhangzz in #2818
- add openssh-server installation in dockerfile by @lvhan028 in #2830
- Add version restrictions in runtime_ascend.txt to ensure functionality by @zhabuye in #2836
- better kv allocate by @grimoire in #2814
- Update internvl chat template by @AllentDan in #2832
- profile throughput without new threads by @grimoire in #2826
- [dlinfer] change dlinfer kv_cache layout and adjust paged_prefill_attention api. by @Reinerzhou in #2847
- [maca] add env to support different mm layout on maca. by @Reinerzhou in #2835
- Supports W8A8 quantization for more models by @AllentDan in #2850
🐞 Bug fixes
- disable prefix-caching for vl model by @grimoire in #2825
- Fix gemma2 accuracy through the correct softcapping logic by @AllentDan in #2842
- fix accessing before initialization by @lvhan028 in #2845
- fix the logic to verify whether AutoAWQ has been successfully installed by @grimoire in #2844
- check whether backend_config is None or not before accessing its attr by @lvhan028 in #2848
- [ascend] convert kv cache to nd format in ascend graph mode by @tangzhiyi11 in #2853
📚 Documentation
- Update supported models & Ascend doc by @jinminxi104 in #2765
- update supported models by @lvhan028 in #2849
🌐 Other
- [CI] Split vl testcases into turbomind and pytorch backend by @zhulinJulia24 in #2751
- [dlinfer] Fix qwenvl rope error for dlinfer backend by @JackWeiw in #2795
- [CI] add more testcase for mllm models by @zhulinJulia24 in #2791
- Update dlinfer-ascend version in runtime_ascend.txt by @jinminxi104 in #2865
- bump version to v0.6.4 by @lvhan028 in #2864
New Contributors
- @akai-shuuichi made their first contribution in #2737
- @JackWeiw made their first contribution in #2795
- @zhabuye made their first contribution in #2817
Full Changelog: v0.6.3...v0.6.4