# CHANGELOG

# [Version v1.8.0](https://github.com/intel/xFasterTransformer/releases/tag/v1.8.0)
v1.8.0 - Continuous batching on a single ARC GPU and AMX_FP16 support.

## Highlight
- Continuous batching on a single ARC GPU is supported and can be integrated via `vllm-xft` (a hedged usage sketch follows this list).
- Introduced Intel AMX instruction support for the `float16` data type.

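The usage sketch referenced in the Highlight list: assuming `vllm-xft` keeps vLLM's offline `LLM`/`SamplingParams` Python interface, batched requests can be submitted as below and the engine schedules them with continuous batching. The checkpoint path, sampling settings, and any ARC-GPU-specific arguments are placeholders, not flags confirmed by this release.

```python
# Minimal sketch, assuming vllm-xft mirrors vLLM's offline Python API.
# The model path and sampling settings are placeholders.
from vllm import LLM, SamplingParams  # provided here by the vllm-xft package

prompts = [
    "What is continuous batching?",
    "Summarize the v1.8.0 release in one sentence.",
]
sampling = SamplingParams(temperature=0.8, max_tokens=128)

llm = LLM(model="/path/to/xft/converted/model")  # hypothetical converted checkpoint

# The engine interleaves these requests via continuous batching; no manual
# padding or static batch assembly is needed on the caller's side.
for output in llm.generate(prompts, sampling):
    print(output.outputs[0].text)
```
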
## Models
- Added support for ChatGLM4 series models.
- Introduced full BF16/FP16 path support for Qwen series models (a hedged loading sketch follows this list).

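The loading sketch referenced in the Models list: a minimal example of running one of the BF16/FP16 paths through the xFasterTransformer Python API. The HuggingFace and converted-model paths, the prompt, and the exact `dtype` strings accepted by a given build are assumptions.

```python
# Minimal sketch, assuming a checkpoint already converted to xFT format.
# Paths, prompt, and generation length below are placeholders.
import xfastertransformer
from transformers import AutoTokenizer

HF_MODEL = "/path/to/hf/Qwen-model"    # hypothetical original HF checkpoint (tokenizer source)
XFT_MODEL = "/path/to/xft/Qwen-model"  # hypothetical converted xFT checkpoint

tokenizer = AutoTokenizer.from_pretrained(HF_MODEL, trust_remote_code=True)
# dtype="bf16" / "fp16" selects the BF16/FP16 path; the accepted strings are
# an assumption about this build.
model = xfastertransformer.AutoModel.from_pretrained(XFT_MODEL, dtype="bf16")

input_ids = tokenizer("Hello, xFasterTransformer!", return_tensors="pt").input_ids
output_ids = model.generate(input_ids, max_length=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```
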
## BUG fix
- Fixed a memory leak in the oneDNN primitive cache.
- Fixed an SPR-HBM flat QUAD mode detection issue in the benchmark scripts.
- Fixed a head-split error in distributed grouped-query attention (GQA).
- Fixed an issue with the `invokeAttentionLLaMA` API.

# [Version v1.7.3](https://github.com/intel/xFasterTransformer/releases/tag/v1.7.3)
v1.7.3 - Bug fix release.

## BUG fix
- Fixed SHM `reduceAdd` and RoPE errors when the batch size is large.
- Fixed abnormal usage of the oneDNN primitive cache.

# [Version v1.7.2](https://github.com/intel/xFasterTransformer/releases/tag/v1.7.2)
v1.7.2 - Continuous batching feature supports Qwen 1.0 & hybrid data types.

- 1.7.2
+ 1.8.0