1 parent c8ba661 commit 38658b1
CHANGELOG.md
@@ -1,4 +1,14 @@
# CHANGELOG
+# [Version v1.7.1](https://github.com/intel/xFasterTransformer/releases/tag/v1.7.1)
+v1.7.1 - Continuous batching feature supports ChatGLM2/3.
+
+## Functionality
+- Add continuous batching support for ChatGLM2/3 models.
+- Qwen2Convert supports Qwen2 models quantized by GPTQ, such as GPTQ-Int8 and GPTQ-Int4, via the param `from_quantized_model="gptq"`.
+
+## BUG fix
+- Fixed the segmentation fault error when running with more than 2 ranks in vllm-xft serving.
+
# [Version v1.7.0](https://github.com/intel/xFasterTransformer/releases/tag/v1.7.0)
v1.7.0 - Continuous batching feature supported.
VERSION
@@ -1 +1 @@
-1.7.0
+1.7.1
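For reference, the `from_quantized_model` param called out in the changelog above would be used roughly as follows. This is a minimal sketch, assuming `Qwen2Convert` exposes the same `convert()` entry point as xFasterTransformer's other `*Convert` classes; the placeholder paths and the positional input/output arguments are assumptions, not taken from this commit.

```python
# Minimal sketch (assumptions noted above): convert a GPTQ-quantized
# Qwen2 checkpoint into xFasterTransformer's weight format.
import xfastertransformer as xft

xft.Qwen2Convert().convert(
    "/path/to/Qwen2-7B-GPTQ-Int4",  # placeholder: HF GPTQ checkpoint dir
    "/path/to/Qwen2-7B-xft",        # placeholder: output dir for converted weights
    from_quantized_model="gptq",    # param named in this release's changelog
)
```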