This repository has been archived by the owner on Oct 25, 2024. It is now read-only.

[LLM Runtime] Enable Mistral-7b #552

Merged
merged 3 commits into main from mistral_graph on Oct 26, 2023

Conversation

intellinjun (Contributor) commented Oct 26, 2023

Type of Change

feature or bug fix or documentation or others
API changed or not: not
huggingface models:

Description

detail description
JIRA ticket: 917
TODO

  • n_head_kv
  • extension tests

Expected Behavior & Potential Risk

the expected behavior triggered by this PR

How has this PR been tested?

how to reproduce the test (including hardware information)

Dependency Change?

any library dependency introduced or removed

Signed-off-by: intellinjun <jun.lin@intel.com>
Signed-off-by: intellinjun <jun.lin@intel.com>
intellinjun (Contributor, Author) commented:

[image attachment]

intellinjun requested a review from a32543254 on October 26, 2023 02:29
Signed-off-by: intellinjun <105184542+intellinjun@users.noreply.github.com>
zhenwei-intel (Contributor) commented:

Do we support sliding window attention (SWA)?
https://mistral.ai/news/announcing-mistral-7b/

hshen14 (Contributor) commented Oct 26, 2023

> Do we support sliding window attention (SWA)? https://mistral.ai/news/announcing-mistral-7b/

Talked with Jun: SWA is not supported yet, since the code reuses llama.cpp, which has no SWA support. We can enable StreamingLLM on this model through a separate PR.
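
For anyone unfamiliar with SWA, here is a minimal numpy sketch of how a sliding-window mask differs from the full causal mask (toy window of 3 tokens; Mistral-7B's published window is 4096). It only illustrates the masking pattern and is not the runtime's implementation:

```python
import numpy as np

def causal_mask(seq_len):
    # Full causal attention: token i may attend to every token j <= i.
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))

def sliding_window_mask(seq_len, window_size):
    # Sliding-window attention: token i may attend only to the most recent
    # `window_size` tokens, i.e. j in (i - window_size, i].
    mask = causal_mask(seq_len)
    for i in range(seq_len):
        mask[i, : max(0, i - window_size + 1)] = False
    return mask

print(sliding_window_mask(6, 3).astype(int))
```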

hshen14 merged commit 7d14956 into main on Oct 26, 2023
hshen14 deleted the mistral_graph branch on October 26, 2023 04:58
zhenwei-intel (Contributor) commented:

> > Do we support sliding window attention (SWA)? https://mistral.ai/news/announcing-mistral-7b/
>
> Talked with Jun: SWA is not supported yet, since the code reuses llama.cpp, which has no SWA support. We can enable StreamingLLM on this model through a separate PR.

The StreamingLLM function is already available; just specify n_keep=4 and n_discard=-1.
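
A rough usage sketch, assuming the runtime's Hugging Face-style Python loader and that n_keep/n_discard are accepted as generate() keyword arguments as described above; the model id, prompt, and load_in_4bit path are placeholders, so check the runtime docs for the exact API:

```python
from transformers import AutoTokenizer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM

model_name = "mistralai/Mistral-7B-v0.1"  # placeholder model id
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
inputs = tokenizer("Once upon a time", return_tensors="pt").input_ids

# Assumption: load_in_4bit routes generation through the C++ LLM Runtime.
model = AutoModelForCausalLM.from_pretrained(model_name, load_in_4bit=True)

# StreamingLLM-style cache handling: keep the first 4 "attention sink" tokens
# (n_keep=4) and let the runtime discard older KV-cache entries (n_discard=-1)
# once the context is full.
outputs = model.generate(inputs, max_new_tokens=300, n_keep=4, n_discard=-1)
print(tokenizer.decode(outputs[0]))
```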

intellinjun (Contributor, Author) commented:

[image attachment]
Without MHA fusion when using GCC 12; if using GCC 13, it will go through MHA fusion. (Mistral-7B uses grouped-query attention, the same as Llama-2-70B; this is not supported in llama.cpp now, and GQA fusion will be supported next week.)
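
As background on the GQA point, a small numpy sketch of how grouped-query attention shares key/value heads across groups of query heads (toy shapes; Mistral-7B itself uses 32 query heads over 8 KV heads). This is only a shape illustration, not the fused kernel being discussed:

```python
import numpy as np

n_head, n_head_kv, seq, d = 4, 2, 5, 8   # toy sizes
group = n_head // n_head_kv              # query heads sharing each KV head

q = np.random.randn(n_head, d)           # one query vector per query head
k = np.random.randn(n_head_kv, seq, d)   # KV cache holds only n_head_kv key heads

# Each query head h reads keys from its group's shared KV head (h // group).
scores = np.stack([q[h] @ k[h // group].T for h in range(n_head)])
print(scores.shape)  # (n_head, seq)
```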
