Feature(MInference): support llama 3.1 #54

Merged: 1 commit, Jul 24, 2024
2 changes: 2 additions & 0 deletions README.md
@@ -17,6 +17,7 @@ https://github.com/microsoft/MInference/assets/30883354/52613efc-738f-4081-8367-
_Now, you can process **1M context 10x faster in a single A100** using Long-context LLMs like LLaMA-3-8B-1M, GLM-4-1M, with even **better accuracy**, try **MInference 1.0** right now!_

## News
+- 🥤 [24/07/24] MInference support [meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct) now.
- 🪗 [24/07/07] Thanks @AK for sponsoring. You can now use MInference online in the [HF Demo](https://huggingface.co/spaces/microsoft/MInference) with ZeroGPU.
- 📃 [24/07/03] Due to an issue with arXiv, the PDF is currently unavailable there. You can find the paper at this [link](https://export.arxiv.org/pdf/2407.02490).
- 🧩 [24/07/03] We will present **MInference 1.0** at the _**Microsoft Booth**_ and _**ES-FoMo**_ at ICML'24. See you in Vienna!
@@ -60,6 +61,7 @@ get_support_models()
```

Currently, we support the following LLMs:
+- LLaMA-3.1: [meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct)
- LLaMA-3: [gradientai/Llama-3-8B-Instruct-262k](https://huggingface.co/gradientai/Llama-3-8B-Instruct-262k), [gradientai/Llama-3-8B-Instruct-Gradient-1048k](https://huggingface.co/gradientai/Llama-3-8B-Instruct-Gradient-1048k), [gradientai/Llama-3-8B-Instruct-Gradient-4194k](https://huggingface.co/gradientai/Llama-3-8B-Instruct-Gradient-4194k)
- GLM-4: [THUDM/glm-4-9b-chat-1m](https://huggingface.co/THUDM/glm-4-9b-chat-1m)
- Yi: [01-ai/Yi-9B-200K](https://huggingface.co/01-ai/Yi-9B-200K)

Large diffs are not rendered by default.

3 changes: 3 additions & 0 deletions minference/configs/model2path.py
@@ -26,6 +26,9 @@
"THUDM/glm-4-9b-chat-1m": os.path.join(
BASE_DIR, "GLM_4_9B_1M_instruct_kv_out_v32_fit_o_best_pattern.json"
),
+"meta-llama/Meta-Llama-3.1-8B-Instruct": os.path.join(
+BASE_DIR, "Llama_3.1_8B_Instruct_128k_kv_out_v32_fit_o_best_pattern.json"
+),
}


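On the config side, Llama 3.1 support is one new entry in this dictionary, mapping the Hugging Face model name to its searched attention-pattern JSON file. A minimal sketch of how a name-to-file table like this can be consumed, assuming a caller that wants a clear error for unsupported models: `resolve_pattern_path` and the `BASE_DIR` value are hypothetical illustrations, not MInference's API, and only two entries from the mapping are reproduced.

```python
import os

# Hypothetical stand-in for BASE_DIR; the real module defines its own.
BASE_DIR = os.path.join("minference", "configs")

# Same shape as the mapping in the diff: model name -> pattern config file.
MODEL2PATH = {
    "THUDM/glm-4-9b-chat-1m": os.path.join(
        BASE_DIR, "GLM_4_9B_1M_instruct_kv_out_v32_fit_o_best_pattern.json"
    ),
    "meta-llama/Meta-Llama-3.1-8B-Instruct": os.path.join(
        BASE_DIR, "Llama_3.1_8B_Instruct_128k_kv_out_v32_fit_o_best_pattern.json"
    ),
}


def resolve_pattern_path(model_name: str) -> str:
    """Hypothetical helper: return the pattern file for a model or fail loudly."""
    try:
        return MODEL2PATH[model_name]
    except KeyError:
        supported = ", ".join(sorted(MODEL2PATH))
        raise ValueError(
            f"No searched pattern for {model_name!r}; supported: {supported}"
        ) from None


path = resolve_pattern_path("meta-llama/Meta-Llama-3.1-8B-Instruct")
print(os.path.basename(path))
```

Listing the known keys in the error mirrors the supported-model list in the README, so an unsupported name points the user at what is available.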
3 changes: 2 additions & 1 deletion minference/modules/minference_forward.py
@@ -8,6 +8,7 @@
from importlib import import_module

from transformers.models.llama.modeling_llama import *
+from transformers.utils import is_flash_attn_2_available
from transformers.utils.import_utils import _is_package_available

if _is_package_available("vllm"):
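The module gates the optional vLLM backend on whether the package is installed, and this commit adds an import of transformers' `is_flash_attn_2_available`, a similar capability probe. `_is_package_available` is a private transformers helper (note the leading underscore) and may check more than this, so the following is only a minimal stdlib sketch of the underlying idea, assuming "installed" means "importable"; `is_package_available` is a hypothetical name.

```python
import importlib.util


def is_package_available(name: str) -> bool:
    """Hypothetical helper: True if a top-level package can be imported."""
    # find_spec locates the package without importing it and returns None
    # when nothing on sys.path provides it.
    return importlib.util.find_spec(name) is not None


# Gate an optional backend the way the module gates vllm.
if is_package_available("vllm"):
    print("vllm found: optional backend enabled")
else:
    print("vllm not installed: skipping optional backend")

print(is_package_available("json"))  # True: standard-library module
```

Probing with `find_spec` keeps the check cheap and avoids paying the import cost of a heavy package just to learn whether it exists.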
@@ -531,7 +532,7 @@ def forward(
if os.path.exists(self.config_path):
config_list = json.load(open(self.config_path))
if self.layer_idx < len(config_list):
-assert False
+assert False, f"Search completed. The config is located in {self.config_path}."
else:
config_list = []
config = {}
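The change inside `forward` only affects the assertion message. When the pattern file already holds an entry for the current layer, the search run now stops with a message pointing to where the config is located, instead of a bare `AssertionError`. The sketch below reproduces that control flow in isolation. `check_search_progress` and the per-layer entry shape are hypothetical, and it assumes the `else` pairs with the `os.path.exists` check, since indentation is lost in this view.

```python
import json
import os
import tempfile


def check_search_progress(config_path: str, layer_idx: int) -> list:
    """Hypothetical mirror of the guard in the diff: stop once this layer is already searched."""
    if os.path.exists(config_path):
        with open(config_path) as f:  # `with` closes the file, unlike a bare open()
            config_list = json.load(f)
        if layer_idx < len(config_list):
            # The diff uses `assert False, msg`; an explicit raise also fires under `python -O`.
            raise AssertionError(
                f"Search completed. The config is located in {config_path}."
            )
    else:
        config_list = []
    return config_list


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "pattern.json")
    print(check_search_progress(path, 0))  # [] because no file exists yet
    with open(path, "w") as f:
        json.dump([{"layer": 0}], f)  # hypothetical entry for layer 0
    print(check_search_progress(path, 1))  # layer 1 not searched yet: returns the list
    try:
        check_search_progress(path, 0)
    except AssertionError as err:
        print(err)
```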
4 changes: 2 additions & 2 deletions minference/version.py
@@ -5,10 +5,10 @@
_MINOR = "1"
# On master and in a nightly release the patch should be one ahead of the last
# released build.
-_PATCH = "4"
+_PATCH = "5"
# This is mainly for nightly builds which have the suffix ".dev$DATE". See
# https://semver.org/#is-v123-a-semantic-version for the semantics.
-_SUFFIX = ".post4"
+_SUFFIX = ""

VERSION_SHORT = "{0}.{1}".format(_MAJOR, _MINOR)
VERSION = "{0}.{1}.{2}{3}".format(_MAJOR, _MINOR, _PATCH, _SUFFIX)
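The version bump changes two of the four inputs to the `VERSION` format string. `_MAJOR` is outside this hunk, so the sketch below assumes it is `"0"` purely for illustration; under that assumption the formatted version goes from `0.1.4.post4` to `0.1.5`. `format_version` is a hypothetical helper wrapping the same format string.

```python
# Hypothetical: _MAJOR is not shown in this hunk; "0" is assumed for illustration.
_MAJOR = "0"
_MINOR = "1"  # as shown in the hunk


def format_version(patch: str, suffix: str) -> str:
    # Same format string as VERSION in minference/version.py
    return "{0}.{1}.{2}{3}".format(_MAJOR, _MINOR, patch, suffix)


before = format_version("4", ".post4")  # _PATCH and _SUFFIX before this commit
after = format_version("5", "")         # _PATCH and _SUFFIX after this commit
print(before, "->", after)  # 0.1.4.post4 -> 0.1.5
```

Note that `.post4` is a PEP 440 post-release suffix rather than the `.dev$DATE` nightly suffix the file's comment describes; under PEP 440 ordering, `0.1.4.post4` still sorts before `0.1.5`.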