forked from intel/intel-npu-acceleration-library
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
* Add int4 support
* Fix dtypes
* Add dtypes test
* Add dtype to library
* Faster i8 to i4 compression
* hotfix
* Update the profile-llm script
* Add library
* fix script
* Update readme
* Add neural compressor and demo
* Use neural compressor as the default method
* hotfix
* Quantize only quantized models
* Add tests
* fix issue intel#27
1 parent 5294a5c, commit b34d859
Showing 22 changed files with 422 additions and 63 deletions.
@@ -0,0 +1,50 @@
#
# Copyright © 2024 Intel Corporation
# SPDX-License-Identifier: Apache 2.0
#

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline, TextStreamer
import intel_npu_acceleration_library as npu_lib
import warnings

torch.random.manual_seed(0)

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    torch_dtype="auto",
    trust_remote_code=True,
)

model = npu_lib.compile(model, dtype=npu_lib.int4)
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")
streamer = TextStreamer(tokenizer, skip_prompt=True)

messages = [
    {
        "role": "system",
        "content": "You are a helpful digital assistant. Please provide safe, ethical and accurate information to the user.",
    },
    {
        "role": "user",
        "content": "Can you provide ways to eat combinations of bananas and dragonfruits?",
    },
]

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

generation_args = {
    "max_new_tokens": 500,
    "return_full_text": False,
    "temperature": 0.0,
    "do_sample": False,
    "streamer": streamer,
}

with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    pipe(messages, **generation_args)
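
The demo above compiles the model with `dtype=npu_lib.int4`. As a rough illustration of what that buys in weight storage, the sketch below does the back-of-the-envelope arithmetic. It is not part of the commit: the helper name `weight_storage_bytes` is hypothetical, the parameter count of about 3.8 billion for Phi-3-mini is an approximation, and the estimate ignores quantization scales, zero points, and any layers left unquantized.

```python
def weight_storage_bytes(num_params: int, bits_per_weight: int) -> int:
    """Estimate raw weight storage, ignoring scales and other metadata."""
    return num_params * bits_per_weight // 8


# Assumption: roughly 3.8 billion parameters for Phi-3-mini.
APPROX_PARAMS = 3_800_000_000

for name, bits in (("float16", 16), ("int8", 8), ("int4", 4)):
    size_gb = weight_storage_bytes(APPROX_PARAMS, bits) / 1e9
    print(f"{name}: about {size_gb:.1f} GB of weights")
```

Under these assumptions, int4 halves the raw weight storage relative to int8 and quarters it relative to float16.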
@@ -0,0 +1,24 @@
#
# Copyright © 2024 Intel Corporation
# SPDX-License-Identifier: Apache 2.0
#

from intel_npu_acceleration_library.backend.bindings import lib as backend_lib
import numpy as np


def compress_to_i4(weights: np.ndarray) -> np.ndarray:
    """Compress an int8 array to int4.

    Args:
        weights (np.ndarray): input array

    Returns:
        np.ndarray: compressed array
    """
    compressed_weights = np.zeros(
        (weights.shape[0], weights.shape[1] // 2), dtype=np.uint8
    )

    backend_lib.compressToI4(weights, compressed_weights, np.prod(weights.shape))
    return compressed_weights
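
`compress_to_i4` allocates an output with half as many columns because two 4-bit values fit in one byte. The native `compressToI4` routine is not shown in this diff, so the following is only a pure-NumPy sketch of one plausible packing scheme. The function names `pack_int4_reference` and `unpack_int4_reference` are hypothetical, and the nibble order (even column in the low nibble, odd column in the high nibble) is an assumption that may differ from what the backend does.

```python
import numpy as np


def pack_int4_reference(weights: np.ndarray) -> np.ndarray:
    """Pack pairs of int8 values in [-8, 7] into single uint8 bytes."""
    if weights.ndim != 2 or weights.shape[1] % 2 != 0:
        raise ValueError("expected a 2D array with an even number of columns")
    if weights.min() < -8 or weights.max() > 7:
        raise ValueError("values must fit in the signed 4-bit range [-8, 7]")
    # Keep the low 4 bits of each value (its two's-complement nibble).
    nibbles = weights.astype(np.uint8) & 0x0F
    low = nibbles[:, 0::2]   # assumed: even columns go to the low nibble
    high = nibbles[:, 1::2]  # assumed: odd columns go to the high nibble
    return (low | (high << 4)).astype(np.uint8)


def unpack_int4_reference(packed: np.ndarray) -> np.ndarray:
    """Invert pack_int4_reference, sign-extending each nibble to int8."""
    low = (packed & 0x0F).astype(np.int8)
    high = (packed >> 4).astype(np.int8)
    # Nibbles 8..15 encode the negative values -8..-1.
    low = np.where(low > 7, low - 16, low).astype(np.int8)
    high = np.where(high > 7, high - 16, high).astype(np.int8)
    out = np.empty((packed.shape[0], packed.shape[1] * 2), dtype=np.int8)
    out[:, 0::2] = low
    out[:, 1::2] = high
    return out


weights = np.array([[-8, 7, -1, 0]], dtype=np.int8)
packed = pack_int4_reference(weights)
restored = unpack_int4_reference(packed)
print(packed.shape, np.array_equal(restored, weights))
```

A reference like this can be useful for checking shapes and round-trip behaviour in a test, but it should not be read as a description of the library's actual byte layout.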