
microsoft/phi-2 model outputs nonsense on NPU #27

Closed
flomader opened this issue May 10, 2024 · 6 comments
Labels: bug (Something isn't working)

flomader commented May 10, 2024

Describe the bug
After I compile the microsoft/phi-2 model with intel_npu_acceleration_library, the model's output is complete nonsense. It just produces text like to- or in of ", as for, on, and, is,, and, are,., and,,,,, and,,,, and,,, and,,,, and,,, and,, and,,, and,,

To Reproduce
Steps to reproduce the behavior:

from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain.llms import HuggingFacePipeline
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import intel_npu_acceleration_library
import torch

model_id = "microsoft/Phi-2"

model = AutoModelForCausalLM.from_pretrained(model_id, use_cache=True).eval()
tokenizer = AutoTokenizer.from_pretrained(model_id, use_default_system_prompt=True)

# Compile the model for the NPU with int8 quantization
npu_model = intel_npu_acceleration_library.compile(model, dtype=torch.int8)

pipe = pipeline(
    "text-generation",
    model=npu_model,
    tokenizer=tokenizer,
    max_length=256,
    temperature=0.9,
    top_p=0.95,
    repetition_penalty=1.2
)

local_llm = HuggingFacePipeline(pipeline=pipe)
pipe.model.config.pad_token_id = pipe.model.config.eos_token_id

template = """Question: {question}

Answer: """

prompt = PromptTemplate(template=template, input_variables=["question"])

llm_chain = LLMChain(
    prompt=prompt,
    llm=local_llm
)

question = "What's the distance between the Earth and the Moon?"

print(llm_chain.run(question))

The output is:
Question: What's the distance between the Earth and the Moon?

Answer: to- or in of ", as for, on, and, is,, and, are,., and,,,,, and,,,, and,,, and,,,, and,,, and,, and,,, and,,, and,,, and,, and,, and,, and,, a....

Expected behavior
When running the initial model (the one compiled for CPU) the output is:
Question: What's the distance between the Earth and the Moon?

Answer: The average distance from the Earth to the moon is about 238,855 miles.

Desktop (please complete the following information):

  • OS: Windows 11
@alessandropalla
Contributor

What driver version are you using?

@alessandropalla
Contributor

Anyway, I can reproduce it myself. It goes away if you use fp16 inference:

npu_model = intel_npu_acceleration_library.compile(model, dtype=torch.float16)

I suggest you use that while I try to understand what in our quantization scheme is causing such an unacceptable accuracy drop. Many thanks for raising this issue; I'll keep you updated on this ticket.
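For reference, a minimal end-to-end check of the fp16 workaround without the LangChain wrapper (a sketch based on the repro above; the generation settings are illustrative):

import torch
import intel_npu_acceleration_library
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "microsoft/Phi-2"
model = AutoModelForCausalLM.from_pretrained(model_id, use_cache=True).eval()
tokenizer = AutoTokenizer.from_pretrained(model_id)

# float16 avoids the int8 quantization path that produces the garbled output
npu_model = intel_npu_acceleration_library.compile(model, dtype=torch.float16)

inputs = tokenizer("What's the distance between the Earth and the Moon?", return_tensors="pt")
outputs = npu_model.generate(**inputs, max_length=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))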

@alessandropalla alessandropalla added the bug Something isn't working label May 10, 2024
@alessandropalla alessandropalla self-assigned this May 10, 2024
@flomader
Author

My Intel(R) AI Boost driver is on 32.0.100.2381. Thanks!

@alessandropalla
Contributor

No problem. I'd like the int8/int4 version to work, especially for this PR (#20), as it will bring a significant performance boost. It seems like an issue in our quantization step. I'll keep you posted.
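Once int4 support from #20 lands, compilation would presumably look like the following (a sketch; the intel_npu_acceleration_library.int4 dtype handle is an assumption based on that PR, not a released API):

import intel_npu_acceleration_library

# dtype handle assumed from PR #20; not yet released at the time of this comment
npu_model = intel_npu_acceleration_library.compile(
    model, dtype=intel_npu_acceleration_library.int4
)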

@alessandropalla
Contributor

Fix in flight in #32

alessandropalla added a commit that referenced this issue May 29, 2024
@alessandropalla
Contributor

Commit ae0a999 fixes your issue for phi-2.
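To pick up the fix, reinstall the library from a build that includes that commit and re-run the original int8 path (a sketch; it assumes the PyPI package name matches the repository):

# e.g. pip install --upgrade intel-npu-acceleration-library
# or install from source at commit ae0a999 or later
import torch
import intel_npu_acceleration_library

# the original int8 compile path from the repro should now produce coherent text
npu_model = intel_npu_acceleration_library.compile(model, dtype=torch.int8)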

alessandropalla added a commit that referenced this issue May 29, 2024

* Add int4 support
* Fix dtypes
* Add dtypes test
* Add dtype to library
* Faster i8 to i4 compression
* hotfix
* Update the profile-llm script
* Add library
* fix script
* Update readme
* Add neural compressor and demo
* Use neural compressor as the default method
* hotfix
* Quantize only quantized models
* Add tests
* fix issue #27