
OSError on Model Inference Using intel_npu_acceleration_library #146

Open

vaikunth-coder27 opened this issue Dec 14, 2024 · 3 comments

@vaikunth-coder27
Description

When running a script that uses intel_npu_acceleration_library for causal language model inference, an OSError is raised during the model.generate() step. The error appears to originate in the NPU compilation step.

Code Snippet

# Copyright © 2024 Intel Corporation
# SPDX-License-Identifier: Apache 2.0

import intel_npu_acceleration_library
from transformers import AutoTokenizer, TextStreamer
from intel_npu_acceleration_library import NPUModelForCausalLM, int4
from intel_npu_acceleration_library.compiler import CompilerConfig

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

compiler_conf = CompilerConfig(dtype=int4)
model = NPUModelForCausalLM.from_pretrained(
    model_id, use_cache=True, config=compiler_conf, attn_implementation="sdpa"
).eval()

tokenizer = AutoTokenizer.from_pretrained(model_id, use_default_system_prompt=True)
tokenizer.pad_token_id = tokenizer.eos_token_id

streamer = TextStreamer(tokenizer, skip_special_tokens=True)
query = input("Ask something: ")
prefix = tokenizer(query, return_tensors="pt")["input_ids"]
generation_kwargs = dict(
    input_ids=prefix,
    streamer=streamer,
    do_sample=True,
    top_k=50,
    top_p=0.9,
    max_new_tokens=512,
)
print("Run inference")
_ = model.generate(**generation_kwargs)

Observed Behavior

The following error is encountered during the execution:

Run inference
how are 
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
... [truncated for brevity] ...
OSError: [WinError -529697949] Windows Error 0xe06d7363

Expected Behavior

The model should generate text without encountering an error during inference.

System Details

  • Operating System: Windows 11 24H2
  • Processor: Intel(R) Core(TM) Ultra 9 185H x64
  • NPU Driver: 32.0.100.2540
  • Installed Packages:
    • torch==2.5.1+cu124
    • intel_npu_acceleration_library==1.4.0

Additional Context

The error traceback suggests the issue occurs during a call to intel_npu_acceleration_library.backend.factory.compile().
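
For triage, the failing call can be exercised without downloading the full model. Below is a minimal sketch, assuming the intel_npu_acceleration_library.compile(model, config) entry point documented in the library README; the layer sizes are arbitrary:

import torch
import intel_npu_acceleration_library
from intel_npu_acceleration_library.compiler import CompilerConfig

# A tiny linear layer still goes through backend.factory.compile(),
# so it should hit the same driver-related OSError if the compile
# path is broken, without needing the TinyLlama download.
tiny = torch.nn.Linear(128, 128).eval()
conf = CompilerConfig(dtype=torch.float16)  # float16 keeps the repro simple
npu_model = intel_npu_acceleration_library.compile(tiny, conf)

with torch.no_grad():
    out = npu_model(torch.randn(1, 128))
print("NPU compile + inference OK:", tuple(out.shape))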

Steps to Reproduce

  1. Install the required packages:
    • torch==2.5.1+cu124
    • intel_npu_acceleration_library==1.4.0
  2. Set up the environment with:
    • Windows 11 24H2
    • NPU Driver: 32.0.100.2540
  3. Run the provided code snippet.

Please provide guidance or fixes for resolving this issue.

@alessandropalla
Contributor

Try updating the driver to the latest version.

@Prajwal-Prathiksh

@vaikunth-coder27 I'm facing the same issue on my PC.
Were you able to fix it by updating the driver, or in some other way?

@vaikunth-coder27
Author

Updating the driver to the current version (32.0.100.3104) solved the issue.
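
For reference, the installed NPU driver version can be checked from Python via a PowerShell CIM query. A sketch, assuming the driver registers a device whose name contains "NPU":

import subprocess

# Query Windows for signed drivers whose device name mentions "NPU";
# the name match is an assumption about how the Intel driver registers.
cmd = (
    "Get-CimInstance Win32_PnPSignedDriver | "
    "Where-Object { $_.DeviceName -match 'NPU' } | "
    "Select-Object -ExpandProperty DriverVersion"
)
result = subprocess.run(
    ["powershell", "-NoProfile", "-Command", cmd],
    capture_output=True, text=True, check=True,
)
print("NPU driver version:", result.stdout.strip())  # expect 32.0.100.3104 or newer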
