
add xpu triton in dockerfile, or will show "Could not import Flash Attention enabled models: No module named 'triton'" #2702

Merged: 1 commit merged into huggingface:main on Oct 30, 2024

Conversation

sywangyi (Contributor)

add xpu triton in dockerfile, or will show "Could not import Flash Attention enabled models: No module named 'triton'"
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

@Narsil
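The change itself is a one-line addition to the XPU Dockerfile so that the triton Python package is present in the image at runtime. A minimal sketch of the shape of that change; the unpinned package name below is an assumption (the merged commit may install a specific Intel XPU build of triton instead), not the exact line from this PR:

# Sketch only: install an XPU-capable triton wheel so the
# flash-attention import path works at runtime. The actual
# package name and version pin may differ in the merged commit.
RUN pip install --no-cache-dir triton

In practice, pinning the wheel to the version the XPU PyTorch/IPEX stack was built against is the safer choice.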
@sywangyi (Contributor, Author) commented:
2024-10-28T08:34:35.936624Z INFO text_generation_launcher: Using attention paged - Prefix caching 0
2024-10-28T08:34:35.936631Z INFO text_generation_launcher: Default max_batch_prefill_tokens to 4096
2024-10-28T08:34:35.936747Z INFO download: text_generation_launcher: Starting check and download process for mistralai/Mistral-7B-v0.1
2024-10-28T08:34:39.809841Z INFO text_generation_launcher: Files are already present on the host. Skipping download.
2024-10-28T08:34:40.547887Z INFO download: text_generation_launcher: Successfully downloaded weights for mistralai/Mistral-7B-v0.1
2024-10-28T08:34:40.548403Z INFO shard-manager: text_generation_launcher: Starting shard rank=0
2024-10-28T08:34:43.836780Z INFO text_generation_launcher: Using prefix caching = False
2024-10-28T08:34:43.836810Z INFO text_generation_launcher: Using Attention = paged
2024-10-28T08:34:43.886426Z WARN text_generation_launcher: Could not import Flash Attention enabled models: No module named 'triton'
2024-10-28T08:34:43.886914Z WARN text_generation_launcher: Could not import Mamba: No module named 'mamba_ssm'
2024-10-28T08:34:46.605594Z INFO text_generation_launcher: Using experimental prefill chunking = False
2024-10-28T08:34:46.610208Z INFO text_generation_launcher: Server started at unix:///tmp/text-generation-server-0
2024-10-28T08:34:46.664621Z INFO shard-manager: text_generation_launcher: Shard ready in 6.107895071s rank=0
2024-10-28T08:34:46.753204Z INFO text_generation_launcher: Starting Webserver
2024-10-28T08:34:46.788056Z INFO text_generation_router_v3: backends/v3/src/lib.rs:125: Warming up model
2024-10-28T08:34:47.027548Z ERROR text_generation_launcher: Method Warmup encountered an error.
Traceback (most recent call last):
File "/opt/conda/bin/text-generation-server", line 8, in
sys.exit(app())
File "/opt/conda/lib/python3.11/site-packages/typer/main.py", line 311, in call
return get_command(self)(*args, **kwargs)
File "/opt/conda/lib/python3.11/site-packages/click/core.py", line 1157, in call
return self.main(*args, **kwargs)
File "/opt/conda/lib/python3.11/site-packages/typer/core.py", line 778, in main
return _main(
File "/opt/conda/lib/python3.11/site-packages/typer/core.py", line 216, in _main
rv = self.invoke(ctx)
File "/opt/conda/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/opt/conda/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/opt/conda/lib/python3.11/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/opt/conda/lib/python3.11/site-packages/typer/main.py", line 683, in wrapper
return callback(**use_params) # type: ignore
File "/opt/conda/lib/python3.11/site-packages/text_generation_server/cli.py", line 116, in serve
server.serve(
File "/opt/conda/lib/python3.11/site-packages/text_generation_server/server.py", line 315, in serve
asyncio.run(
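
Why this surfaces as a warning followed by a warmup crash: TGI guards its flash-attention model imports, so a missing triton only produces the WARN lines above at startup, and the hard failure appears later, when warmup exercises a code path that needs those kernels. A minimal sketch of that guarded-import pattern, with illustrative names rather than TGI's exact code:

import logging

logger = logging.getLogger("text_generation_launcher")

try:
    # Importing the flash-attention model classes pulls in triton kernels;
    # the bare import here is a stand-in for those model imports.
    import triton  # noqa: F401
    HAS_FLASH_ATTENTION = True
except ImportError as e:
    # Startup continues; only a warning is emitted here, so on a device
    # with no non-flash fallback the real failure shows up at warmup.
    logger.warning("Could not import Flash Attention enabled models: %s", e)
    HAS_FLASH_ATTENTION = False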

@sywangyi (Contributor, Author) commented:

The missing triton module breaks TGI on XPU.
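
A quick way to check an image for this regression is to run the import inside the container before launching the server, e.g. docker run --rm <tgi-xpu-image> python -c "import triton" (hypothetical image tag), where the check is simply:

# Should print a version string instead of raising ImportError
# once the wheel is baked into the image.
import triton
print(triton.__version__)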

Narsil merged commit 46aeb08 into huggingface:main on Oct 30, 2024
@Narsil (Collaborator) commented on Oct 30, 2024:

Thanks, LGTM! Sorry for this!
