support aclgraph #426
Conversation
Please help implement the unit test cases and system test cases.
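A possible shape for such a system test, sketched here for discussion — the model name and the greedy-sampling determinism assumption are illustrative, not from this PR:

```python
from vllm import LLM, SamplingParams

MODEL = "Qwen/Qwen2.5-0.5B-Instruct"  # example model used elsewhere in this PR

def test_aclgraph_matches_eager():
    prompts = ["Hello, my name is"]
    params = SamplingParams(temperature=0.0, max_tokens=8)  # greedy for determinism

    graph_llm = LLM(model=MODEL)  # aclgraph path, on by default per this PR
    graph_text = graph_llm.generate(prompts, params)[0].outputs[0].text
    del graph_llm  # release NPU memory before starting a second engine

    eager_llm = LLM(model=MODEL, enforce_eager=True)  # graph capture disabled
    eager_text = eager_llm.generate(prompts, params)[0].outputs[0].text

    assert graph_text == eager_text
```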
@@ -171,6 +179,12 @@ def __init__(self, vllm_config: VllmConfig, device: torch.device):
        self.input_positions_cpu = torch.arange(0,
                                                self.max_num_tokens,
                                                device="cpu")
        self.use_cuda_graph = (self.vllm_config.compilation_config.level
rename to self.use_acl_graph
self.use_npu_graph is better
        self.use_cuda_graph = (self.vllm_config.compilation_config.level
                               == CompilationLevel.PIECEWISE
                               and not self.model_config.enforce_eager)
        self.cudagraph_batch_sizes = list(
ditto
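Taken together, the two suggestions would rename both attributes. A sketch of the result — the second right-hand side is an assumption based on vllm's GPU model runner, not shown in this diff:

```python
# Reviewer-suggested "npu_graph" naming applied to both attributes;
# surrounding code is assumed to match the hunks above.
self.use_npu_graph = (self.vllm_config.compilation_config.level
                      == CompilationLevel.PIECEWISE
                      and not self.model_config.enforce_eager)
self.npu_graph_batch_sizes = list(
    reversed(self.vllm_config.compilation_config.cudagraph_capture_sizes))
```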
    from vllm.v1.sample.rejection_sampler import INVALID_TOKEN_ID, RejectionSampler
else:
    INVALID_TOKEN_ID = None
    RejectionSampler = None
Why this change? HAS_TRITON is always false in vllm-ascend. So I guess you want to override vllm.v1.sample.rejection_sampler's INVALID_TOKEN_ID and RejectionSampler in vllm-ascend here?
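The hunk above is truncated at the comment anchor; the full guard presumably looks like the sketch below. The `vllm.triton_utils` import path is an assumption on my part:

```python
from vllm.triton_utils import HAS_TRITON  # assumed import path

if HAS_TRITON:
    from vllm.v1.sample.rejection_sampler import (INVALID_TOKEN_ID,
                                                  RejectionSampler)
else:
    # On vllm-ascend, HAS_TRITON is always False, so the triton-backed
    # rejection sampler is stubbed out here.
    INVALID_TOKEN_ID = None
    RejectionSampler = None
```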
vllm_ascend/utils.py (Outdated)
        self.name = name


def register_dummy_fusion_op() -> None:
move to ops module
done
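From the visible fragment (`self.name = name`), the helper appears to register placeholder objects for CUDA-only fused ops so attribute lookups succeed on NPU. A hedged sketch of that idea — the class shape and op names are guesses, not the PR's code:

```python
import torch

class DummyFusionOp:
    """Placeholder for a CUDA-only fused op that has no NPU kernel."""

    def __init__(self, name: str = ""):
        self.name = name

def register_dummy_fusion_op() -> None:
    # Attach placeholders so references to torch.ops._C fused ops resolve
    # during graph compilation; the op names here are illustrative guesses.
    torch.ops._C.rms_norm = DummyFusionOp(name="rms_norm")
    torch.ops._C.fused_add_rms_norm = DummyFusionOp(name="fused_add_rms_norm")
```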
requirements.txt (Outdated)
torch >= 2.5.1
torch_npu == 2.5.1rc1
do not limit torch-npu version here.
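With the pin dropped as requested, the requirement lines might simply read:

```
torch >= 2.5.1
torch_npu
```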
vllm_ascend/__init__.py (Outdated)
@@ -15,6 +15,8 @@
# This file is a part of the vllm-ascend project.
#

from torch_npu.contrib import transfer_to_npu  # noqa: F401
Why do we need this here? This will hide some issues and break some scenarios in RL, where torch.cuda is expected to be called normally.
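For context: `transfer_to_npu` monkey-patches `torch.cuda` calls to run on NPU. The sketch below shows the explicit-device style that avoids the shim and keeps `torch.cuda` meaning what it says (illustrative only, not from the PR):

```python
import torch
import torch_npu  # noqa: F401  # registers the "npu" device type with torch

# Place tensors on the NPU explicitly rather than shimming torch.cuda.
x = torch.randn(4, 4, device="npu")
y = torch.nn.functional.relu(x)
print(y.device)  # e.g. npu:0
```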
Removed.
What this PR does / why we need it?
This PR connects vllm-ascend to the piecewise_graph feature provided by the v1 engine:
1. Register unified_ascend_attention_with_output for piecewise_graph to split the graph.
2. Support NPUGraph to accelerate kernel launch.
Does this PR introduce any user-facing change?
NPUGraph is enabled by default; users can disable it by setting enforce_eager, as in the snippet below. This requires versions of torch_npu and CANN that support graph capture.
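The example from the description as a runnable snippet:

```python
from vllm import LLM

# enforce_eager=True skips graph capture (NPUGraph/aclgraph) and runs eagerly.
llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct", enforce_eager=True)
```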
How was this patch tested?
The feature is on by default, so it is exercised by the existing CI and tests.