Your current environment
The output of `python collect_env.py`
Collecting environment information...
PyTorch version: 2.5.1
Is debug build: False
OS: Ubuntu 22.04.5 LTS (aarch64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: version 4.0.3
Libc version: glibc-2.35
Python version: 3.10.17 (main, May 8 2025, 07:18:04) [GCC 11.4.0] (64-bit runtime)
Python platform: Linux-5.15.0-25-generic-aarch64-with-glibc2.35
CPU:
Architecture: aarch64
CPU op-mode(s): 64-bit
Byte Order: Little Endian
CPU(s): 192
On-line CPU(s) list: 0-191
Vendor ID: HiSilicon
Model name: Kunpeng-920
Model: 0
Thread(s) per core: 1
Core(s) per socket: 48
Socket(s): 4
Stepping: 0x1
Frequency boost: disabled
CPU max MHz: 2600.0000
CPU min MHz: 200.0000
BogoMIPS: 200.00
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm ssbs
L1d cache: 12 MiB (192 instances)
L1i cache: 12 MiB (192 instances)
L2 cache: 96 MiB (192 instances)
L3 cache: 192 MiB (8 instances)
NUMA node(s): 8
NUMA node0 CPU(s): 0-23
NUMA node1 CPU(s): 24-47
NUMA node2 CPU(s): 48-71
NUMA node3 CPU(s): 72-95
NUMA node4 CPU(s): 96-119
NUMA node5 CPU(s): 120-143
NUMA node6 CPU(s): 144-167
NUMA node7 CPU(s): 168-191
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; __user pointer sanitization
Vulnerability Spectre v2: Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] pyzmq==27.0.0
[pip3] torch==2.5.1
[pip3] torch-npu==2.5.1.post1.dev20250619
[pip3] torchvision==0.20.1
[pip3] transformers==4.52.4
[conda] Could not collect
vLLM Version: 0.9.2
vLLM Ascend Version: 0.9.2rc1
ENV Variables:
ATB_OPSRUNNER_KERNEL_CACHE_TILING_SIZE=10240
ATB_OPSRUNNER_KERNEL_CACHE_LOCAL_COUNT=1
ATB_STREAM_SYNC_EVERY_RUNNER_ENABLE=0
ASCEND_RT_VISIBLE_DEVICES=3
ATB_OPSRUNNER_SETUP_CACHE_ENABLE=1
ATB_WORKSPACE_MEM_ALLOC_GLOBAL=0
ATB_DEVICE_TILING_BUFFER_BLOCK_NUM=32
ATB_STREAM_SYNC_EVERY_KERNEL_ENABLE=0
ASCEND_RUNTIME_OPTIONS=NODRV
ATB_OPSRUNNER_KERNEL_CACHE_GLOABL_COUNT=5
VLLM_USE_MODELSCOPE=True
PYTORCH_NPU_ALLOC_CONF=max_split_size_mb:256
ATB_HOME_PATH=/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0
ASCEND_TOOLKIT_HOME=/usr/local/Ascend/ascend-toolkit/latest
ATB_COMPARE_TILING_EVERY_KERNEL=0
ASCEND_OPP_PATH=/usr/local/Ascend/ascend-toolkit/latest/opp
LD_LIBRARY_PATH=/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0/lib:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0/examples:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0/tests/atbopstest:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64/plugin:/usr/local/Ascend/ascend-toolkit/latest/lib64:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/opskernel:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/nnengine:/usr/local/Ascend/ascend-toolkit/latest/opp/built-in/op_impl/ai_core/tbe/op_tiling/lib/linux/aarch64:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0/lib:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0/examples:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0/tests/atbopstest:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64/plugin:/usr/local/Ascend/ascend-toolkit/latest/lib64:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/opskernel:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/nnengine:/usr/local/Ascend/ascend-toolkit/latest/opp/built-in/op_impl/ai_core/tbe/op_tiling/lib/linux/aarch64:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0/lib:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0/examples:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0/tests/atbopstest:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64/plugin:/usr/local/Ascend/ascend-toolkit/latest/lib64:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/opskernel:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/nnengine:/usr/local/Ascend/ascend-toolkit/latest/opp/built-in/op_impl/ai_core/tbe/op_tiling:/usr/local/Ascend/driver/lib64/common/:/usr/local/Ascend/driver/lib64/driver/:
ASCEND_AICPU_PATH=/usr/local/Ascend/ascend-toolkit/latest
ATB_OPSRUNNER_KERNEL_CACHE_TYPE=3
ATB_RUNNER_POOL_SIZE=64
ATB_STREAM_SYNC_EVERY_OPERATION_ENABLE=0
ASCEND_HOME_PATH=/usr/local/Ascend/ascend-toolkit/latest
ATB_MATMUL_SHUFFLE_K_ENABLE=1
ATB_LAUNCH_KERNEL_WITH_TILING=1
ATB_WORKSPACE_MEM_ALLOC_ALG_TYPE=1
ATB_HOST_TILING_BUFFER_BLOCK_NUM=128
ATB_SHARE_MEMORY_NAME_SUFFIX=
TORCH_DEVICE_BACKEND_AUTOLOAD=1
PYTORCH_NVML_BASED_CUDA_CHECK=1
TORCHINDUCTOR_COMPILE_THREADS=1
NPU:
+------------------------------------------------------------------------------------------------+
| npu-smi 24.1.0 Version: 24.1.0 |
+---------------------------+---------------+----------------------------------------------------+
| NPU Name | Health | Power(W) Temp(C) Hugepages-Usage(page)|
| Chip | Bus-Id | AICore(%) Memory-Usage(MB) HBM-Usage(MB) |
+===========================+===============+====================================================+
| 0 910B4 | OK | 90.0 37 0 / 0 |
| 0 | 0000:C1:00.0 | 0 0 / 0 2862 / 32768 |
+===========================+===============+====================================================+
| 1 910B4 | OK | 86.4 35 0 / 0 |
| 0 | 0000:C2:00.0 | 0 0 / 0 2866 / 32768 |
+===========================+===============+====================================================+
| 2 910B4 | OK | 86.9 37 0 / 0 |
| 0 | 0000:81:00.0 | 0 0 / 0 2864 / 32768 |
+===========================+===============+====================================================+
| 3 910B4 | OK | 86.4 36 0 / 0 |
| 0 | 0000:82:00.0 | 0 0 / 0 2861 / 32768 |
+===========================+===============+====================================================+
| 4 910B4 | OK | 81.2 40 0 / 0 |
| 0 | 0000:01:00.0 | 0 0 / 0 31942/ 32768 |
+===========================+===============+====================================================+
| 5 910B4 | OK | 85.6 40 0 / 0 |
| 0 | 0000:02:00.0 | 0 0 / 0 31945/ 32768 |
+===========================+===============+====================================================+
| 6 910B4 | OK | 86.0 41 0 / 0 |
| 0 | 0000:41:00.0 | 0 0 / 0 32078/ 32768 |
+===========================+===============+====================================================+
| 7 910B4 | OK | 87.6 40 0 / 0 |
| 0 | 0000:42:00.0 | 0 0 / 0 31946/ 32768 |
+===========================+===============+====================================================+
+---------------------------+---------------+----------------------------------------------------+
| NPU Chip | Process id | Process name | Process memory(MB) |
+===========================+===============+====================================================+
| No running processes found in NPU 0 |
+===========================+===============+====================================================+
| No running processes found in NPU 1 |
+===========================+===============+====================================================+
| No running processes found in NPU 2 |
+===========================+===============+====================================================+
| No running processes found in NPU 3 |
+===========================+===============+====================================================+
| 4 0 | 328971 | | 29143 |
+===========================+===============+====================================================+
| 5 0 | 328977 | | 29143 |
+===========================+===============+====================================================+
| 6 0 | 328982 | | 29139 |
+===========================+===============+====================================================+
| 7 0 | 329005 | | 29143 |
+===========================+===============+====================================================+
CANN:
package_name=Ascend-cann-toolkit
version=8.1.RC1
innerversion=V100R001C21SPC001B238
compatible_version=[V100R001C15],[V100R001C18],[V100R001C19],[V100R001C20],[V100R001C21]
arch=aarch64
os=linux
path=/usr/local/Ascend/ascend-toolkit/8.1.RC1/aarch64-linux
🐛 Describe the bug
Start Server:
vllm serve /data1/Qwen2.5-7B-Instruct --dtype bfloat16 --max_model_len 14336 --max-num-batched-tokens 14336 --port 8100
Benchmark:
ais_bench --models vllm_api_general_chat_longbenchv2 --datasets longbenchv2_gen --debug
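The benchmark drives the server's OpenAI-compatible chat endpoint with long LongBench v2 prompts; the failing request in the dump below has 12732 prompt tokens against a 14336 max_model_len. As a sketch, the same failure can probably be triggered without ais_bench by sending one long chat completion request directly (the payload and prompt length here are assumptions, not the exact benchmark request):
curl http://localhost:8100/v1/chat/completions -H "Content-Type: application/json" -d '{"model": "/data1/Qwen2.5-7B-Instruct", "messages": [{"role": "user", "content": "<a prompt of roughly 12k tokens>"}], "max_tokens": 1024, "temperature": 0}'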
Error Message:
[rank0]:[E806 06:36:53.094766090 compiler_depend.ts:429] SelfAttentionOperation setup failed!
Exception raised from OperationSetup at build/third_party/op-plugin/op_plugin/CMakeFiles/op_plugin_atb.dir/compiler_depend.ts:148 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0xb8 (0xffff7c20c908 in /usr/local/python3.10.17/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x6c (0xffff7c1bb404 in /usr/local/python3.10.17/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #2: atb::OperationSetup(atb::VariantPack, atb::Operation*, atb::Context*) + 0xc8 (0xfffdc7463b3c in /usr/local/python3.10.17/lib/python3.10/site-packages/torch_npu/lib/libop_plugin_atb.so)
frame #3: + 0x83be4 (0xfffdc7463be4 in /usr/local/python3.10.17/lib/python3.10/site-packages/torch_npu/lib/libop_plugin_atb.so)
frame #4: + 0x192b4e0 (0xfffdd435b4e0 in /usr/local/python3.10.17/lib/python3.10/site-packages/torch_npu/lib/libtorch_npu.so)
frame #5: + 0x8115e4 (0xfffdd32415e4 in /usr/local/python3.10.17/lib/python3.10/site-packages/torch_npu/lib/libtorch_npu.so)
frame #6: + 0x813814 (0xfffdd3243814 in /usr/local/python3.10.17/lib/python3.10/site-packages/torch_npu/lib/libtorch_npu.so)
frame #7: + 0x810184 (0xfffdd3240184 in /usr/local/python3.10.17/lib/python3.10/site-packages/torch_npu/lib/libtorch_npu.so)
frame #8: + 0x4c9e4c (0xffff7c249e4c in /usr/local/python3.10.17/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #9: + 0x7d5b8 (0xffff86d3d5b8 in /lib/aarch64-linux-gnu/libc.so.6)
frame #10: + 0xe5edc (0xffff86da5edc in /lib/aarch64-linux-gnu/libc.so.6)
Traceback (most recent call last):
File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/fx/graph_module.py", line 348, in call
return super(self.cls, obj).call(*args, **kwargs) # type: ignore[misc]
File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "<eval_with_key>.3", line 13, in forward
linear_1 = torch._C._nn.linear(getitem_1, l_self_modules_layers_modules_0_modules_mlp_modules_gate_up_proj_parameters_weight_, None); getitem_1 = l_self_modules_layers_modules_0_modules_mlp_modules_gate_up_proj_parameters_weight_ = None
RuntimeError: The Inner error is reported as above. The process exits for this inner error, and the current working operator name is SelfAttentionOperation.
Since the operator is called asynchronously, the stacktrace may be inaccurate. If you want to get the accurate stacktrace, pleace set the environment variable ASCEND_LAUNCH_BLOCKING=1.
Note: ASCEND_LAUNCH_BLOCKING=1 will force ops to run in synchronous mode, resulting in performance degradation. Please unset ASCEND_LAUNCH_BLOCKING in time after debugging.
[ERROR] 2025-08-06-06:36:54 (PID:7320, Device:0, RankID:-1) ERR00100 PTA call acl api failed.
Call using an FX-traced Module, line 13 of the traced Module's generated forward function:
getitem_2 = npu_add_rms_norm[2]; npu_add_rms_norm = None
linear_1 = torch._C._nn.linear(getitem_1, l_self_modules_layers_modules_0_modules_mlp_modules_gate_up_proj_parameters_weight_, None); getitem_1 = l_self_modules_layers_modules_0_modules_mlp_modules_gate_up_proj_parameters_weight_ = None
npu_swiglu = torch.ops.npu.npu_swiglu(linear_1); linear_1 = None
linear_2 = torch._C._nn.linear(npu_swiglu, l_self_modules_layers_modules_0_modules_mlp_modules_down_proj_parameters_weight_, None); npu_swiglu = l_self_modules_layers_modules_0_modules_mlp_modules_down_proj_parameters_weight_ = None
ERROR 08-06 06:36:54 [dump_input.py:69] Dumping input data for V1 LLM engine (v0.9.2) with config: model='/data1/Qwen2.5-7B-Instruct', speculative_config=None, tokenizer='/data1/Qwen2.5-7B-Instruct', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config={}, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=14336, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=True, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=npu, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=/data1/Qwen2.5-7B-Instruct, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=True, pooler_config=None, compilation_config={"level":3,"debug_dump_path":"","cache_dir":"/root/.cache/vllm/torch_compile_cache/614f65ddcf","backend":"","custom_ops":["all"],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output","vllm.unified_ascend_attention_with_output","vllm.unified_ascend_attention_with_output"],"use_inductor":false,"compile_sizes":[],"inductor_compile_config":{},"inductor_passes":{},"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"max_capture_size":512,"local_cache_dir":"/root/.cache/vllm/torch_compile_cache/614f65ddcf/rank_0_0/backbone"},
ERROR 08-06 06:36:54 [dump_input.py:76] Dumping scheduler output for model execution: SchedulerOutput(scheduled_new_reqs=[NewRequestData(req_id=chatcmpl-7c099181d3fe446f83443b1a34a0705c,prompt_token_ids_len=12732,mm_inputs=[],mm_hashes=[],mm_positions=[],sampling_params=SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.05, temperature=0.0, top_p=1.0, top_k=0, min_p=0.0, seed=None, stop=[], stop_token_ids=[151643], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=1024, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, guided_decoding=None, extra_args=None),block_ids=([183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282],),num_computed_tokens=0,lora_request=None)], scheduled_cached_reqs=CachedRequestData(req_ids=[], resumed_from_preemption=[], new_token_ids=[], new_block_ids=[], num_computed_tokens=[]), num_scheduled_tokens={chatcmpl-7c099181d3fe446f83443b1a34a0705c: 12732}, total_num_scheduled_tokens=12732, scheduled_spec_decode_tokens={}, scheduled_encoder_inputs={}, num_common_prefix_blocks=[100], finished_req_ids=[], free_encoder_input_ids=[], structured_output_request_ids={}, grammar_bitmask=null, kv_connector_metadata=null)
ERROR 08-06 06:36:54 [dump_input.py:79] Dumping scheduler stats: SchedulerStats(num_running_reqs=1, num_waiting_reqs=0, kv_cache_usage=0.06975138121546964, prefix_cache_stats=PrefixCacheStats(reset=False, requests=1, queries=12732, hits=0), spec_decoding_stats=None, num_corrupted_reqs=0)
ERROR 08-06 06:36:54 [core.py:588] EngineCore encountered a fatal error.
ERROR 08-06 06:36:54 [core.py:588] Traceback (most recent call last):
ERROR 08-06 06:36:54 [core.py:588] File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 579, in run_engine_core
ERROR 08-06 06:36:54 [core.py:588] engine_core.run_busy_loop()
ERROR 08-06 06:36:54 [core.py:588] File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 606, in run_busy_loop
ERROR 08-06 06:36:54 [core.py:588] self._process_engine_step()
ERROR 08-06 06:36:54 [core.py:588] File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 631, in _process_engine_step
ERROR 08-06 06:36:54 [core.py:588] outputs, model_executed = self.step_fn()
ERROR 08-06 06:36:54 [core.py:588] File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 235, in step
ERROR 08-06 06:36:54 [core.py:588] model_output = self.execute_model(scheduler_output)
ERROR 08-06 06:36:54 [core.py:588] File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 221, in execute_model
ERROR 08-06 06:36:54 [core.py:588] raise err
ERROR 08-06 06:36:54 [core.py:588] File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 212, in execute_model
ERROR 08-06 06:36:54 [core.py:588] return self.model_executor.execute_model(scheduler_output)
ERROR 08-06 06:36:54 [core.py:588] File "/vllm-workspace/vllm/vllm/v1/executor/abstract.py", line 87, in execute_model
ERROR 08-06 06:36:54 [core.py:588] output = self.collective_rpc("execute_model",
ERROR 08-06 06:36:54 [core.py:588] File "/vllm-workspace/vllm/vllm/executor/uniproc_executor.py", line 57, in collective_rpc
ERROR 08-06 06:36:54 [core.py:588] answer = run_method(self.driver_worker, method, args, kwargs)
ERROR 08-06 06:36:54 [core.py:588] File "/vllm-workspace/vllm/vllm/utils/__init__.py", line 2736, in run_method
ERROR 08-06 06:36:54 [core.py:588] return func(*args, **kwargs)
ERROR 08-06 06:36:54 [core.py:588] File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 217, in execute_model
ERROR 08-06 06:36:54 [core.py:588] output = self.model_runner.execute_model(scheduler_output,
ERROR 08-06 06:36:54 [core.py:588] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 08-06 06:36:54 [core.py:588] return func(*args, **kwargs)
ERROR 08-06 06:36:54 [core.py:588] File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1403, in execute_model
ERROR 08-06 06:36:54 [core.py:588] num_scheduled_tokens_np) = (self._process_reqs(
ERROR 08-06 06:36:54 [core.py:588] File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1129, in _process_reqs
ERROR 08-06 06:36:54 [core.py:588] hidden_states = self.model(
ERROR 08-06 06:36:54 [core.py:588] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
ERROR 08-06 06:36:54 [core.py:588] return self._call_impl(*args, **kwargs)
ERROR 08-06 06:36:54 [core.py:588] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
ERROR 08-06 06:36:54 [core.py:588] return forward_call(*args, **kwargs)
ERROR 08-06 06:36:54 [core.py:588] File "/vllm-workspace/vllm/vllm/model_executor/models/qwen2.py", line 478, in forward
ERROR 08-06 06:36:54 [core.py:588] hidden_states = self.model(input_ids, positions, intermediate_tensors,
ERROR 08-06 06:36:54 [core.py:588] File "/vllm-workspace/vllm/vllm/compilation/decorators.py", line 246, in __call__
ERROR 08-06 06:36:54 [core.py:588] model_output = self.forward(*args, **kwargs)
ERROR 08-06 06:36:54 [core.py:588] File "/vllm-workspace/vllm/vllm/model_executor/models/qwen2.py", line 337, in forward
ERROR 08-06 06:36:54 [core.py:588] def forward(
ERROR 08-06 06:36:54 [core.py:588] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
ERROR 08-06 06:36:54 [core.py:588] return self._call_impl(*args, **kwargs)
ERROR 08-06 06:36:54 [core.py:588] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
ERROR 08-06 06:36:54 [core.py:588] return forward_call(*args, **kwargs)
ERROR 08-06 06:36:54 [core.py:588] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 632, in _fn
ERROR 08-06 06:36:54 [core.py:588] return fn(*args, **kwargs)
ERROR 08-06 06:36:54 [core.py:588] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/fx/graph_module.py", line 784, in call_wrapped
ERROR 08-06 06:36:54 [core.py:588] return self._wrapped_call(self, *args, **kwargs)
ERROR 08-06 06:36:54 [core.py:588] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/fx/graph_module.py", line 361, in __call__
ERROR 08-06 06:36:54 [core.py:588] raise e
ERROR 08-06 06:36:54 [core.py:588] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/fx/graph_module.py", line 348, in __call__
ERROR 08-06 06:36:54 [core.py:588] return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc]
ERROR 08-06 06:36:54 [core.py:588] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
ERROR 08-06 06:36:54 [core.py:588] return self._call_impl(*args, **kwargs)
ERROR 08-06 06:36:54 [core.py:588] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
ERROR 08-06 06:36:54 [core.py:588] return forward_call(*args, **kwargs)
ERROR 08-06 06:36:54 [core.py:588] File "<eval_with_key>.58", line 213, in forward
ERROR 08-06 06:36:54 [core.py:588] submod_2 = self.submod_2(getitem_3, s0, l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_parameters_weight_, getitem_4, l_self_modules_layers_modules_0_modules_post_attention_layernorm_parameters_weight_, l_self_modules_layers_modules_0_modules_mlp_modules_gate_up_proj_parameters_weight_, l_self_modules_layers_modules_0_modules_mlp_modules_down_proj_parameters_weight_, l_self_modules_layers_modules_1_modules_input_layernorm_parameters_weight_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_parameters_weight_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_parameters_bias_, l_positions_, s1, l_self_modules_layers_modules_0_modules_self_attn_modules_rotary_emb_buffers_cos_sin_cache_); getitem_3 = l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_parameters_weight_ = getitem_4 = l_self_modules_layers_modules_0_modules_post_attention_layernorm_parameters_weight_ = l_self_modules_layers_modules_0_modules_mlp_modules_gate_up_proj_parameters_weight_ = l_self_modules_layers_modules_0_modules_mlp_modules_down_proj_parameters_weight_ = l_self_modules_layers_modules_1_modules_input_layernorm_parameters_weight_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_parameters_weight_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_parameters_bias_ = None
ERROR 08-06 06:36:54 [core.py:588] File "/vllm-workspace/vllm-ascend/vllm_ascend/compilation/piecewise_backend.py", line 128, in __call__
ERROR 08-06 06:36:54 [core.py:588] return self.compiled_graph_for_general_shape(*args)
ERROR 08-06 06:36:54 [core.py:588] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/fx/graph_module.py", line 784, in call_wrapped
ERROR 08-06 06:36:54 [core.py:588] return self._wrapped_call(self, *args, **kwargs)
ERROR 08-06 06:36:54 [core.py:588] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/fx/graph_module.py", line 359, in __call__
ERROR 08-06 06:36:54 [core.py:588] raise e.with_traceback(None) # noqa: B904
ERROR 08-06 06:36:54 [core.py:588] RuntimeError: The Inner error is reported as above. The process exits for this inner error, and the current working operator name is SelfAttentionOperation.
ERROR 08-06 06:36:54 [core.py:588] Since the operator is called asynchronously, the stacktrace may be inaccurate. If you want to get the accurate stacktrace, pleace set the environment variable ASCEND_LAUNCH_BLOCKING=1.
ERROR 08-06 06:36:54 [core.py:588] Note: ASCEND_LAUNCH_BLOCKING=1 will force ops to run in synchronous mode, resulting in performance degradation. Please unset ASCEND_LAUNCH_BLOCKING in time after debugging.
ERROR 08-06 06:36:54 [core.py:588] [ERROR] 2025-08-06-06:36:54 (PID:7320, Device:0, RankID:-1) ERR00100 PTA call acl api failed.
ERROR 08-06 06:36:54 [core.py:588]
ERROR 08-06 06:36:54 [async_llm.py:419] AsyncLLM output_handler failed.
ERROR 08-06 06:36:54 [async_llm.py:419] Traceback (most recent call last):
ERROR 08-06 06:36:54 [async_llm.py:419] File "/vllm-workspace/vllm/vllm/v1/engine/async_llm.py", line 378, in output_handler
ERROR 08-06 06:36:54 [async_llm.py:419] outputs = await engine_core.get_output_async()
ERROR 08-06 06:36:54 [async_llm.py:419] File "/vllm-workspace/vllm/vllm/v1/engine/core_client.py", line 740, in get_output_async
ERROR 08-06 06:36:54 [async_llm.py:419] raise self._format_exception(outputs) from None
ERROR 08-06 06:36:54 [async_llm.py:419] vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
INFO 08-06 06:36:54 [async_llm.py:345] Request chatcmpl-7c099181d3fe446f83443b1a34a0705c failed (engine dead).
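As the error message itself suggests, re-running with ASCEND_LAUNCH_BLOCKING=1 forces operators to launch synchronously and should give a more accurate stack trace for the failing SelfAttentionOperation (debug-only; it degrades performance, so unset it after collecting the trace). A minimal way to do that with the same launch command:
ASCEND_LAUNCH_BLOCKING=1 vllm serve /data1/Qwen2.5-7B-Instruct --dtype bfloat16 --max_model_len 14336 --max-num-batched-tokens 14336 --port 8100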