Deadlock Issue During LLaMA 3.2 1B Model Inference with accelsim #410

inkmancc · 2025-04-02T12:20:47Z

I'm using AccelSim to study the inference process of the LLaMA 3.2 1B model with PyTorch. While I can successfully obtain SASS traces, the simulation encounters a deadlock during execution.

Environment Details:
OS: Ubuntu 22.04
gcc: 11.4.0
CUDA: 11.8
NVBIT: 1.7.4
pytorch: 2.6.0+cu118

I'm wondering whether my CUDA version is incompatible with the NVBIT version, since NVBIT 1.7 explicitly states that it requires CUDA >= 12. Or perhaps AccelSim doesn't support PyTorch applications: I previously ran a standalone CUDA implementation of LLaMA 3 inference, and it worked fine with AccelSim without any deadlocks. Or could there be other potential causes?

JRPan · 2025-04-02T21:54:57Z

Maintainer

Accel-SIM supports PyTorch.
If the traces are generated correctly, there should be no problem with NVBit.

I have not run Llama 3 myself, but I traced BERT and simulated it successfully.

Can you check which kernel it deadlocks on? Did it simulate any kernels at all?
Which config are you using? I wonder if this is a config issue.
You can turn on trace mode in gpgpusim.config and see which instruction was issued last, then figure out why the one after it is not issuing.
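
As a rough sketch of what turning on trace mode can look like: the stock GPGPU-Sim configs ship with commented-out tracing options along the lines below. The component list and the sampled core index are assumptions to adjust for your setup, and it is worth checking that your simulator build has tracing compiled in.

```
# tracing functionality (values are illustrative assumptions)
-trace_enabled 1
-trace_components WARP_SCHEDULER,SCOREBOARD
-trace_sampling_core 0
```

With the warp scheduler and scoreboard components traced, the tail of the simulator output before the deadlock report should show the last instruction that issued on the sampled core.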

7 replies

inkmancc Apr 3, 2025
Author

Yes! The kernel indeed exhibits the same issue as described in #306. If this is confirmed to be an NVBIT-related problem, does it mean the kernel is entirely unusable, or can it be recovered through some method?

JRPan Apr 3, 2025
Maintainer

You can. There is another issue that explains how to modify the traces: #360 (comment)

But I would suggest discarding this kernel completely. It's small and has minimal impact.

It's also a little interesting that you are hitting a deadlock. The other reports of this issue hit assertion failures instead.

inkmancc Apr 3, 2025
Author

I see, thank you!

RohithRajesh Apr 20, 2025

Hello,

I'm seeing a similar issue while profiling a small Llama-based LM, and it happens quite frequently (only in reduction kernels). I've written a script that checks for this condition within a warp and adjusts the mask so that at least one thread remains. Is this a valid fix, worth a PR?

JRPan Apr 20, 2025
Maintainer

Glad you found a workaround! And thank you for the suggestion.

We are fixing this in the model to allow instructions after exit. Should be merged into mainline really soon.
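
The mask workaround discussed above can be sketched roughly as follows. This is not the script mentioned in the thread, and it does not parse the real Accel-SIM trace format: it assumes a warp's instructions have already been reduced to `(opcode, active_mask)` pairs, that the warp starts with all 32 lanes live, and that the problematic condition is an `EXIT` that retires every remaining lane while later instructions still follow. The function name `patch_exit_masks` is hypothetical.

```python
FULL_WARP = 0xFFFFFFFF  # assumption: a warp starts with all 32 lanes live


def patch_exit_masks(insts, live=FULL_WARP):
    """Return a copy of one warp's (opcode, mask) list in which no EXIT
    retires the last live lane while further instructions remain.

    Hypothetical helper: the (opcode, mask) representation is an assumed
    simplification, not the on-disk Accel-SIM trace layout.
    """
    patched = []
    last = len(insts) - 1
    for i, (opcode, mask) in enumerate(insts):
        if opcode.startswith("EXIT"):
            remaining = live & ~mask  # lanes still live after this EXIT
            if remaining == 0 and i != last:
                keep = live & -live   # lowest currently live lane
                mask &= ~keep         # that lane does not exit here
                remaining = keep
            live = remaining
        patched.append((opcode, mask))
    return patched


if __name__ == "__main__":
    warp = [
        ("MOV", 0xFFFFFFFF),
        ("EXIT", 0xFFFFFFFF),  # would retire every lane too early
        ("BRA", 0xFFFFFFFF),
        ("EXIT", 0xFFFFFFFF),  # final EXIT is left untouched
    ]
    for opcode, mask in patch_exit_masks(warp):
        print(f"{opcode:5s} {mask:08x}")
```

Keeping the lowest live lane is an arbitrary choice; a more careful version might prefer a lane that is actually active in the instructions that follow. As noted in the thread, dropping the affected kernel or waiting for the model fix are the other options.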
