-
I haven't run Llama3 myself, but I traced BERT and simulated it successfully. Can you check which kernel it deadlocks on? Did it simulate any kernels at all?
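One way to answer both questions is to scan the simulator's stdout log for kernel launches and the deadlock message. The sketch below is only a starting point: the two patterns (`launching kernel name: ... uid: ...` and `deadlock detected`) are assumptions about what the log contains, and `summarize_log` is a hypothetical helper, so adjust the regexes to whatever your build actually prints.

```python
import re
import sys
from typing import Iterable

# ASSUMPTION: these patterns match the simulator's log lines.
# Check them against your own output and edit them if they differ.
LAUNCH_RE = re.compile(r"launching kernel name:\s*(\S+)\s+uid:\s*(\d+)")
DEADLOCK_RE = re.compile(r"deadlock detected", re.IGNORECASE)


def summarize_log(lines: Iterable[str]) -> dict:
    """Report how many kernels were launched and which one came last."""
    launched = []         # (uid, name) tuples in launch order
    deadlock_line = None  # first line that mentions a deadlock, if any
    for line in lines:
        match = LAUNCH_RE.search(line)
        if match:
            launched.append((int(match.group(2)), match.group(1)))
            continue
        if deadlock_line is None and DEADLOCK_RE.search(line):
            deadlock_line = line.strip()
    return {
        "kernels_launched": len(launched),
        "last_kernel": launched[-1] if launched else None,
        "deadlock_line": deadlock_line,
    }


# Only runs when a log path is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as log_file:
        summary = summarize_log(log_file)
    for key, value in summary.items():
        print(f"{key}: {value}")
```

If `kernels_launched` is 0, the simulation never got past setup. Otherwise `last_kernel` is the most recently launched kernel before the deadlock message, which is the first place to look.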
-
I'm using AccelSim to study the inference process of the LLaMA 3.2 1B model with PyTorch. While I can successfully obtain SASS traces, the simulation encounters a deadlock during execution.
Environment Details:
OS: Ubuntu 22.04
gcc: 11.4.0
CUDA: 11.8
NVBIT: 1.7.4
pytorch: 2.6.0+cu118
I'm wondering whether my CUDA version is incompatible with my NVBIT version, since NVBIT 1.7 explicitly states that it requires CUDA >= 12. Or perhaps AccelSim doesn't support PyTorch applications: I previously ran a standalone CUDA implementation of LLaMA3 inference, and it worked fine with AccelSim without any deadlocks. Could there be any other potential causes?