Refactor JIT compilation (+NVRTC support) #94

Merged
merged 25 commits into from
May 7, 2025

lucifer1004
Collaborator

With this PR, JIT compilation time is reduced by about 60% when using NVCC, about 80% when using NVRTC without precompiled headers (PCH), and about 90% when using NVRTC with PCH.

Benchmark on Xeon(R) Platinum 8480C + H100 HBM:

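| m | n | k | Compile time, baseline (s) | Compile time, NVCC (s) | Compile time, NVRTC (s) | Compile time, NVRTC+PCH (s) | TFLOPS, baseline | TFLOPS, NVCC | TFLOPS, NVRTC | TFLOPS, NVRTC+PCH |
|---|---|---|---|---|---|---|---|---|---|---|
| 64 | 2112 | 7168 | 5.95 | 2.12 | 0.96 | 0.35 | 154 | 154 | 153 | 153 |
| 64 | 24576 | 1536 | 6.43 | 2.55 | 1.37 | 0.78 | 230 | 230 | 231 | 230 |
| 64 | 32768 | 512 | 6.06 | 2.20 | 0.99 | 0.40 | 180 | 180 | 180 | 180 |
| 64 | 7168 | 16384 | 6.11 | 2.25 | 1.01 | 0.43 | 289 | 290 | 289 | 289 |
| 64 | 4096 | 7168 | 5.98 | 2.15 | 0.97 | 0.37 | 207 | 207 | 206 | 206 |
| 64 | 7168 | 2048 | 6.03 | 2.19 | 1.01 | 0.42 | 188 | 189 | 187 | 188 |
| 128 | 2112 | 7168 | 5.98 | 2.18 | 0.96 | 0.37 | 283 | 280 | 281 | 282 |
| 128 | 24576 | 1536 | 6.38 | 2.56 | 1.39 | 0.78 | 440 | 435 | 434 | 434 |
| 128 | 32768 | 512 | 6.00 | 2.18 | 0.99 | 0.40 | 334 | 336 | 330 | 332 |
| 128 | 7168 | 16384 | 6.24 | 2.40 | 1.18 | 0.59 | 534 | 534 | 534 | 533 |
| 128 | 4096 | 7168 | 6.04 | 2.21 | 1.03 | 0.43 | 370 | 371 | 370 | 371 |
| 128 | 7168 | 2048 | 6.19 | 2.38 | 1.19 | 0.62 | 332 | 334 | 331 | 332 |
| 4096 | 2112 | 7168 | 6.09 | 2.30 | 1.08 | 0.49 | 1053 | 1058 | 1113 | 1114 |
| 4096 | 24576 | 1536 | 6.14 | 2.33 | 1.10 | 0.50 | 1288 | 1286 | 1290 | 1289 |
| 4096 | 32768 | 512 | 6.23 | 2.41 | 1.21 | 0.61 | 913 | 912 | 912 | 911 |
| 4096 | 7168 | 16384 | 6.31 | 2.51 | 1.27 | 0.68 | 1524 | 1524 | 1459 | 1458 |
| 4096 | 4096 | 7168 | 6.28 | 2.47 | 1.27 | 0.68 | 1446 | 1445 | 1394 | 1396 |
| 4096 | 7168 | 2048 | 6.18 | 2.40 | 1.21 | 0.61 | 1240 | 1243 | 1234 | 1237 |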
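
As a quick sanity check on the headline percentages, the reduction for any row can be recomputed from its compile-time columns. The sketch below copies three rows from the benchmark; the helper name `reduction_pct` and the choice of rows are illustrative only and not part of the PR.

```python
# Compile times in seconds, copied from three benchmark rows:
# (m, n, k) -> (baseline, NVCC, NVRTC, NVRTC+PCH)
compile_times = {
    (64, 2112, 7168): (5.95, 2.12, 0.96, 0.35),
    (128, 7168, 16384): (6.24, 2.40, 1.18, 0.59),
    (4096, 4096, 7168): (6.28, 2.47, 1.27, 0.68),
}


def reduction_pct(baseline, new):
    """Percentage by which `new` is smaller than `baseline`."""
    return 100.0 * (baseline - new) / baseline


for shape, (base, nvcc, nvrtc, pch) in compile_times.items():
    # One rounded percentage per compiler configuration.
    reductions = [round(reduction_pct(base, t)) for t in (nvcc, nvrtc, pch)]
    print(shape, reductions)
```

For these rows the reductions land in the low 60s, low 80s, and around 90 percent respectively, which is consistent with the approximate figures quoted in the description.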

Note that NVRTC shows a performance drop in some cases (for example m=4096, n=7168, k=16384 and m=4096, n=4096, k=7168) because of a known NVRTC bug that leads to extra instructions. Oddly, in the m=4096, n=2112, k=7168 case the NVRTC version was faster. NVCC is therefore kept as the default compiler for now, and NVRTC can be enabled with the extra environment variable DG_JIT_USE_NVRTC.
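
A minimal sketch of opting in to NVRTC from Python follows. The PR names the variable `DG_JIT_USE_NVRTC` but does not state the value it expects, so the `"1"` below is an assumption based on the common boolean-flag convention, and the helper `use_nvrtc` is hypothetical.

```python
import os


def use_nvrtc(enabled=True, env=os.environ):
    """Set or clear DG_JIT_USE_NVRTC in the given environment mapping.

    Assumption: the value "1" turns NVRTC on. The PR text only names the
    variable, not the value it expects.
    """
    if enabled:
        env["DG_JIT_USE_NVRTC"] = "1"
    else:
        env.pop("DG_JIT_USE_NVRTC", None)


# Set the variable before importing the library that reads it, in case the
# library inspects the environment at import time.
use_nvrtc(True)
```

Setting the variable in the shell before launching the process should have the same effect, since both approaches only populate the process environment.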

Signed-off-by: Zihua Wu <13583761+lucifer1004@users.noreply.github.com>
Signed-off-by: Gabriel Wu <13583761+lucifer1004@users.noreply.github.com>
feat: add compat for older drivers and Windows
LyricZhao requested review from LyricZhao and zheanxu on May 7, 2025 at 03:18
LyricZhao merged commit bfe983c into main on May 7, 2025
LyricZhao deleted the nvrtc branch on May 7, 2025 at 03:42