Change hash to not require ptxas #2476

Merged — 3 commits merged into triton-lang:main on Oct 17, 2023

Conversation

@manbearian (Collaborator)

I noticed that Triton is using the ptxas version as part of the version hash even for non-CUDA targets. This is an attempt at fixing that. Moving the version calculation to the back end makes sense to me from an architectural standpoint, so that's my approach here. I'm not as confident in the implementation, so please let me know if you have any feedback.
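A minimal sketch of the direction described above, under the assumption of a per-backend version-key hook; `device_backend.get_version_key()`, `fn.cache_key`, and the key layout are illustrative names, not Triton's actual API:

```python
import hashlib

# Sketch: each backend supplies its own version key, so only the CUDA path
# ever needs ptxas. Only get_cuda_version_key mirrors the existing code.
def make_hash(fn, target, env_vars, device_backend=None, **kwargs):
    if device_backend is None:
        version_key = get_cuda_version_key()             # existing ptxas-derived key
    else:
        version_key = device_backend.get_version_key()   # assumed backend hook
    key = f"{version_key}-{fn.cache_key}-{sorted(env_vars.items())}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()
```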

@manbearian manbearian requested a review from ptillet as a code owner October 10, 2023 00:00
@manbearian manbearian requested a review from EikanWang October 10, 2023 15:37
@ptillet (Collaborator) commented Oct 13, 2023

I believe the regression is an actual one, and not just noise. Not sure where it comes from though.

@manbearian (Collaborator, Author)

> I believe the regression is an actual one, and not just noise. Not sure where it comes from though.

Yes, it does appear that way. I was sitting on this hoping other PRs might hit similar issues, but nothing has. I'll investigate when I have a chance; this PR isn't exactly high priority.

@manbearian (Collaborator, Author) commented Oct 13, 2023

@ptillet I looked at the failure log some more now that I know it's specific to my change, and I admit I'm even more confused. The error looks like it's indicating that with my changes the A100 results got faster?!

Looking at test_performance.py, the first value is the current result and the second value is the reference result:

test_performance.py:

triton.testing.assert_close(cur_gpu_util, ref_gpu_util, atol=0.02, rtol=0.01)

Results:

=========================== short test summary info ============================
FAILED test_performance.py::test_matmul[1024-1024-1024-float16] - AssertionError:  0.2981213927268982 is not close to 0.35499998927116394 (atol=0.02, rtol=0.01)
FAILED test_performance.py::test_matmul[2048-2048-2048-float16] - AssertionError:  0.5184210538864136 is not close to 0.652999997138977 (atol=0.02, rtol=0.01)
FAILED test_performance.py::test_matmul[64-4096-4096-float16] - AssertionError:  0.13460323214530945 is not close to 0.17000000178813934 (atol=0.02, rtol=0.01)
FAILED test_performance.py::test_matmul[4096-64-4096-float16] - AssertionError:  0.12149130553007126 is not close to 0.1599999964237213 (atol=0.02, rtol=0.01)
FAILED test_performance.py::test_matmul[8192-64-8192-float16] - AssertionError:  0.14076729118824005 is not close to 0.2720000147819519 (atol=0.02, rtol=0.01)

Any more thoughts? I'm going to see if I can debug whether something really did change with the caching, but that seems so unlikely...
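For reference, assuming `assert_close` follows the usual allclose-style tolerance rule (pass iff |cur - ref| <= atol + rtol * |ref|), the first failing case is well outside the tolerance:

```python
# Assumed allclose-style check; values taken from the first failure above.
cur, ref, atol, rtol = 0.2981213927268982, 0.35499998927116394, 0.02, 0.01
tol = atol + rtol * abs(ref)      # ~0.0236
print(abs(cur - ref) <= tol)      # False: |diff| ~0.0569 exceeds 0.0236
```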

@ThomasRaoux (Collaborator)

> The error looks like it's indicating that with my changes the A100 results got faster?!? [...]

Those numbers are utilization numbers (speed / speed of light), so higher is better, which means this points to a regression. Is it possible that there is extra runtime overhead with your changes? A simple CPU profile would usually make it obvious.
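A rough sketch of what such a utilization figure represents; the peak-throughput constant and the exact formula are assumptions for illustration, not necessarily what the test uses:

```python
def matmul_utilization(m, n, k, seconds, peak_tflops=312.0):
    # A matmul does 2*m*n*k FLOPs; peak_tflops is an assumed A100 FP16 peak.
    achieved_tflops = 2 * m * n * k / seconds / 1e12
    return achieved_tflops / peak_tflops  # ratio of achieved speed to "speed of light"
```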

@manbearian (Collaborator, Author)

> Those numbers are utilization numbers (speed / speed of light), so higher is better, which means this points to a regression. Is it possible that there is extra runtime overhead with your changes? A simple CPU profile would usually make it obvious.

Okay, thanks for clearing that up. There shouldn't be any changes here for NVIDIA at all, and only non-impactful changes for non-NVIDIA targets. I plan on stepping through the compiler on Monday to figure out what I've done wrong.

-def make_hash(fn, target, env_vars, **kwargs):
+def make_hash(fn, target, env_vars, device_backend, **kwargs):
+    if device_backend is None:
+        version_key = get_cuda_version_key()
Review comment from a collaborator on the lines above:
I'm wondering if the problem comes from here; this will hash the Triton library every time, which is not cheap.
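If that is the cause, one possible mitigation is to compute the library hash only once per process; a hedged sketch with hypothetical names:

```python
import functools
import hashlib

@functools.lru_cache(maxsize=None)
def library_version_key(lib_path):
    # Hash the backend/library file a single time rather than on every make_hash call.
    with open(lib_path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()
```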

@manbearian (Collaborator, Author) commented Oct 14, 2023 via email

@manbearian manbearian force-pushed the dev/ianb/hash-change branch from b69407e to 3dacee5 Compare October 16, 2023 16:14
@manbearian manbearian force-pushed the dev/ianb/hash-change branch from 36289f6 to 5d0ef2d Compare October 17, 2023 14:57
@ptillet ptillet merged commit 768fc1f into triton-lang:main Oct 17, 2023
pingzhuu pushed a commit to siliconflow/triton that referenced this pull request Apr 2, 2024