HeteroLinear/SEGMM: switch from heuristic to timing-cache #8615
Conversation
What does this PR do on a high level?
Instead of using my sklearn heuristics, which were trained on an Ampere A100 (so probably not as accurate on other, very different cards), @stadlmax sets up a timer that figures out which kernel is faster and uses that. IMO this makes more sense because it works on all hardware. He will work on resolving the conflicts when he has bandwidth; I spoke to him about it yesterday.
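For context, here is a minimal sketch of the timing-cache idea being described (illustrative only; the cache and function names below are placeholders, not the PR's actual code): time both kernels once per input configuration, remember which one won, and reuse that decision afterwards. In PyG's case the two candidates would be the grouped-matmul and segment-matmul paths used by `HeteroLinear`.

```python
import time

import torch

_timing_cache = {}  # maps an input signature to the faster backend


def _time_fn(fn, *args, iters=10):
    # Average the runtime of `fn(*args)` over `iters` calls.
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters


def pick_backend(key, segment_fn, grouped_fn, *args):
    # Measure both candidates the first time this input configuration is
    # seen; afterwards, return the cached decision.
    if key not in _timing_cache:
        t_seg = _time_fn(segment_fn, *args)
        t_grp = _time_fn(grouped_fn, *args)
        _timing_cache[key] = 'segment' if t_seg <= t_grp else 'grouped'
    return _timing_cache[key]
```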
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

@@            Coverage Diff             @@
##           master    #8615      +/-   ##
==========================================
+ Coverage   89.38%   89.40%    +0.02%
==========================================
  Files         479      479
  Lines       31152    31165       +13
==========================================
+ Hits        27845    27864       +19
+ Misses       3307     3301        -6

☔ View full report in Codecov by Sentry.
- Is this something similar to https://pytorch.org/docs/stable/backends.html#torch.backends.cudnn.benchmark, in the sense that it first benchmarks all algorithms and then selects the fastest one? Should we make this an opt-in feature, just like `cudnn.benchmark`?
- I'm not sure how unlikely it is, but is the measurement robust enough for each rank to always pick the same algorithm in a distributed setting?
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
for more information, see https://pre-commit.ci
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
I think in our case there are only two algorithms to check, and in many cases the speed difference is drastic (in either direction). We are currently selecting it based on sklearn heuristics I trained from benchmarks on a single A100 machine on a single software stack from about half a year ago, which is far less robust on the majority of cards besides the A100. I think the new solution is much better. Instead of making the heuristic opt-in separately, we can make it a part of the existing `torch_geometric.backend.use_segment_matmul`.
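As a hedged sketch of what folding the decision into the existing flag could look like (assuming `torch_geometric.backend.use_segment_matmul` keeps its `None`/`True`/`False` semantics; the helper below is illustrative, not the library's actual implementation):

```python
import torch_geometric


def should_use_segment_matmul(timed_decision):
    """Return True to take the segment_matmul path.

    `timed_decision` is a zero-argument callable returning the cached
    timing-based choice (e.g. a closure over `pick_backend` above).
    """
    flag = torch_geometric.backend.use_segment_matmul
    if flag is not None:
        return flag  # the user forced one kernel explicitly
    return timed_decision() == 'segment'  # otherwise defer to the timing cache
```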
I think this is a very low likelihood, and we run a similarly low risk with the current heuristic when using dynamic shapes on each rank.
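On the distributed question, purely as a hypothetical mitigation (not something this PR claims to implement): one could let rank 0 make the timing decision and broadcast it so all ranks agree.

```python
import torch
import torch.distributed as dist


def agree_on_backend(local_use_segment: bool) -> bool:
    # Without an initialized process group, just keep the local decision.
    if not (dist.is_available() and dist.is_initialized()):
        return local_use_segment
    # NCCL requires a CUDA tensor; fall back to CPU for other backends.
    device = 'cuda' if dist.get_backend() == 'nccl' else 'cpu'
    decision = torch.tensor([int(local_use_segment)], device=device)
    dist.broadcast(decision, src=0)  # every rank adopts rank 0's choice
    return bool(decision.item())
```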
@rusty1s any thoughts on the above discussion?
Yes, I agree that this solution is much better than the heuristic-based one. I think we would need to avoid measuring warmup times though, and likely adjust …
That sounds like a good idea to me. Do you have a suggested method of implementing such a system?
Currently, we use …
Okay, thanks @rusty1s. Do you have a desired value for MEASURE_ITER when we are testing? How about for when we aren't?
I think for testing …
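A sketch of the warmup handling discussed above; `MEASURE_ITER` is the name mentioned in the thread, while `WARMUP_ITER` and both values are illustrative placeholders rather than what was ultimately merged.

```python
import time

import torch

WARMUP_ITER = 3    # illustrative: iterations discarded to absorb one-time setup cost
MEASURE_ITER = 10  # illustrative: iterations averaged for the actual measurement


def measure(fn, *args):
    # Warmup runs trigger lazy initialization, kernel autotuning, caching, etc.
    for _ in range(WARMUP_ITER):
        fn(*args)
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # make sure warmup work has finished
    start = time.perf_counter()
    for _ in range(MEASURE_ITER):
        fn(*args)
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # include all queued GPU work in the timing
    return (time.perf_counter() - start) / MEASURE_ITER
```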
for more information, see https://pre-commit.ci
LGTM! (but I'd suggest waiting for another review from Matthias)
Also, I think we can follow up in a separate PR, but we should update RGCNConv, right?
@rusty1s anything else needed to merge?
Please let me take a look tomorrow :) I'll try to get to it. If you are feeling blocked, please go ahead and merge.
That's nice :)
needed to rebase #8472