
HeteroLinear/SEGMM: switch from heuristic to timing-cache #8615

Merged
merged 24 commits into master on Jan 29, 2024

Conversation

puririshi98
Contributor

@puririshi98 puririshi98 commented Dec 13, 2023

needed to rebase #8472

@puririshi98 puririshi98 requested a review from wsad1 as a code owner December 13, 2023 21:48
Member

@akihironitta akihironitta left a comment


What does this PR do on a high level?

@puririshi98
Contributor Author

puririshi98 commented Jan 5, 2024

What does this PR do on a high level?

Instead of using my sklearn heuristics, which were trained on an Ampere A100 (so probably not as accurate on other, very different cards), @stadlmax sets up a timer that figures out which kernel is faster and uses that. IMO this makes more sense, since it works on all hardware. He will work on resolving the conflicts when he has bandwidth; I spoke to him about it yesterday.

@stadlmax stadlmax changed the title rebasing stadlmax's timing heuristic for segmm HeteroLinear/SEGMM: switch from heuristic to timing-cache Jan 5, 2024

codecov bot commented Jan 5, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison: base (3773b53) 89.38% vs. head (1c10dda) 89.40%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8615      +/-   ##
==========================================
+ Coverage   89.38%   89.40%   +0.02%     
==========================================
  Files         479      479              
  Lines       31152    31165      +13     
==========================================
+ Hits        27845    27864      +19     
+ Misses       3307     3301       -6     


Member

@akihironitta akihironitta left a comment


  • Is this something similar to https://pytorch.org/docs/stable/backends.html#torch.backends.cudnn.benchmark, in the sense that it first benchmarks all algos and then selects the fastest one? Shall we make this an opt-in feature, just like cudnn.benchmark?
  • I'm not sure how unlikely it is, but is the measurement robust enough for each rank to always pick the same algorithm in a distributed setting?

puririshi98 and others added 5 commits January 8, 2024 08:56
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
@puririshi98
Contributor Author

Is this something similar to https://pytorch.org/docs/stable/backends.html#torch.backends.cudnn.benchmark, in the sense that it first benchmarks all algos and then selects the fastest one? Shall we make this an opt-in feature, just like cudnn.benchmark?

I think in our case there are only two algorithms to check, and in many cases the speed difference is drastic (in either direction). We are currently selecting based on sklearn heuristics I trained from benchmarks on a single A100 machine, on a single software stack, from about half a year ago. This is far less robust on the majority of cards besides the A100, so I think the new solution is much better. Instead of making the heuristic opt-in separately, I think we can make it part of the existing `torch_geometric.backend.use_segment_matmul`.

I'm not sure how unlikely it is, but is the measurement robust enough for each rank to always pick the same algorithm in a distributed setting?

I think the likelihood is very low, and we run a similarly low risk with the current heuristic when using dynamic shapes on each rank.

@puririshi98
Contributor Author

@rusty1s any thoughts on the above discussion?

@rusty1s
Member

rusty1s commented Jan 10, 2024

Yes, I agree that this solution is much better than the heuristic-based one. I think we would need to avoid measuring warmup times though, and likely adjust MEASURE_ITER based on whether we are in a test or in an actual example.

@puririshi98
Contributor Author

puririshi98 commented Jan 10, 2024

Yes, I agree that this solution is much better than the heuristic-based one. I think we would need to avoid measuring warmup times though, and likely adjust MEASURE_ITER based on whether we are in a test or in an actual example.

That sounds like a good idea to me. Do you have a suggested method of implementing such a system?
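The warmup-exclusion idea discussed above could look roughly like this. A minimal sketch: `WARMUP_ITER` and the `MEASURE_ITER` value are illustrative placeholders, not the settings the PR ends up using:

```python
import time

WARMUP_ITER = 3    # illustrative: runs discarded before timing starts
MEASURE_ITER = 10  # illustrative: timed runs to average over

def timed_average(fn):
    """Discard a few warmup runs, then average MEASURE_ITER timed runs."""
    for _ in range(WARMUP_ITER):
        fn()  # warmup iterations: excluded from the measurement
    start = time.perf_counter()
    for _ in range(MEASURE_ITER):
        fn()
    return (time.perf_counter() - start) / MEASURE_ITER
```

Separating warmup from measurement keeps one-time costs (kernel compilation, cache population) out of the averaged timing, which is what the comparison between the two kernels should be based on.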

@rusty1s
Member

rusty1s commented Jan 11, 2024

Currently, we use `'pytest' in sys.modules` to detect whether we are testing.
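For reference, that check is a simple membership test on `sys.modules`, since pytest registers itself there when it drives the process. A minimal sketch; the `MEASURE_ITER` values here are illustrative, not the ones chosen in the PR:

```python
import sys

# pytest inserts itself into sys.modules when it runs the test session,
# so library code can detect a test run without importing pytest.
WITH_PYTEST = 'pytest' in sys.modules

# Hypothetical follow-on: shrink the benchmark loop under tests.
MEASURE_ITER = 1 if WITH_PYTEST else 100
```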

@puririshi98
Contributor Author

Okay, thanks @rusty1s. Do you have a desired value for `MEASURE_ITER` when we are testing? How about when we aren't?

@rusty1s
Member

rusty1s commented Jan 17, 2024

I think for testing, `MEASURE_ITER=1` would be sufficient. The best value for scripts needs to be recommended by you :)

Member

@akihironitta akihironitta left a comment


LGTM! (but I'd suggest waiting for another review from Matthias)

Also, I think we can follow up in a separate PR, but we should update `RGCNConv`, right?

@puririshi98
Contributor Author

but we should update `RGCNConv`, right?

I agree. I can add this to my todo list and will follow up with a future PR.

@puririshi98
Contributor Author

@rusty1s anything else needed to merge?

@rusty1s
Member

rusty1s commented Jan 23, 2024

Please let me take a look tomorrow :) I'll try to get to it. If you are feeling blocked, please go ahead and merge.

Member

@rusty1s rusty1s left a comment


That's nice :)

@rusty1s rusty1s merged commit 51c57da into master Jan 29, 2024
14 checks passed
@rusty1s rusty1s deleted the rebase-time-heuristic branch January 29, 2024 12:19
4 participants