updating docs to include matrix-vector multiply example #918

Draft: akashkgarg wants to merge 4 commits into master

Conversation

akashkgarg

As I work through how to speed up some of the functionality in the SciML codebases using multiple GPUs, I thought I'd add my small experiments as examples for other users of this package. Comments/feedback are welcome if the example(s) shown could be done better.
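For readers landing on this thread, here is a minimal sketch of the kind of multi-GPU matrix-vector multiply the PR documents, not the PR's actual code: the rows of A are split into one block per device, each block is multiplied on its own GPU in a separate task, and the partial results are copied back to the host. The function name multi_gpu_matvec and the row-wise split are illustrative choices.

```julia
using CUDA

function multi_gpu_matvec(A::Matrix{Float32}, x::Vector{Float32})
    devs  = collect(CUDA.devices())
    rows  = size(A, 1)
    chunk = cld(rows, length(devs))          # rows handled per device
    y     = Vector{Float32}(undef, rows)

    @sync for (i, dev) in enumerate(devs)
        @async begin
            device!(dev)                     # bind this task to its own GPU
            r  = (i - 1) * chunk + 1 : min(i * chunk, rows)
            dA = CuArray(A[r, :])            # upload this device's row block
            dx = CuArray(x)                  # each device gets its own copy of x
            y[r] = Array(dA * dx)            # partial product, copied back to the host
        end
    end
    return y
end
```

The result should match A * x up to floating-point roundoff; whether it is faster depends on how well the upload cost amortizes against the per-device matrix-vector product.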

codecov bot commented May 20, 2021

Codecov Report

Merging #918 (03192b6) into master (eb7c326) will decrease coverage by 0.00%.
The diff coverage is n/a.


@@            Coverage Diff             @@
##           master     #918      +/-   ##
==========================================
- Coverage   77.00%   76.99%   -0.01%     
==========================================
  Files         121      121              
  Lines        7706     7708       +2     
==========================================
+ Hits         5934     5935       +1     
- Misses       1772     1773       +1     
Impacted Files           | Coverage Δ
lib/cusolver/CUSOLVER.jl | 82.00% <0.00%> (-1.34%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update eb7c326...03192b6.

maleadt (Member) commented May 21, 2021

Nice example! Any idea why the minimum times show a much more pronounced speed-up? It could be because CUDA.@sync calls synchronize(), which only synchronizes the current task. Maybe it should call device_synchronize() instead, but that's fairly costly.
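A rough sketch of the difference being discussed, assuming several devices are available; launch_on_all_devices! is a hypothetical workload, not part of CUDA.jl or this PR:

```julia
using CUDA, BenchmarkTools

# Hypothetical workload: queue an asynchronous broadcast kernel on every device.
function launch_on_all_devices!(xs)
    for (dev, x) in zip(CUDA.devices(), xs)
        device!(dev)
        x .= 2f0 .* x                  # launched asynchronously on this device's stream
    end
end

xs = [(device!(dev); CUDA.rand(Float32, 2^24)) for dev in CUDA.devices()]

# CUDA.@sync calls synchronize(), which only waits for the current task's work
# on the current device, so kernels still running on other devices may be missed:
@btime CUDA.@sync launch_on_all_devices!(xs)

# Waiting on every device captures the full wall-clock cost:
@btime begin
    launch_on_all_devices!(xs)
    for dev in CUDA.devices()
        device!(dev)
        device_synchronize()           # wait for all outstanding work on the active device
    end
end
```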

kshyatt added the documentation label on May 24, 2021
akashkgarg (Author)

@maleadt great question. I added device_synchronize() and it does reduce the variance quite a bit. The updated version is probably a more reasonable implementation/benchmark.

akashkgarg (Author)

@maleadt I added another example that does a reduction over a large array. Surprisingly, the multi-GPU case is significantly slower (although its maximum time is about 1/3 of the single-GPU case). Perhaps there is a better way to partition the data/computation than I'm doing here?
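For context, a minimal sketch of that kind of partitioned reduction, under the assumption that the data starts on the host, each device reduces its own contiguous chunk, and the partial sums are combined on the CPU (multi_gpu_sum is an illustrative name, not the PR's code). For a memory-bound operation like sum, the host-to-device transfer of each chunk can easily dominate, which is one plausible reason the multi-GPU version ends up slower:

```julia
using CUDA

function multi_gpu_sum(x::Vector{Float32})
    devs     = collect(CUDA.devices())
    n        = length(x)
    chunk    = cld(n, length(devs))
    partials = Vector{Float32}(undef, length(devs))

    @sync for (i, dev) in enumerate(devs)
        @async begin
            device!(dev)
            r  = (i - 1) * chunk + 1 : min(i * chunk, n)
            dx = CuArray(view(x, r))   # upload this device's chunk
            partials[i] = sum(dx)      # reduction runs on the GPU
        end
    end
    return sum(partials)               # combine the partial results on the host
end
```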

maleadt force-pushed the master branch 21 times, most recently from 7a2b21f to 55b8716 on July 29, 2021 at 13:11
amontoison (Member) commented:

Nice examples!
