Use an appropriate amount of threads in unified memory example #583
Got a CI failure that may be due to the crazy number of threads and groups launched.
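As a rough illustration of what an "appropriate" launch size could look like, the sketch below derives `threads` and `groups` from the number of elements instead of hard-coding both. The helper `launch_config` and the `max_threads = 256` cap are hypothetical and not part of Metal.jl or this PR; the real per-threadgroup limit depends on the device and the compiled kernel.

```julia
# Hypothetical helper (not part of Metal.jl): choose a launch configuration
# that covers `n` elements without launching far more threads than there is work.
# `max_threads` is an assumed cap on threads per threadgroup.
function launch_config(n::Integer; max_threads::Integer = 256)
    n > 0 || throw(ArgumentError("n must be positive"))
    max_threads > 0 || throw(ArgumentError("max_threads must be positive"))
    threads = min(n, max_threads)   # never more threads per group than elements
    groups = cld(n, threads)        # ceiling division so every element is covered
    return (; threads, groups)
end

cfg = launch_config(16)

# Assumed usage with the kernel from the example (requires a Metal device):
# Metal.@sync @metal threads=cfg.threads groups=cfg.groups simple_kernel(arr_mtl)
```

When `n` is not a multiple of `threads`, the last group contains surplus threads, so a kernel launched this way would also need a bounds check on its index.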
Your PR requires formatting changes to meet the project's style guidelines. Suggested changes:
diff --git a/examples/unified_memory.jl b/examples/unified_memory.jl
index 3093db36..98866464 100644
--- a/examples/unified_memory.jl
+++ b/examples/unified_memory.jl
@@ -37,7 +37,7 @@ arr_cpu .= pi
 Metal.@allowscalar @test arr_mtl[1] == Float32(pi)
 # Now launch a kernel altering the Metal array
-Metal.@sync @metal threads=16 groups=16 simple_kernel(arr_mtl)
+Metal.@sync @metal threads = 16 groups = 16 simple_kernel(arr_mtl)
 # These changes are reflected in the wrapped CPU array
 synchronize()
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@           Coverage Diff           @@
##             main     #583   +/-   ##
=======================================
  Coverage   80.73%   80.73%
=======================================
  Files          61       61
  Lines        2657     2657
=======================================
  Hits         2145     2145
  Misses        512      512
☔ View full report in Codecov by Sentry.
Metal Benchmarks
Benchmark suite | Current: 31f4b6d | Previous: fab6fc2 | Ratio |
---|---|---|---|
private array/construct | 24694.333333333332 ns | 24708.333333333332 ns | 1.00 |
private array/broadcast | 462541 ns | 468458 ns | 0.99 |
private array/random/randn/Float32 | 757583 ns | 793417 ns | 0.95 |
private array/random/randn!/Float32 | 626145.5 ns | 636333 ns | 0.98 |
private array/random/rand!/Int64 | 569375 ns | 573208 ns | 0.99 |
private array/random/rand!/Float32 | 599458 ns | 603208 ns | 0.99 |
private array/random/rand/Int64 | 734041.5 ns | 775270.5 ns | 0.95 |
private array/random/rand/Float32 | 639520.5 ns | 626292 ns | 1.02 |
private array/copyto!/gpu_to_gpu | 658667 ns | 642833 ns | 1.02 |
private array/copyto!/cpu_to_gpu | 694791 ns | 826459 ns | 0.84 |
private array/copyto!/gpu_to_cpu | 815916 ns | 662500 ns | 1.23 |
private array/accumulate/1d | 1323875 ns | 1350959 ns | 0.98 |
private array/accumulate/2d | 1396708.5 ns | 1396625 ns | 1.00 |
private array/iteration/findall/int | 2093187.5 ns | 2086458 ns | 1.00 |
private array/iteration/findall/bool | 1843000 ns | 1836208 ns | 1.00 |
private array/iteration/findfirst/int | 1712125 ns | 1709417 ns | 1.00 |
private array/iteration/findfirst/bool | 1665187.5 ns | 1664708 ns | 1.00 |
private array/iteration/scalar | 3884000 ns | 3895500 ns | 1.00 |
private array/iteration/logical | 3196813 ns | 3194188 ns | 1.00 |
private array/iteration/findmin/1d | 1765750 ns | 1768916.5 ns | 1.00 |
private array/iteration/findmin/2d | 1348291 ns | 1362729 ns | 0.99 |
private array/reductions/reduce/1d | 1038750 ns | 1040771 ns | 1.00 |
private array/reductions/reduce/2d | 665145.5 ns | 668666 ns | 0.99 |
private array/reductions/mapreduce/1d | 1044000 ns | 1053500 ns | 0.99 |
private array/reductions/mapreduce/2d | 659792 ns | 671333 ns | 0.98 |
private array/permutedims/4d | 2497666 ns | 2545250 ns | 0.98 |
private array/permutedims/2d | 1020083.5 ns | 1010417 ns | 1.01 |
private array/permutedims/3d | 1597812.5 ns | 1577583 ns | 1.01 |
private array/copy | 613416 ns | 597583 ns | 1.03 |
latency/precompile | 9740084292 ns | 9625622625 ns | 1.01 |
latency/ttfp | 3751656542 ns | 3681887084 ns | 1.02 |
latency/import | 1266783021 ns | 1253500083 ns | 1.01 |
integration/metaldevrt | 715291 ns | 712750 ns | 1.00 |
integration/byval/slices=1 | 1600709 ns | 1515459 ns | 1.06 |
integration/byval/slices=3 | 11774584 ns | 9046333 ns | 1.30 |
integration/byval/reference | 1624104 ns | 1617542 ns | 1.00 |
integration/byval/slices=2 | 2604292 ns | 2591083 ns | 1.01 |
kernel/indexing | 485250 ns | 479250 ns | 1.01 |
kernel/indexing_checked | 479833 ns | 476500 ns | 1.01 |
kernel/launch | 8000 ns | 9743.166666666668 ns | 0.82 |
metal/synchronization/stream | 14459 ns | 14875 ns | 0.97 |
metal/synchronization/context | 14875 ns | 15083 ns | 0.99 |
shared array/construct | 24736.166666666668 ns | 25194.5 ns | 0.98 |
shared array/broadcast | 460209 ns | 462792 ns | 0.99 |
shared array/random/randn/Float32 | 823416 ns | 782000 ns | 1.05 |
shared array/random/randn!/Float32 | 642291 ns | 641834 ns | 1.00 |
shared array/random/rand!/Int64 | 582208 ns | 579584 ns | 1.00 |
shared array/random/rand!/Float32 | 603937.5 ns | 594875 ns | 1.02 |
shared array/random/rand/Int64 | 782354.5 ns | 763250 ns | 1.03 |
shared array/random/rand/Float32 | 610750 ns | 614083 ns | 0.99 |
shared array/copyto!/gpu_to_gpu | 84125 ns | 83083 ns | 1.01 |
shared array/copyto!/cpu_to_gpu | 85916 ns | 83459 ns | 1.03 |
shared array/copyto!/gpu_to_cpu | 82458 ns | 82458 ns | 1 |
shared array/accumulate/1d | 1347375 ns | 1364875 ns | 0.99 |
shared array/accumulate/2d | 1400042 ns | 1400125 ns | 1.00 |
shared array/iteration/findall/int | 1788041.5 ns | 1836791.5 ns | 0.97 |
shared array/iteration/findall/bool | 1579000 ns | 1594000 ns | 0.99 |
shared array/iteration/findfirst/int | 1411083 ns | 1402333 ns | 1.01 |
shared array/iteration/findfirst/bool | 1524542 ns | 1378916.5 ns | 1.11 |
shared array/iteration/scalar | 156084 ns | 160334 ns | 0.97 |
shared array/iteration/logical | 3021458 ns | 3001833 ns | 1.01 |
shared array/iteration/findmin/1d | 1472562.5 ns | 1475937.5 ns | 1.00 |
shared array/iteration/findmin/2d | 1371833 ns | 1372979 ns | 1.00 |
shared array/reductions/reduce/1d | 741291.5 ns | 719291 ns | 1.03 |
shared array/reductions/reduce/2d | 668917 ns | 673937.5 ns | 0.99 |
shared array/reductions/mapreduce/1d | 746729.5 ns | 752104 ns | 0.99 |
shared array/reductions/mapreduce/2d | 671375 ns | 669166.5 ns | 1.00 |
shared array/permutedims/4d | 2531334 ns | 2524375 ns | 1.00 |
shared array/permutedims/2d | 1039875 ns | 1021916.5 ns | 1.02 |
shared array/permutedims/3d | 1604916 ns | 1572708 ns | 1.02 |
shared array/copy | 246791 ns | 245041 ns | 1.01 |
This comment was automatically generated by workflow using github-action-benchmark.
Split out 602e4f4 to merge earlier since it was also encountered in another PR: https://buildkite.com/julialang/metal-dot-jl/builds/1881#0196445f-e2c3-4961-9eb1-6819b6b2adab
I don't actually know if this will fix the issue, but it is something that should be fixed regardless.