christiangnrd
Member

Split out 602e4f4 so it can be merged early, since the same failure was also encountered in another PR: https://buildkite.com/julialang/metal-dot-jl/builds/1881#0196445f-e2c3-4961-9eb1-6819b6b2adab

I don't know whether this will actually fix the issue, but it should be fixed regardless.

Got a CI failure that may be caused by the excessive number of threads and groups launched.
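
For illustration only, here is a minimal sketch of one way to keep a launch size proportional to the data instead of hard-coding large values. `launch_config` and its `max_threads` default of 256 are hypothetical and not part of Metal.jl or this PR; the real per-threadgroup limit depends on the device and the compiled kernel.

```julia
# Hypothetical helper (not part of Metal.jl): choose a launch configuration
# that covers `n` elements with at most `max_threads` threads per group.
function launch_config(n::Integer; max_threads::Integer = 256)
    n > 0 || throw(ArgumentError("n must be positive"))
    threads = min(n, max_threads)   # threads per threadgroup
    groups = cld(n, threads)        # enough groups to cover all n elements
    return (; threads, groups)
end

cfg = launch_config(1000)
println(cfg)

# With Metal.jl loaded on a supported Mac, the values could be passed as:
#   Metal.@sync @metal threads = cfg.threads groups = cfg.groups simple_kernel(arr_mtl)
```

Because `threads * groups` can exceed `n` when `n` is not a multiple of `threads`, a kernel launched this way would need its own bounds check.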
Contributor

Your PR requires formatting changes to meet the project's style guidelines.
Please consider running Runic (git runic main) to apply these changes.

Suggested changes:
diff --git a/examples/unified_memory.jl b/examples/unified_memory.jl
index 3093db36..98866464 100644
--- a/examples/unified_memory.jl
+++ b/examples/unified_memory.jl
@@ -37,7 +37,7 @@ arr_cpu .= pi
 Metal.@allowscalar @test arr_mtl[1] == Float32(pi)
 
 # Now launch a kernel altering the Metal array
-Metal.@sync @metal threads=16 groups=16 simple_kernel(arr_mtl)
+Metal.@sync @metal threads = 16 groups = 16 simple_kernel(arr_mtl)
 
 # These changes are reflected in the wrapped CPU array
 synchronize()


codecov bot commented Apr 17, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 80.73%. Comparing base (fab6fc2) to head (31f4b6d).
Report is 1 commit behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #583   +/-   ##
=======================================
  Coverage   80.73%   80.73%           
=======================================
  Files          61       61           
  Lines        2657     2657           
=======================================
  Hits         2145     2145           
  Misses        512      512           


Contributor

@github-actions github-actions bot left a comment


Metal Benchmarks

| Benchmark suite | Current: 31f4b6d | Previous: fab6fc2 | Ratio |
|---|---|---|---|
| private array/construct | 24694.333333333332 ns | 24708.333333333332 ns | 1.00 |
| private array/broadcast | 462541 ns | 468458 ns | 0.99 |
| private array/random/randn/Float32 | 757583 ns | 793417 ns | 0.95 |
| private array/random/randn!/Float32 | 626145.5 ns | 636333 ns | 0.98 |
| private array/random/rand!/Int64 | 569375 ns | 573208 ns | 0.99 |
| private array/random/rand!/Float32 | 599458 ns | 603208 ns | 0.99 |
| private array/random/rand/Int64 | 734041.5 ns | 775270.5 ns | 0.95 |
| private array/random/rand/Float32 | 639520.5 ns | 626292 ns | 1.02 |
| private array/copyto!/gpu_to_gpu | 658667 ns | 642833 ns | 1.02 |
| private array/copyto!/cpu_to_gpu | 694791 ns | 826459 ns | 0.84 |
| private array/copyto!/gpu_to_cpu | 815916 ns | 662500 ns | 1.23 |
| private array/accumulate/1d | 1323875 ns | 1350959 ns | 0.98 |
| private array/accumulate/2d | 1396708.5 ns | 1396625 ns | 1.00 |
| private array/iteration/findall/int | 2093187.5 ns | 2086458 ns | 1.00 |
| private array/iteration/findall/bool | 1843000 ns | 1836208 ns | 1.00 |
| private array/iteration/findfirst/int | 1712125 ns | 1709417 ns | 1.00 |
| private array/iteration/findfirst/bool | 1665187.5 ns | 1664708 ns | 1.00 |
| private array/iteration/scalar | 3884000 ns | 3895500 ns | 1.00 |
| private array/iteration/logical | 3196813 ns | 3194188 ns | 1.00 |
| private array/iteration/findmin/1d | 1765750 ns | 1768916.5 ns | 1.00 |
| private array/iteration/findmin/2d | 1348291 ns | 1362729 ns | 0.99 |
| private array/reductions/reduce/1d | 1038750 ns | 1040771 ns | 1.00 |
| private array/reductions/reduce/2d | 665145.5 ns | 668666 ns | 0.99 |
| private array/reductions/mapreduce/1d | 1044000 ns | 1053500 ns | 0.99 |
| private array/reductions/mapreduce/2d | 659792 ns | 671333 ns | 0.98 |
| private array/permutedims/4d | 2497666 ns | 2545250 ns | 0.98 |
| private array/permutedims/2d | 1020083.5 ns | 1010417 ns | 1.01 |
| private array/permutedims/3d | 1597812.5 ns | 1577583 ns | 1.01 |
| private array/copy | 613416 ns | 597583 ns | 1.03 |
| latency/precompile | 9740084292 ns | 9625622625 ns | 1.01 |
| latency/ttfp | 3751656542 ns | 3681887084 ns | 1.02 |
| latency/import | 1266783021 ns | 1253500083 ns | 1.01 |
| integration/metaldevrt | 715291 ns | 712750 ns | 1.00 |
| integration/byval/slices=1 | 1600709 ns | 1515459 ns | 1.06 |
| integration/byval/slices=3 | 11774584 ns | 9046333 ns | 1.30 |
| integration/byval/reference | 1624104 ns | 1617542 ns | 1.00 |
| integration/byval/slices=2 | 2604292 ns | 2591083 ns | 1.01 |
| kernel/indexing | 485250 ns | 479250 ns | 1.01 |
| kernel/indexing_checked | 479833 ns | 476500 ns | 1.01 |
| kernel/launch | 8000 ns | 9743.166666666668 ns | 0.82 |
| metal/synchronization/stream | 14459 ns | 14875 ns | 0.97 |
| metal/synchronization/context | 14875 ns | 15083 ns | 0.99 |
| shared array/construct | 24736.166666666668 ns | 25194.5 ns | 0.98 |
| shared array/broadcast | 460209 ns | 462792 ns | 0.99 |
| shared array/random/randn/Float32 | 823416 ns | 782000 ns | 1.05 |
| shared array/random/randn!/Float32 | 642291 ns | 641834 ns | 1.00 |
| shared array/random/rand!/Int64 | 582208 ns | 579584 ns | 1.00 |
| shared array/random/rand!/Float32 | 603937.5 ns | 594875 ns | 1.02 |
| shared array/random/rand/Int64 | 782354.5 ns | 763250 ns | 1.03 |
| shared array/random/rand/Float32 | 610750 ns | 614083 ns | 0.99 |
| shared array/copyto!/gpu_to_gpu | 84125 ns | 83083 ns | 1.01 |
| shared array/copyto!/cpu_to_gpu | 85916 ns | 83459 ns | 1.03 |
| shared array/copyto!/gpu_to_cpu | 82458 ns | 82458 ns | 1 |
| shared array/accumulate/1d | 1347375 ns | 1364875 ns | 0.99 |
| shared array/accumulate/2d | 1400042 ns | 1400125 ns | 1.00 |
| shared array/iteration/findall/int | 1788041.5 ns | 1836791.5 ns | 0.97 |
| shared array/iteration/findall/bool | 1579000 ns | 1594000 ns | 0.99 |
| shared array/iteration/findfirst/int | 1411083 ns | 1402333 ns | 1.01 |
| shared array/iteration/findfirst/bool | 1524542 ns | 1378916.5 ns | 1.11 |
| shared array/iteration/scalar | 156084 ns | 160334 ns | 0.97 |
| shared array/iteration/logical | 3021458 ns | 3001833 ns | 1.01 |
| shared array/iteration/findmin/1d | 1472562.5 ns | 1475937.5 ns | 1.00 |
| shared array/iteration/findmin/2d | 1371833 ns | 1372979 ns | 1.00 |
| shared array/reductions/reduce/1d | 741291.5 ns | 719291 ns | 1.03 |
| shared array/reductions/reduce/2d | 668917 ns | 673937.5 ns | 0.99 |
| shared array/reductions/mapreduce/1d | 746729.5 ns | 752104 ns | 0.99 |
| shared array/reductions/mapreduce/2d | 671375 ns | 669166.5 ns | 1.00 |
| shared array/permutedims/4d | 2531334 ns | 2524375 ns | 1.00 |
| shared array/permutedims/2d | 1039875 ns | 1021916.5 ns | 1.02 |
| shared array/permutedims/3d | 1604916 ns | 1572708 ns | 1.02 |
| shared array/copy | 246791 ns | 245041 ns | 1.01 |

This comment was automatically generated by workflow using github-action-benchmark.
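
As a reading aid, the Ratio column appears to be the current time divided by the previous time, so values above 1 mean the current commit was slower. The sketch below recomputes it for a few rows copied from the benchmark table; the `ratio` helper and the choice of rows are illustrative and not part of the benchmark workflow.

```julia
using Printf

# (benchmark name, current ns, previous ns), copied from the benchmark table
rows = [
    ("private array/copyto!/cpu_to_gpu", 694791.0, 826459.0),
    ("private array/copyto!/gpu_to_cpu", 815916.0, 662500.0),
    ("integration/byval/slices=3", 11774584.0, 9046333.0),
    ("kernel/launch", 8000.0, 9743.166666666668),
]

# Assumed definition of the Ratio column: current time over previous time
ratio(current, previous) = current / previous

for (name, cur, prev) in rows
    @printf("%-34s %.2f\n", name, ratio(cur, prev))
end
```

Single-run differences of this size on shared CI hardware may be noise rather than a real change, particularly for a PR that only touches an example file.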

@christiangnrd christiangnrd merged commit a5b56dc into JuliaGPU:main Apr 17, 2025
7 checks passed
@christiangnrd christiangnrd deleted the fewerthreads branch April 17, 2025 18:54