Improvements to large buffer tests #735
Your PR requires formatting changes to meet the project's style guidelines. The suggested changes are:
diff --git a/test/largebroadcast.jl b/test/largebroadcast.jl
index be0bf22b..b037458b 100644
--- a/test/largebroadcast.jl
+++ b/test/largebroadcast.jl
@@ -1,13 +1,15 @@
const N = Int(typemax(UInt32)) + 1
const T = Int8
-@testset "len = $n" for n in ((N÷2) - 4, (N÷2), (N÷2) + 4, N - 1024, N - 3, N - 1, N, N + 4)
+@testset "len = $n" for n in ((N ÷ 2) - 4, (N ÷ 2), (N ÷ 2) + 4, N - 1024, N - 3, N - 1, N, N + 4)
A = MtlArray{T}(undef, n)
# Known working method to zero out array
Metal.unsafe_fill!(device(A), pointer(A), T(0), n * sizeof(T); async = false)
- _dims = [(n,), (n, 1), (1, n), (n, 1, 1), (1, n, 1), (1, 1, n),
- (n, 1, 1, 1), (1, n, 1, 1), (1, 1, n, 1), (1, 1, 1, n),]
+ _dims = [
+ (n,), (n, 1), (1, n), (n, 1, 1), (1, n, 1), (1, 1, n),
+ (n, 1, 1, 1), (1, n, 1, 1), (1, 1, n, 1), (1, 1, 1, n),
+ ]
if n == 2^32
push!(_dims, (2^16, 2^16))
push!(_dims, (2^16, 2^8, 2^8))
@@ -17,7 +19,7 @@ const T = Int8
# These must be run first to ensure we test
# the unspecialized broadcast kernels
Metal._broadcast_shapes[CartesianIndices(dims)] = Metal.BROADCAST_SPECIALIZATION_THRESHOLD - 1
- unspec_val = T((i-1) * 2 + 1)
+ unspec_val = T((i - 1) * 2 + 1)
arr = reshape(A, dims)
arr .= unspec_val
@test all(==(unspec_val), arr)
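The lengths in the testset header appear to be chosen to sit on either side of the 32-bit index limits. The following CPU-only sketch illustrates that reading; it allocates nothing and does not need a Metal device. `index_class` is a hypothetical helper written for this illustration and is not part of Metal.jl, and the interpretation of why these lengths were picked is an assumption rather than something the PR states.

```julia
# CPU-only sketch: no MtlArray is allocated, so this runs on any 64-bit Julia.
N = Int(typemax(UInt32)) + 1   # same value as `const N` in the test file

# The lengths iterated by the testset header in the diff.
lengths = ((N ÷ 2) - 4, (N ÷ 2), (N ÷ 2) + 4, N - 1024, N - 3, N - 1, N, N + 4)

# Hypothetical helper (not part of Metal.jl): report the narrowest of these
# index types that can still represent a length `n`.
function index_class(n::Integer)
    n <= typemax(Int32) && return :fits_Int32
    n <= typemax(UInt32) && return :fits_UInt32
    return :needs_64_bit
end

for n in lengths
    println(rpad(string(n), 12), index_class(n))
end

# The extra shapes pushed only when n == 2^32 keep the same element count.
@assert N == 2^32
@assert prod((2^16, 2^16)) == N
@assert prod((2^16, 2^8, 2^8)) == N
```

Under this reading, `(N ÷ 2) - 4` is the only length that still fits in an `Int32`, the lengths from `N ÷ 2` through `N - 1` need a `UInt32`, and `N` and `N + 4` need 64-bit indexing.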
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #735      +/-   ##
==========================================
- Coverage   82.59%   82.28%   -0.32%
==========================================
  Files          62       62
  Lines        2862     2873      +11
==========================================
  Hits         2364     2364
- Misses        498      509      +11
☔ View full report in Codecov by Sentry.
Metal Benchmarks
| Benchmark suite | Current: f2a7fa1 | Previous: f7b0829 | Ratio |
|---|---|---|---|
| latency/precompile | 24994097291 ns | 25035410125 ns | 1.00 |
| latency/ttfp | 2272859917 ns | 2272092167 ns | 1.00 |
| latency/import | 1445333042 ns | 1444131708 ns | 1.00 |
| integration/metaldevrt | 864354.5 ns | 870500 ns | 0.99 |
| integration/byval/slices=1 | 1591292 ns | 1578834 ns | 1.01 |
| integration/byval/slices=3 | 11237083 ns | 10274687 ns | 1.09 |
| integration/byval/reference | 1567667 ns | 1566167 ns | 1.00 |
| integration/byval/slices=2 | 2643125 ns | 2622959 ns | 1.01 |
| kernel/indexing | 602292 ns | 625542 ns | 0.96 |
| kernel/indexing_checked | 627458 ns | 615520.5 ns | 1.02 |
| kernel/launch | 11625 ns | 11417 ns | 1.02 |
| kernel/rand | 568041 ns | 566500 ns | 1.00 |
| array/construct | 6500 ns | 6084 ns | 1.07 |
| array/broadcast | 598167 ns | 602917 ns | 0.99 |
| array/random/randn/Float32 | 968666.5 ns | 1001708 ns | 0.97 |
| array/random/randn!/Float32 | 743125 ns | 750583 ns | 0.99 |
| array/random/rand!/Int64 | 549833.5 ns | 551375 ns | 1.00 |
| array/random/rand!/Float32 | 586208 ns | 587041 ns | 1.00 |
| array/random/rand/Int64 | 787917 ns | 777958 ns | 1.01 |
| array/random/rand/Float32 | 642333 ns | 590000 ns | 1.09 |
| array/accumulate/Int64/1d | 1280229 ns | 1254458 ns | 1.02 |
| array/accumulate/Int64/dims=1 | 1841125 ns | 1827292 ns | 1.01 |
| array/accumulate/Int64/dims=2 | 2189750 ns | 2165896 ns | 1.01 |
| array/accumulate/Int64/dims=1L | 11558895.5 ns | 11587604 ns | 1.00 |
| array/accumulate/Int64/dims=2L | 9795208 ns | 9813792 ns | 1.00 |
| array/accumulate/Float32/1d | 1125500 ns | 1121542 ns | 1.00 |
| array/accumulate/Float32/dims=1 | 1566375 ns | 1398896.5 ns | 1.12 |
| array/accumulate/Float32/dims=2 | 1890042 ns | 1890312.5 ns | 1.00 |
| array/accumulate/Float32/dims=1L | 9798959 ns | 9780770.5 ns | 1.00 |
| array/accumulate/Float32/dims=2L | 7258792 ns | 7265249.5 ns | 1.00 |
| array/reductions/reduce/Int64/1d | 1519437.5 ns | 1357812.5 ns | 1.12 |
| array/reductions/reduce/Int64/dims=1 | 1103250 ns | 1116917 ns | 0.99 |
| array/reductions/reduce/Int64/dims=2 | 1153458.5 ns | 1188313 ns | 0.97 |
| array/reductions/reduce/Int64/dims=1L | 1984917 ns | 1986958.5 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2L | 4718354 ns | 4191792 ns | 1.13 |
| array/reductions/reduce/Float32/1d | 929542 ns | 1028770.5 ns | 0.90 |
| array/reductions/reduce/Float32/dims=1 | 831875 ns | 830375 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2 | 925833.5 ns | 861042 ns | 1.08 |
| array/reductions/reduce/Float32/dims=1L | 1558208.5 ns | 1318208 ns | 1.18 |
| array/reductions/reduce/Float32/dims=2L | 1820875 ns | 1794083 ns | 1.01 |
| array/reductions/mapreduce/Int64/1d | 1519479.5 ns | 1551042 ns | 0.98 |
| array/reductions/mapreduce/Int64/dims=1 | 1105292 ns | 1125167 ns | 0.98 |
| array/reductions/mapreduce/Int64/dims=2 | 1187312.5 ns | 1206208 ns | 0.98 |
| array/reductions/mapreduce/Int64/dims=1L | 2022833 ns | 1995875 ns | 1.01 |
| array/reductions/mapreduce/Int64/dims=2L | 3579875 ns | 3621563 ns | 0.99 |
| array/reductions/mapreduce/Float32/1d | 943917 ns | 1027916 ns | 0.92 |
| array/reductions/mapreduce/Float32/dims=1 | 832833 ns | 830833 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2 | 862708 ns | 862209 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=1L | 1327354.5 ns | 1332666 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2L | 1803459 ns | 1842000 ns | 0.98 |
| array/private/copyto!/gpu_to_gpu | 629709 ns | 632500 ns | 1.00 |
| array/private/copyto!/cpu_to_gpu | 789708 ns | 780667 ns | 1.01 |
| array/private/copyto!/gpu_to_cpu | 804625 ns | 787750 ns | 1.02 |
| array/private/iteration/findall/int | 1556291 ns | 1564000 ns | 1.00 |
| array/private/iteration/findall/bool | 1412146 ns | 1393667 ns | 1.01 |
| array/private/iteration/findfirst/int | 2062333 ns | 2101875 ns | 0.98 |
| array/private/iteration/findfirst/bool | 2041229.5 ns | 2049749.5 ns | 1.00 |
| array/private/iteration/scalar | 4025000 ns | 4317166 ns | 0.93 |
| array/private/iteration/logical | 2632459 ns | 2622584 ns | 1.00 |
| array/private/iteration/findmin/1d | 2504666 ns | 2507959 ns | 1.00 |
| array/private/iteration/findmin/2d | 1792750 ns | 1788250 ns | 1.00 |
| array/private/copy | 568792 ns | 556437.5 ns | 1.02 |
| array/shared/copyto!/gpu_to_gpu | 82916 ns | 83375 ns | 0.99 |
| array/shared/copyto!/cpu_to_gpu | 82375 ns | 87084 ns | 0.95 |
| array/shared/copyto!/gpu_to_cpu | 81792 ns | 82625 ns | 0.99 |
| array/shared/iteration/findall/int | 1559042 ns | 1552250 ns | 1.00 |
| array/shared/iteration/findall/bool | 1432083 ns | 1424958 ns | 1.01 |
| array/shared/iteration/findfirst/int | 1665458 ns | 1688500 ns | 0.99 |
| array/shared/iteration/findfirst/bool | 1641209 ns | 1642438 ns | 1.00 |
| array/shared/iteration/scalar | 207167 ns | 205792 ns | 1.01 |
| array/shared/iteration/logical | 2246584 ns | 2261000 ns | 0.99 |
| array/shared/iteration/findmin/1d | 2123958 ns | 2133500 ns | 1.00 |
| array/shared/iteration/findmin/2d | 1792083 ns | 1797312.5 ns | 1.00 |
| array/shared/copy | 235583 ns | 240542 ns | 0.98 |
| array/permutedims/4d | 2399833.5 ns | 2390688 ns | 1.00 |
| array/permutedims/2d | 1179208 ns | 1170542 ns | 1.01 |
| array/permutedims/3d | 1686792 ns | 1675145.5 ns | 1.01 |
| metal/synchronization/stream | 19416 ns | 17291 ns | 1.12 |
| metal/synchronization/context | 20291 ns | 17667 ns | 1.15 |
This comment was automatically generated by workflow using github-action-benchmark.
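The Ratio column is consistent with current time divided by previous time, rounded to two decimals, so a value above 1 means the current commit was slower on that benchmark. A minimal sketch that checks this reading against three rows of the table; `ratio` is a hypothetical helper for illustration, not a function of the benchmark action, and the rounding rule is inferred from the numbers rather than taken from its documentation.

```julia
# Assumed definition, inferred from the table: ratio = current / previous,
# rounded to two decimals. Both timings are in nanoseconds.
ratio(current, previous) = round(current / previous; digits = 2)

# Three rows copied from the benchmark table: (name, current ns, previous ns).
rows = [
    ("integration/byval/slices=3", 11237083, 10274687),
    ("array/reductions/reduce/Float32/1d", 929542, 1028770.5),
    ("metal/synchronization/context", 20291, 17667),
]

for (name, cur, prev) in rows
    println(name, " => ", ratio(cur, prev))
end
```

These reproduce the table's 1.09, 0.90 and 1.15 for those rows.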
Depends on #728 and on a way to select runners with at least 16 GB of memory.