Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: const filtering strat is size dependent #891

Merged
merged 3 commits into from
Dec 27, 2024

Conversation

alexander-camuto
Copy link
Collaborator

Currently constant filtering uses parallelism like a hammer but for einsum operations where dot and constant filtering may be called repeatedly for MANY small arrays (eg. einsum ijk,ijk->ij where input dims are [1024,1024,2] we call it 1024*1024 times for arrays of size 2 which, induces terrible overhead from spinning up threads.

Instead we should use sequential search methods for smaller arrays, lest we want to avoid this overhead.

Here we also leverage benchmarks to get a rough threshold where we should switch from sequential to a parallel search for constants. On a M3 Max the results are below, giving us a rough crossover point of 1mil elements.

zero_finding/sequential_1000
                        time:   [1.0408 µs 1.0438 µs 1.0495 µs]
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild
zero_finding/parallel_1000
                        time:   [73.068 µs 77.557 µs 83.013 µs]
zero_finding/chunked_parallel_1000
                        time:   [90.572 µs 94.148 µs 98.686 µs]
zero_finding/sequential_10000
                        time:   [9.1284 µs 9.1548 µs 9.1806 µs]
zero_finding/parallel_10000
                        time:   [189.68 µs 197.12 µs 204.70 µs]
Found 2 outliers among 10 measurements (20.00%)
  2 (20.00%) low mild
zero_finding/chunked_parallel_10000
                        time:   [205.38 µs 208.58 µs 212.23 µs]
Found 2 outliers among 10 measurements (20.00%)
  1 (10.00%) low severe
  1 (10.00%) high mild
zero_finding/sequential_100000
                        time:   [111.68 µs 113.21 µs 115.09 µs]
zero_finding/parallel_100000
                        time:   [533.02 µs 537.91 µs 543.24 µs]
zero_finding/chunked_parallel_100000
                        time:   [528.67 µs 533.78 µs 539.20 µs]
zero_finding/sequential_131072
                        time:   [154.65 µs 161.51 µs 166.73 µs]
zero_finding/parallel_131072
                        time:   [578.23 µs 582.05 µs 585.90 µs]
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild
zero_finding/chunked_parallel_131072
                        time:   [577.69 µs 581.33 µs 585.17 µs]
zero_finding/sequential_1000000
                        time:   [1.4908 ms 1.5081 ms 1.5198 ms]
zero_finding/parallel_1000000
                        time:   [1.1452 ms 1.1618 ms 1.1837 ms]
zero_finding/chunked_parallel_1000000
                        time:   [1.1548 ms 1.1711 ms 1.1889 ms]
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high severe
zero_finding/sequential_10000000
                        time:   [15.812 ms 16.083 ms 16.453 ms]
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild
zero_finding/parallel_10000000
                        time:   [6.0287 ms 6.1100 ms 6.1853 ms]
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high severe
zero_finding/chunked_parallel_10000000
                        time:   [5.9212 ms 5.9562 ms 5.9955 ms]
Found 2 outliers among 10 measurements (20.00%)
  1 (10.00%) high mild
  1 (10.00%) high severe

@alexander-camuto alexander-camuto marked this pull request as ready for review December 27, 2024 14:43
@alexander-camuto alexander-camuto merged commit caa6ef8 into main Dec 27, 2024
19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant