RadixN: Like Radix4, but supports size 2,3,4,5,6, and 7 cross-FFTs all in the same instance #132

ejmahler · 2024-02-06T01:08:56Z

This PR implements a new FFT algorithm called RadixN, and integrates it into the planner. It efficiently handles FFT sizes containing small prime factors. So far, it's scalar-only, but I intend to port it over to the SIMD implementations next.

The key insight here is that even though we use the same cross-FFT size at every "level" of Radix4, there's no intrinsic limitation that forces us to do so - that's just what we need for powers of four, so that's what radix4 does.

RadixN removes this restriction, allowing every cross-FFT to have a different butterfly size. I picked 2,3,4,5,6, and 7 as the initial butterflies, but if it improves performance to add 9 or 10 or 11 for example, I'm open to adding it. If we add a lot of composite sizes, I suspect there will be a combinatoric explosion of complexity of which overlapping sets of factors to choose from (Ie, if we have 2 * 3 * 5, we could collapse that to 3 * 10, or 6 * 5, or 2 * 15, which is fastest? it gets messy really fast, so if we start adding more composites than we have now i'd like it to come with an architecture for managing that complexity.

RadixN can handle factors larger than its largest butterfly length, by using those larger factors as its base FFT length. For example, RadixN can gracefully handle something like 4 * 9 * 5 * 7 * 101, by processing the 4 * 9 * 5 * 7 internally, and using 101 as the base FFT size.

Benchmarking shows that RadixN is marginally worse than Radix4 at power of two sizes, as you might expect given that it has more branches, etc. So the planner still uses Radix4 when computing 4*k * base, and still tries to maneuver the base to make 4^k happen.

Where the performance of RadixN really shines is in collections of small factors - For 44100 (2^2 * 3^2 * 5^2 * 7^2), we get a 60% improvement:

test bench_radixn_44100_planned                      ... bench:     741,795 ns/iter (+/- 38,680)
test bench_radixn_44100_radixn                       ... bench:     453,662 ns/iter (+/- 12,686)

For 48000 (2^7 * 3 * 5^3) we get a 45% improvement:

test bench_radixn_48000_planned                      ... bench:     685,025 ns/iter (+/- 28,818)
test bench_radixn_48000_radixn                       ... bench:     469,620 ns/iter (+/- 11,005)

These improvements come from eliminating the overhead associated with the multiple transposes that MixedRadix does. I was inspired by a comment from #65 - "With this shuffler, there is no need to use mixer radix for long ffts, since radix4 is now the fastest also there." I remembered this quote, and wondered - if it's true for powers of two, why wouldn't it also be true for non powers of two?

I plan to do one more pass of cleanup and clarity before submitting, and then like I said I'll port it over to the SIMD implementations.

ejmahler · 2024-02-06T01:16:27Z

error[E0658]: discriminants on non-unit variants are experimental
  --> src/algorithm/radixn.rs:33:30
   |
33 |     Factor7(Butterfly7<T>) = 7,
   |                              ^
   |
   = note: see issue #60553 <https://github.com/rust-lang/rust/issues/60553> for more information

Ah, beans. This has been stabilized since 1.61 as shown by the stable build passing, but I'll have to check when.

ejmahler added 2 commits February 5, 2024 16:48

Implemented RadixN: Like radix4 but for mixed cross-FFTs

b1c4154

Remove stale code

5695fb3

ejmahler added 4 commits February 6, 2024 08:19

remove explicit enum discriminants to unblock the build

4c723ea

Re-added function that appeared unused, and re-added default features

72eebbf

Removed another non-1.61 feature

b861dde

RadixN Cleanup

aac920b

ejmahler merged commit 01fa5c8 into master Feb 7, 2024
18 checks passed

ejmahler deleted the radixn branch February 7, 2024 02:53

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RadixN: Like Radix4, but supports size 2,3,4,5,6, and 7 cross-FFTs all in the same instance #132

RadixN: Like Radix4, but supports size 2,3,4,5,6, and 7 cross-FFTs all in the same instance #132

ejmahler commented Feb 6, 2024 •

edited

Loading

ejmahler commented Feb 6, 2024

RadixN: Like Radix4, but supports size 2,3,4,5,6, and 7 cross-FFTs all in the same instance #132

RadixN: Like Radix4, but supports size 2,3,4,5,6, and 7 cross-FFTs all in the same instance #132

Conversation

ejmahler commented Feb 6, 2024 • edited Loading

ejmahler commented Feb 6, 2024

ejmahler commented Feb 6, 2024 •

edited

Loading