Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RadixN: Like Radix4, but supports size 2,3,4,5,6, and 7 cross-FFTs all in the same instance #132

Merged
merged 6 commits into from
Feb 7, 2024

Conversation

ejmahler
Copy link
Owner

@ejmahler ejmahler commented Feb 6, 2024

This PR implements a new FFT algorithm called RadixN, and integrates it into the planner. It efficiently handles FFT sizes containing small prime factors. So far, it's scalar-only, but I intend to port it over to the SIMD implementations next.

The key insight here is that even though we use the same cross-FFT size at every "level" of Radix4, there's no intrinsic limitation that forces us to do so - that's just what we need for powers of four, so that's what radix4 does.

RadixN removes this restriction, allowing every cross-FFT to have a different butterfly size. I picked 2,3,4,5,6, and 7 as the initial butterflies, but if it improves performance to add 9 or 10 or 11 for example, I'm open to adding it. If we add a lot of composite sizes, I suspect there will be a combinatoric explosion of complexity of which overlapping sets of factors to choose from (Ie, if we have 2 * 3 * 5, we could collapse that to 3 * 10, or 6 * 5, or 2 * 15, which is fastest? it gets messy really fast, so if we start adding more composites than we have now i'd like it to come with an architecture for managing that complexity.

RadixN can handle factors larger than its largest butterfly length, by using those larger factors as its base FFT length. For example, RadixN can gracefully handle something like 4 * 9 * 5 * 7 * 101, by processing the 4 * 9 * 5 * 7 internally, and using 101 as the base FFT size.

Benchmarking shows that RadixN is marginally worse than Radix4 at power of two sizes, as you might expect given that it has more branches, etc. So the planner still uses Radix4 when computing 4*k * base, and still tries to maneuver the base to make 4^k happen.

Where the performance of RadixN really shines is in collections of small factors - For 44100 (2^2 * 3^2 * 5^2 * 7^2), we get a 60% improvement:

test bench_radixn_44100_planned                      ... bench:     741,795 ns/iter (+/- 38,680)
test bench_radixn_44100_radixn                       ... bench:     453,662 ns/iter (+/- 12,686)

For 48000 (2^7 * 3 * 5^3) we get a 45% improvement:

test bench_radixn_48000_planned                      ... bench:     685,025 ns/iter (+/- 28,818)
test bench_radixn_48000_radixn                       ... bench:     469,620 ns/iter (+/- 11,005)

These improvements come from eliminating the overhead associated with the multiple transposes that MixedRadix does. I was inspired by a comment from #65 - "With this shuffler, there is no need to use mixer radix for long ffts, since radix4 is now the fastest also there." I remembered this quote, and wondered - if it's true for powers of two, why wouldn't it also be true for non powers of two?

I plan to do one more pass of cleanup and clarity before submitting, and then like I said I'll port it over to the SIMD implementations.

@ejmahler
Copy link
Owner Author

ejmahler commented Feb 6, 2024

error[E0658]: discriminants on non-unit variants are experimental
  --> src/algorithm/radixn.rs:33:30
   |
33 |     Factor7(Butterfly7<T>) = 7,
   |                              ^
   |
   = note: see issue #60553 <https://github.com/rust-lang/rust/issues/60553> for more information

Ah, beans. This has been stabilized since 1.61 as shown by the stable build passing, but I'll have to check when.

@ejmahler ejmahler merged commit 01fa5c8 into master Feb 7, 2024
18 checks passed
@ejmahler ejmahler deleted the radixn branch February 7, 2024 02:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant