Distributed tridiagonal Fourier solver #3689

Merged: 834 commits merged into main from ss/distributed-tridiagonal-solve on Aug 9, 2024

Conversation

simone-silvestri (Collaborator)

This PR introduces a distributed pressure solver for grids that are stretched in one direction. The algorithm implemented here is the same as the one described in #2538, but it uses the FFTs and transposes built in #3279.

@glwagner (Member) commented on Aug 9, 2024

Just a question about implementation:
It looks like #2538 implemented just one solver with an optional tridiagonal component, is that right? Why does this PR take a different approach? Wouldn't using a single solver result in less code and less duplication of transform logic?

Here, the approach follows the serial implementation quite closely, where we have an FFT solver and a FourierTridiagonal solver. Additionally, the fields of the solvers are tailored to their task; for example, there is no tridiagonal solver in the pure FFT solver. You might argue that we could just leave unused fields empty, but then we end up with more "ambiguous" fields, like eigenvalues that are not needed in the tridiagonal solver and source_term that is not needed in the FFT solver.

I am not convinced that a single solver would lead to less (or cleaner) code, mostly because the underlying code (the constructor and the solve! function) is compact enough to justify writing individual functions for different grids (the docstring for DistributedFourierTridiagonalPoissonSolver is comparable in length to the code for the solver itself). I think this improves the interpretability of the algorithm. Maybe an improvement would be writing a unified constructor that returns the appropriate solver (a sketch of this idea appears below).

  1. It might make sense to split the constructor into two parts, so that we can build a tridiagonal solver even when all three directions are regular. That could be useful for testing, for example.

I have added this capability by passing the stretched_direction kwarg.
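For concreteness, here is a minimal sketch of how a unified constructor combined with such a kwarg could fit together. Everything below is illustrative: the type names, the grid_stretched_direction helper, and the kwarg default are stand-ins, not the actual constructors or signatures in the package.

```julia
# Illustrative stand-ins for the two solver flavors discussed above.
struct DistributedFFTSolver end                 # FFTs in all three directions
struct DistributedFourierTridiagonalSolver      # FFTs in two directions + tridiagonal in one
    direction::Symbol
end

# Hypothetical query: returns `nothing` for a fully regular grid, or :x / :y / :z
# for the single stretched direction. Here it always reports a regular grid.
grid_stretched_direction(grid) = nothing

# A unified constructor: by default it infers the solver from the grid, but the
# stretched_direction kwarg can force the tridiagonal path even on a regular grid,
# which is handy for testing.
function unified_poisson_solver(grid; stretched_direction = grid_stretched_direction(grid))
    stretched_direction === nothing && return DistributedFFTSolver()
    return DistributedFourierTridiagonalSolver(stretched_direction)
end

# unified_poisson_solver(grid)                            -> DistributedFFTSolver()
# unified_poisson_solver(grid; stretched_direction = :z)  -> tridiagonal solve in z
```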

Also note that in terms of operation count the tridiagonal solve is cheaper than FFT...

I think, all things considered, the mixed FFT / tridiagonal solve will have basically the same computational cost as the pure FFT solve, but only when the stretched direction is x. The additional transposes required for a stretched y or z direction will completely dominate the cost of the actual operations.
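To put a number on the operation-count remark: the vertical solve in the mixed approach is a Thomas sweep, on the order of 8N flops per column, versus roughly 5N log2(N) for the complex FFT it replaces. A minimal single-column sketch (plain Julia, not the package's batched GPU kernel):

```julia
# Thomas algorithm for one tridiagonal system a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i].
# All vectors have length N; a[1] and c[N] are ignored. One forward elimination plus one
# back substitution: O(N) flops per column, which is why the vertical solve is cheaper
# than the O(N log N) vertical FFT it replaces.
function thomas_solve!(x, a, b, c, d)
    N  = length(d)
    cp = similar(c)
    dp = similar(d)
    cp[1] = c[1] / b[1]
    dp[1] = d[1] / b[1]
    for i in 2:N
        denom = b[i] - a[i] * cp[i-1]
        cp[i] = i < N ? c[i] / denom : zero(eltype(c))
        dp[i] = (d[i] - a[i] * dp[i-1]) / denom
    end
    x[N] = dp[N]
    for i in N-1:-1:1
        x[i] = dp[i] - cp[i] * x[i+1]
    end
    return x
end
```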

As an example, here is a slab decomposition of a fairly big grid (512 x 256^2) split across 2 GPUs on Tartarus: [profiler timeline omitted]. The AlltoAllv is the dominant cost, while the FFT (in between the two transposes) is almost negligible. In the near future I'll run scaling tests on Perlmutter, which has a much better network, so it might turn out (though I think it's unlikely) that the cost is not all communication after all.
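A rough back-of-the-envelope comparison, using the 512 x 256^2 grid and 2 ranks from the profile above and assuming 8-byte values (complex storage would double the traffic), illustrates why the transpose traffic swamps the FFT work; the numbers are order-of-magnitude only.

```julia
# Data moved by one transpose vs. flops in the 1D FFT it surrounds.
Nx, Ny, Nz   = 512, 256, 256
ranks, bytes = 2, 8

N       = Nx * Ny * Nz              # total grid points
local_N = N ÷ ranks                 # points per rank in a slab decomposition

# Each transpose reshuffles a fraction (ranks - 1)/ranks of the local data over the network.
transpose_bytes = local_N * bytes * (ranks - 1) ÷ ranks     # ≈ 67 MB per rank

# A 1D complex FFT of length Ny over all local columns costs ~5 * local_N * log2(Ny) flops.
fft_flops = 5 * local_N * log2(Ny)                          # ≈ 6.7e8 flops per rank

# At GPU throughput (TFLOP/s) the FFT takes a fraction of a millisecond, while moving tens
# of MB across a typical interconnect takes milliseconds, so the AlltoAllv dominates.
```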

Just to clarify --- the mixed tridiagonal + FFT solver also needs eigenvalues, doesn't it?

@simone-silvestri (Collaborator, Author)

Just to clarify --- the mixed tridiagonal + FFT solver also needs eigenvalues, doesn't it?

Yep, it does, but they are "embedded" in the diagonal terms of the tridiagonal solver. So, to clarify, there is no need for an additional eigenvalue field because the eigenvalues are already included in the batched_tridiagonal_solver.
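To spell out what "embedded in the diagonal" means: after transforming in the two regular directions, each vertical column for horizontal mode (k, l) satisfies a tridiagonal problem whose diagonal carries the sum of the horizontal eigenvalues. A uniform-vertical-spacing sketch (illustrative only, not the package's batched_tridiagonal_solver):

```julia
# For horizontal mode (k, l), the transformed Poisson equation in the vertical reads
#   (φ[m+1] - 2φ[m] + φ[m-1]) / Δz^2 + (λx + λy) φ[m] = R[m],
# where λx, λy are the (nonpositive) eigenvalues of the discrete horizontal Laplacian.
# The eigenvalues therefore sit on the diagonal of the tridiagonal matrix, and no
# separate eigenvalue field is needed at solve time.
function vertical_tridiagonal(λx, λy, Δz, Nz)
    lower = fill(1 / Δz^2, Nz - 1)                  # sub-diagonal
    upper = fill(1 / Δz^2, Nz - 1)                  # super-diagonal
    diag  = fill(-2 / Δz^2 + λx + λy, Nz)           # horizontal eigenvalues folded in
    return lower, diag, upper
end
```

With a stretched vertical direction the 1/Δz^2 entries become spacing-dependent, but the structure, and where the eigenvalues live, stays the same.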

simone-silvestri merged commit 0e45c13 into main on Aug 9, 2024 (46 checks passed)
simone-silvestri deleted the ss/distributed-tridiagonal-solve branch on August 9, 2024
@tomchor (Collaborator) commented on Aug 9, 2024

I meant to ask this on the other PR but forgot: how does this algorithm scale? Any recommendations when using GPU partitioning with nonhydrostatic models?

@simone-silvestri (Collaborator, Author)

A long time ago, I did some scaling tests of the pure FFT algorithm. These were the results

[scaling results plot omitted]

I will probably redo the scaling tests after the summer.
In general, always use slab partitioning if you can, because it avoids one transposition.
This might lead to a larger halo-to-domain ratio, but the halo fill, unlike the transpose, is hidden behind computation, so it should still be better to use a slab partitioning than a pencil one.
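A rough per-rank halo-surface comparison (illustrative numbers: 16 ranks on a 512 x 512 x 256 grid) shows the trade-off: the slab exchanges more halo data, but that exchange can be hidden, whereas the extra transpose required by a pencil decomposition sits on the critical path of the pressure solve.

```julia
# Halo surface per rank for slab vs. pencil decompositions of a 512 x 512 x 256 grid on 16 ranks.
Nx, Ny, Nz = 512, 512, 256

# Slab: split only in y -> local size 512 x 32 x 256, halos on the two y-faces.
slab_halo   = 2 * Nx * Nz                                   # 262_144 points

# Pencil: split 4 x 4 in x and y -> local size 128 x 128 x 256, halos on four faces.
pencil_halo = 2 * (Ny ÷ 4) * Nz + 2 * (Nx ÷ 4) * Nz         # 131_072 points

# The slab moves about twice as much halo data per rank, but halo exchange overlaps with
# computation; the pencil instead needs an extra transpose inside every pressure solve.
```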

@tomchor (Collaborator) commented on Aug 10, 2024

Thanks, that's super useful info!
