Distributed tridiagonal Fourier solver (#3689)
* some changes

* bugfix

* bugfix

* bugfixed

* another bugfix

* compute_diffusivities!

* required halo size

* all fixed

* shorten line

* fix comment

* remove abbreviation

* remove unused functions

* better explanation of the MPI tag

* Update src/ImmersedBoundaries/active_cells_map.jl

Co-authored-by: Navid C. Constantinou <navidcy@users.noreply.github.com>

* Update src/Solvers/batched_tridiagonal_solver.jl

Co-authored-by: Navid C. Constantinou <navidcy@users.noreply.github.com>

* change name

* docstring

* name change on rank

* interior active cells

* calculate -> compute

* fixed tests

* do not compute momentum in prescribed velocities

* DistributedComputations

* DistributedComputations part #2

* bugfix

* comment

* starting tests

* test the ffts

* bugfix

* small change

* add transpose test

* MPI.VBuffer

* fixed these tests for the moment

* using

* transpose

* fixed the distributed FFT tests

* not yet tridiagonal

* fulfill requirements

* fix doctest

* add distributed script

* bugfix

* update to current syntax

* couple of changes

* more instructions

* remove pencilstuff

* at least precompiles

* modify test

* simplify

* run test

* correct comment

* bugfix

* grammar

* more comments

* fix tests

* new commit

* fixing the injection

* bugfix distributed

* new syntax

* comment

* comment

* couple of TODOs

* comment

* distributed hydrostatic

* added hydrostatic simulation

* fixed tests

* small change

* testing also regression on nonhydrostatic

* remove pencilarrays

* some small bugfixes

* small bugfix

* new manifest

* switch to on_architecture

* test a hypothesis

* update the ocean large eddy regression test

* correct rayleigh benard regression

* correct thermal regression test

* more bugfixes

* bugfixes for the regression tests

* some corrections

* another bugfix for regression

* test quickly on gpus

* define fallback for reconstruct_global_grid

* gpu tests

* last bugfix to distribute regression tests

* do not do the thermal bubble for the moment

* adding gpu distributed solvers tests

* finally it works for bounded

* make sure everything is on the CPU

* make sure everything is on the CPU

* test distributed poisson also on the GPU

* at least the poisson solve works, next the regression tests

* another bug in the tests

* some docs

* some naming changes

* change file naming

* bugfix

* bugfix distributed regression

* wrong indices

* leave out other tests for the moment

* make sure we use correct archs for regression

* non hydrostatic regression archs

* remove test file

* this should make all tests pass

* archs outside

* move `archs` to the right position

* add comments

* fix doctests

* small name change

* Update src/DistributedComputations/distributed_fft_based_poisson_solver.jl

Co-authored-by: Gregory L. Wagner <wagner.greg@gmail.com>

* change to named tuple

* try it out

* allow Flat directions and FieldTimeSeries

* implementing child architecture for grids

* solve_poisson_in_spectral_space!

* adding some docs

* docs formatting

* use julia v1.10.4

* resolve deps

* use julia v1.10.4

* bump patch release

* formatting

* formatting

* formatting

* adapting manifest

* should run

* test all

* bugfix

* bugfix

* some progress

* bugfix

* fix assemble coordinate

* looks like it's working!

* fix docs

* comments

* comments

* rayleigh benard stretched test

* some validation

* fix `partition_coordinate`

* spit out error + fixes to docs

* more docfixes

* yet another docfix

* make sure validation example works

* add a couple of comments

* Update validation/distributed_simulations/distributed_nonhydrostatic_turbulence.jl

Co-authored-by: Gregory L. Wagner <wagner.greg@gmail.com>

* retry the build

* Update src/DistributedComputations/distributed_fft_based_poisson_solver.jl

Co-authored-by: Tomas Chor <tomaschor@gmail.com>

* Update src/DistributedComputations/distributed_fft_based_poisson_solver.jl

Co-authored-by: Tomas Chor <tomaschor@gmail.com>

* Update src/DistributedComputations/distributed_transpose.jl

Co-authored-by: Tomas Chor <tomaschor@gmail.com>

* Update test/test_distributed_transpose.jl

Co-authored-by: Tomas Chor <tomaschor@gmail.com>

* docs changes

* adding a validation to the configuration

* add a configuration validation

* remove the `ArgumentError`

* alignment

* clarify a bit the sizes in `TransposableField`

* fixed tests

* fix tests

* make names very explicit

* some more explanation

* add more docstring

* bugfix

* address a couple of comments

* add link to MPI docs

* bump to 0.92

* version 0.91.6

* retry the tests

* few tweaks in the docstring

* Update distributed_fft_based_poisson_solver.jl

* some comments to the tridiagonal solver

* test the new solver

* bugfix

* add configuration validation

* change order of operations

* formatting

* Update src/DistributedComputations/distributed_fft_tridiagonal_solver.jl

Co-authored-by: Gregory L. Wagner <wagner.greg@gmail.com>

* changed error message

* add new comments

* better explanation

* Update src/DistributedComputations/distributed_fft_based_poisson_solver.jl

Co-authored-by: Gregory L. Wagner <wagner.greg@gmail.com>

* some typo

* typo

* retry build

* Update src/DistributedComputations/distributed_fft_tridiagonal_solver.jl

Co-authored-by: Gregory L. Wagner <wagner.greg@gmail.com>

* Update src/DistributedComputations/distributed_fft_tridiagonal_solver.jl

Co-authored-by: Gregory L. Wagner <wagner.greg@gmail.com>

* Update src/DistributedComputations/distributed_fft_tridiagonal_solver.jl

Co-authored-by: Gregory L. Wagner <wagner.greg@gmail.com>

* Update src/DistributedComputations/distributed_fft_tridiagonal_solver.jl

Co-authored-by: Gregory L. Wagner <wagner.greg@gmail.com>

* address comments

* add functionality for unstretched solver

* other additions

* no more need for `XYRegularGrid` and so on!

* some bugfixes

* another bugfix

* bugfixes

* formatting

* fix tests

* Update src/DistributedComputations/distributed_fft_tridiagonal_solver.jl

Co-authored-by: Tomas Chor <tomaschor@gmail.com>

* Update src/DistributedComputations/distributed_fft_tridiagonal_solver.jl

Co-authored-by: Tomas Chor <tomaschor@gmail.com>

* Update src/DistributedComputations/distributed_grids.jl

Co-authored-by: Tomas Chor <tomaschor@gmail.com>

* Update src/DistributedComputations/partition_assemble.jl

Co-authored-by: Tomas Chor <tomaschor@gmail.com>

* Update src/DistributedComputations/partition_assemble.jl

Co-authored-by: Tomas Chor <tomaschor@gmail.com>

* another bugfix

* change emoji for distributed pipeline

* another bugfix

* another typo fix

---------

Co-authored-by: Navid C. Constantinou <navidcy@users.noreply.github.com>
Co-authored-by: Gregory L. Wagner <wagner.greg@gmail.com>
Co-authored-by: Tomas Chor <tomaschor@gmail.com>
4 people authored Aug 9, 2024
1 parent 1605cf5 commit 0e45c13
Showing 13 changed files with 510 additions and 64 deletions.
.buildkite/distributed/pipeline.yml (4 changes: 2 additions & 2 deletions)

```diff
@@ -86,7 +86,7 @@ steps:
       slurm_mem: 120G
       slurm_ntasks: 4
 
-  - label: "🕺 gpu distributed hydrostatic model tests"
+  - label: "🦏 gpu distributed hydrostatic model tests"
     key: "distributed_hydrostatic_model_gpu"
     env:
       TEST_GROUP: "distributed_hydrostatic_model"
@@ -97,7 +97,7 @@
       slurm_ntasks: 4
       slurm_gpus_per_task: 1
 
-  - label: "🤺 cpu distributed nonhydrostatic regression"
+  - label: "🦍 cpu distributed nonhydrostatic regression"
     key: "distributed_nonhydrostatic_regression_cpu"
     env:
       TEST_GROUP: "distributed_nonhydrostatic_regression"
```
src/DistributedComputations/DistributedComputations.jl (1 change: 1 addition & 0 deletions)

```diff
@@ -23,5 +23,6 @@ include("transposable_field.jl")
 include("distributed_transpose.jl")
 include("plan_distributed_transforms.jl")
 include("distributed_fft_based_poisson_solver.jl")
+include("distributed_fft_tridiagonal_solver.jl")
 
 end # module
```
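The newly included `distributed_fft_tridiagonal_solver.jl` provides the solver in the PR title: when the grid is stretched in one direction, the solver transforms in the two regularly-spaced directions and solves a tridiagonal system in the stretched one. For orientation, here is a minimal serial sketch of the Thomas algorithm such a solver applies once per horizontal wavenumber; the function and its structure are illustrative, not the implementation added by this PR.

```julia
# Sketch of the Thomas algorithm for a tridiagonal system A x = d, where `a` is
# the subdiagonal (length N-1), `b` the diagonal (length N), and `c` the
# superdiagonal (length N-1). Illustrative only; not the PR's implementation.
function thomas_solve!(x, a, b, c, d)
    N = length(b)
    c′ = similar(c)
    d′ = similar(d)
    c′[1] = c[1] / b[1]
    d′[1] = d[1] / b[1]
    for i in 2:N-1                  # forward elimination
        m = b[i] - a[i-1] * c′[i-1]
        c′[i] = c[i] / m
        d′[i] = (d[i] - a[i-1] * d′[i-1]) / m
    end
    d′[N] = (d[N] - a[N-1] * d′[N-1]) / (b[N] - a[N-1] * c′[N-1])
    x[N] = d′[N]
    for i in N-1:-1:1               # back substitution
        x[i] = d′[i] - c′[i] * x[i+1]
    end
    return x
end
```

Because the validation below requires `Rz == 1`, the vertical direction is fully local to each rank, so this solve needs no extra communication.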
src/DistributedComputations/distributed_architectures.jl (2 changes: 1 addition & 1 deletion)

```diff
@@ -24,7 +24,7 @@
 Return `Partition` representing the division of a domain in
 the `x` (first), `y` (second) and `z` (third) dimension
 
-Keyword arguments:
+Keyword arguments:
 ==================
 
 - `x`: partitioning of the first dimension
```
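For context, `Partition` is what users pass to choose the rank layout. A sketch of typical usage, assuming the standard `Distributed` architecture constructor (keyword spellings may differ between versions):

```julia
using MPI
using Oceananigans
using Oceananigans.DistributedComputations

MPI.Init()

# Split the domain into 2 ranks in x and 2 ranks in y, 4 ranks in total; the
# vertical is left unpartitioned, as the distributed Poisson solver requires.
arch = Distributed(CPU(), partition = Partition(x = 2, y = 2))
```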
src/DistributedComputations/distributed_fft_based_poisson_solver.jl (41 changes: 22 additions & 19 deletions)

```diff
@@ -1,7 +1,7 @@
 import FFTW
 
 using CUDA: @allowscalar
-using Oceananigans.Grids: XYZRegularRG
+using Oceananigans.Grids: XYZRegularRG, XYRegularRG, XZRegularRG, YZRegularRG
 
 import Oceananigans.Solvers: poisson_eigenvalues, solve!
 import Oceananigans.Architectures: architecture
```
```diff
@@ -60,9 +60,9 @@ In the algorithm below, the first dimension is always the local dimension.
 1. `storage.zfield`, partitioned over ``(x, y)`` is initialized with the `rhs` that is ``b``.
 2. Transform along ``z``.
-3 Transpose + communicate to `storage.yfield` partitioned into `(Rx, Ry)` processes in ``(x, z)``.
+3. Transpose `storage.zfield` + communicate to `storage.yfield` partitioned into `(Rx, Ry)` processes in ``(x, z)``.
 4. Transform along ``y``.
-5. Transpose + communicate to `storage.xfield` partitioned into `(Rx, Ry)` processes in ``(y, z)``.
+5. Transpose `storage.yfield` + communicate to `storage.xfield` partitioned into `(Rx, Ry)` processes in ``(y, z)``.
 6. Transform in ``x``.
 
 At this point the three in-place forward transforms are complete, and we
```
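Schematically, steps 2-6 amount to the sequence below. The helper names (`transform_z!`, `transpose_z_to_y!`, and so on) are placeholders for illustration rather than the functions this PR defines.

```julia
# Schematic of the forward side of the algorithm; helper names are illustrative.
function forward_transforms!(storage)
    transform_z!(storage.zfield)  # 2. FFT along z, which is local on every rank
    transpose_z_to_y!(storage)    # 3. communicate so y becomes the local dimension
    transform_y!(storage.yfield)  # 4. FFT along the now-local y
    transpose_y_to_x!(storage)    # 5. communicate so x becomes the local dimension
    transform_x!(storage.xfield)  # 6. FFT along the now-local x
    return nothing
end
```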
```diff
@@ -91,8 +91,8 @@ Restrictions
 """
 function DistributedFFTBasedPoissonSolver(global_grid, local_grid, planner_flag=FFTW.PATIENT)
 
-    validate_global_grid(global_grid)
-    validate_configuration(global_grid, local_grid)
+    validate_poisson_solver_distributed_grid(global_grid)
+    validate_poisson_solver_configuration(global_grid, local_grid)
 
     FT = Complex{eltype(local_grid)}
```
```diff
@@ -188,38 +188,41 @@ end
 end
 
 # TODO: bring the PCG up to speed to remove this error
-validate_global_grid(global_grid) =
+validate_poisson_solver_distributed_grid(global_grid) =
     throw("Grids other than the RectilinearGrid are not supported in the Distributed NonhydrostaticModels")
 
-function validate_global_grid(global_grid::RectilinearGrid)
+function validate_poisson_solver_distributed_grid(global_grid::RectilinearGrid)
     TX, TY, TZ = topology(global_grid)
 
     if (TY == Bounded && TZ == Periodic) || (TX == Bounded && TY == Periodic) || (TX == Bounded && TZ == Periodic)
-        throw("NonhydrostaticModels on Distributed grids do not support topology ($TX, $TY, $TZ) at the moment.
-               TZ Periodic requires also TY and TX to be Periodic, while TY Periodic requires also TX to be Periodic.
-               Please rotate the domain to obtain the required topology")
+        throw("Distributed Poisson solvers do not support grids with topology ($TX, $TY, $TZ) at the moment.
+               A Periodic z-direction requires the y- and x-directions to be Periodic as well, while a Periodic
+               y-direction requires the x-direction to be Periodic as well.")
     end
 
-    if !(global_grid isa XYZRegularRG)
-        throw("Stretched directions are not supported with distributed grids at the moment.")
+    if !(global_grid isa YZRegularRG) && !(global_grid isa XYRegularRG) && !(global_grid isa XZRegularRG)
+        throw("The provided grid is stretched in directions $(stretched_dimensions(global_grid)).
+               A distributed Poisson solver supports only RectilinearGrids with variably-spaced cells in at most one direction.")
     end
 
     return nothing
 end
 
-function validate_configuration(global_grid, local_grid)
+function validate_poisson_solver_configuration(global_grid, local_grid)
 
-    # We don't support distributing anything in z.
-    Rz = architecture(local_grid).ranks[3]
-    Rz == 1 || throw("Non-singleton ranks in the vertical are not supported by DistributedFFTBasedPoissonSolver.")
+    Rx, Ry, Rz = architecture(local_grid).ranks
+    Rz == 1 || throw("Non-singleton ranks in the vertical are not supported by distributed Poisson solvers.")
 
     # Limitation of the current implementation (see the docstring)
-    if global_grid.Nz % architecture(local_grid).ranks[2] != 0
-        throw("The number of ranks in the y direction must divide Nz. See the docstring for more information.")
+    if global_grid.Nz % Ry != 0
+        throw("The number of ranks in the y-direction is $(Ry), with Nz = $(global_grid.Nz) cells in the z-direction.
+               The distributed Poisson solver requires that the number of ranks in the y-direction divide Nz.")
     end
 
-    if global_grid.Ny % architecture(local_grid).ranks[1] != 0
-        throw("The number of ranks in the x direction must divide Ny. See the docstring for more information.")
+    if global_grid.Ny % Rx != 0
+        throw("The number of ranks in the x-direction is $(Rx), with Ny = $(global_grid.Ny) cells in the y-direction.
+               The distributed Poisson solver requires that the number of ranks in the x-direction divide Ny.")
     end
 
     return nothing
```
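The two divisibility requirements exist because each transpose must hand every rank an equal-size slab. The hypothetical helper below mirrors the logic of `validate_poisson_solver_configuration`, for quick experimentation with grid sizes and rank counts; it is not part of the PR.

```julia
# Does an (Rx, Ry, Rz) rank layout satisfy the distributed Poisson solver's
# constraints for an Nx × Ny × Nz grid? Mirrors the validation above.
function valid_layout(Nx, Ny, Nz, Rx, Ry, Rz)
    Rz == 1      || return false  # the vertical is never partitioned
    Nz % Ry == 0 || return false  # y-ranks must divide Nz (z → y transpose)
    Ny % Rx == 0 || return false  # x-ranks must divide Ny (y → x transpose)
    return true
end

valid_layout(64, 64, 32, 2, 2, 1)  # true:  32 % 2 == 0 and 64 % 2 == 0
valid_layout(64, 64, 31, 2, 2, 1)  # false: 31 % 2 != 0
```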
