Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Removes binary operation from Smagorinsky diffusivity calculation #2904

Closed
wants to merge 1 commit into from

Conversation

tomchor
Copy link
Collaborator

@tomchor tomchor commented Feb 6, 2023

This PR changes the calculation of the diffusivities in the SmagorinskyLilly so that binary operations are avoided. At the moment it does so by creating a calc_nonlinear_κᶜᶜᶜ() kernel, similar to what is done for the AMD closure.

Closes #2869

@glwagner
Copy link
Member

glwagner commented Feb 6, 2023

There's no need for a new kernel because the diffusivities are νₑ / Pr so they can be calculated on the fly.

We need to extend κᶠᶜᶜ and κᶜᶠᶜ and κᶜᶜᶠ

@inline κᶠᶜᶜ(i, j, k, grid, closure::SmagorinskyLilly, K, ::Val{id}, args...) = ℑxᶠᵃᵃ(i, j, k, grid, K.νₑ) / closure.Pr[id]
@inline κᶜᶠᶜ(i, j, k, grid, closure::SmagorinskyLilly, K, ::Val{id}, args...) = ℑyᵃᶠᵃ(i, j, k, grid, K.νₑ) / closure.Pr[id]
@inline κᶜᶜᶠ(i, j, k, grid, closure::SmagorinskyLilly, K, ::Val{id}, args...) = ℑzᵃᵃᶠ(i, j, k, grid, K.νₑ) / closure.Pr[id]

@tomchor
Copy link
Collaborator Author

tomchor commented Feb 6, 2023

There's no need for a new kernel because the diffusivities are νₑ / Pr so they can be calculated on the fly.

We need to extend κᶠᶜᶜ and κᶜᶠᶜ and κᶜᶜᶠ

@inline κᶠᶜᶜ(i, j, k, grid, closure::SmagorinskyLilly, K, ::Val{id}, args...) = ℑxᶠᵃᵃ(i, j, k, grid, K.νₑ) / closure.Pr[id]
@inline κᶜᶠᶜ(i, j, k, grid, closure::SmagorinskyLilly, K, ::Val{id}, args...) = ℑyᵃᶠᵃ(i, j, k, grid, K.νₑ) / closure.Pr[id]
@inline κᶜᶜᶠ(i, j, k, grid, closure::SmagorinskyLilly, K, ::Val{id}, args...) = ℑzᵃᵃᶠ(i, j, k, grid, K.νₑ) / closure.Pr[id]

I may be missing something, but when I make those changes I still get the errors in #2869. The changes in this PR are the only ones I've tried that seem to solve the issue.

I've implemented and pushed your solution to the tc/smag-binary-op2 branch in case you wanna check if I understood your suggestion. Basically these are the changes:

@inline κᶠᶜᶜ(i, j, k, grid, closure::SmagorinskyLilly, K, ::Val{id}, args...) where id = ℑxᶠᵃᵃ(i, j, k, grid, K.νₑ) / closure.Pr[id]
@inline κᶜᶠᶜ(i, j, k, grid, closure::SmagorinskyLilly, K, ::Val{id}, args...) where id = ℑyᵃᶠᵃ(i, j, k, grid, K.νₑ) / closure.Pr[id]
@inline κᶜᶜᶠ(i, j, k, grid, closure::SmagorinskyLilly, K, ::Val{id}, args...) where id = ℑzᵃᵃᶠ(i, j, k, grid, K.νₑ) / closure.Pr[id]

I'm getting a device kernel image is invalid (code 200, ERROR_INVALID_IMAGE) error with only those changes.

@glwagner
Copy link
Member

glwagner commented Feb 6, 2023

On that branch you are still forming the BinaryOperation which is the problem. You also need to remove the BinaryOperation. If you open a PR we can work on it?

@glwagner
Copy link
Member

glwagner commented Feb 6, 2023

Change

# Use AbstractOperations to write eddy diffusivities in terms of
# eddy viscosity
κₑ_ops = []
for i = 1:length(tracer_names)
Pr = closure.Pr[i]
κₑ_op = νₑ / Pr
push!(κₑ_ops, κₑ_op)
end
κₑ = NamedTuple{tracer_names}(Tuple(κₑ_ops))
return (; νₑ, κₑ)

to

return (; νₑ)

@tomchor
Copy link
Collaborator Author

tomchor commented Feb 6, 2023

On that branch you are still forming the BinaryOperation which is the problem. You also need to remove the BinaryOperation. If you open a PR we can work on it?

Sure, I'll open a new PR soon then!

But just to be clear, I totally get that the problem is that I'm still passing the binary operations. I kept them there because (if I understan correctly) if don't pass $\kappa_e$ in DiffusivityFields() then model.diffusivity_fields won't get populated, right? Or is your point that we don't have to populate model.diffusivity_fields for Smagorinsky?

@glwagner
Copy link
Member

glwagner commented Feb 6, 2023

The change I suggested will still put the eddy viscosity into diffusivity_fields, and removes the eddy diffusivities (calculating them on the fly from Pr and eddy viscosity)

@glwagner
Copy link
Member

glwagner commented Feb 6, 2023

You can also redefine the user-facing function diffusivity to return the original BinaryOperation (which hopefully will not enter into kernels because of the extension of those three “getter” methods)

@tomchor
Copy link
Collaborator Author

tomchor commented Feb 6, 2023

The change I suggested will still put the eddy viscosity into diffusivity_fields, and removes the eddy diffusivities (calculating them on the fly from Pr and eddy viscosity)

Yes! Sorry I wasn't clear enough. I got that. I meant populate it with the diffuvisities specifically, so we're on the same page.

@tomchor
Copy link
Collaborator Author

tomchor commented Feb 6, 2023

Sorry for going slow on this, but if I understand correctly what you're proposing is:

  • Remove the calculation of diffusivities from DiffusivityFields() (and thus remove tracer diffusivities from model.diffusivity_fields)
  • Specialize κᶠᶜᶜ() functions for SmagoriknskyLilly so that diffusivities are calculated on the fly

As opposed to what I'm doing here which is just to change the calculation of diffusivities in DiffusivityFields() from using a BinaryOperation to a kernel (keeping them in model.diffusivity_fields).

If I understand correctly both methods do the same number of operations (one calculation of $\nu_e$, a division by Pr and one interpolation for each face of each grid cell), no?

So I guess the advantage of what you're proposing is that it saves memory (since diffusivities are calculated on the fly), at the cost of a bit more code complexity (i.e., one more specialization).

Conversely, the direction this PR is going atm uses more memory (for the diffusivities) but in my opinion the code is a bit clearer, since there's one fewer specialization (i.e. κᶠᶜᶜ() remains the same) and the code in smagorinsky_lilly.jl looks more like the code in anisotropic_minimum_dissipation.jl, which makes things more standardized.

I'll defer to you either way, but I vote that we take the approach that this PR is currently doing since, as we discussed before in a few PRs, the code in TurbulenceClosures is already a bit on the complex side and not super easy to understand. So I think the standardization of having smagorinsky_lilly.jl have the same structure as anisotropic_minimum_dissipation.jl I'd argue is a benefit.

If we follow with this PR, the increase in memory should be around 15% for one tracer and less for more tracers, so relatively small, plus the memory limitation is about to become less severe since hopefully #2795 will be merged soon?

@glwagner like I said I'll defer to you either way. So if you prefer your original suggestion I'll close this PR and open another with your suggested modifications :)

@glwagner
Copy link
Member

glwagner commented Feb 6, 2023

I think extending κᶠᶜᶜ is simpler --- less code and less memory usage.

@glwagner
Copy link
Member

glwagner commented Feb 6, 2023

I also don't like calc_nonlinear_κᶜᶜᶜ. I hope we can come up with a better interface for defining LES closures in the future.

@glwagner
Copy link
Member

glwagner commented Feb 6, 2023

The memory savings is a major advantage of this closure over AnisotropicMinimumDissipation for problems with very large numbers of tracers (eg biogeochemistry problems with 10+ tracers).

Note also that there is overhead to launching a kernel which cannot be ignored --- we can't estimate computational cost just by adding the number of operations. Typically (though not always) our goal is to reduce the number of kernel launches as much as possible.

@tomchor
Copy link
Collaborator Author

tomchor commented Feb 6, 2023

The memory savings is a major advantage of this closure over AnisotropicMinimumDissipation for problems with very large numbers of tracers (eg biogeochemistry problems with 10+ tracers).

Note also that there is overhead to launching a kernel which cannot be ignored --- we can't estimate computational cost just by adding the number of operations. Typically (though not always) our goal is to reduce the number of kernel launches as much as possible.

Good points. I also wasn't aware that the kernel launch time was an important issue. I'll close this PR and open another one extending κᶠᶜᶜ().

@tomchor tomchor closed this Feb 6, 2023
@tomchor tomchor deleted the tc/smag-binary-op branch February 6, 2023 22:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

device kernel image is invalid error when running relatively complex simulation on GPUs
2 participants