Skip to content

Try inlining matrix field multiply_matrix_at_index #2311

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

charleskawczynski
Copy link
Member

@charleskawczynski charleskawczynski commented Apr 24, 2025

It occurred to me that if CUDA is not able to "always_inline" through our matrix field operations, any duplicate memory reads in multiply_matrix_at_index may actually cost us 2x what it should, as the compiler might not be able to hoist these reads if they are just function calls.

Ultimately, we should take a look at the ptx / llvm to see what's going on. At the very least, without inlining this method means that there will still be bounds checks.

Also, I've moved all of the dont_limit pieces to the end of ClimaCore.jl. IIRC, the dont_limit business must come after the last method definition for it to work properly.

I've also annotated a few getidx calls with the return type from getidx_return_type, to see if I can help out the compiler.

Try using prop inbounds in mat field
@charleskawczynski
Copy link
Member Author

charleskawczynski commented Apr 28, 2025

Ouuuuch

#=
git checkout ck/inline2
julia --check-bounds=yes --project
=#

import ClimaCore
include(
	joinpath(
		pkgdir(ClimaCore),
		"test/MatrixFields/matrix_fields_broadcasting/test_scalar_utils.jl"
	)
)

using SnoopCompileCore
tinf = @snoop_inference begin
	bc = @lazy @. (2 * ᶠᶜmat * ᶜᶜmat * ᶜᶠmat + ᶠᶠmat * ᶠᶠmat / 3 - (4I,)) *
	         (ᶠᶜmat * ᶜᶜmat * ᶜᶠmat * 2 - (ᶠᶠmat / 3) * ᶠᶠmat + (4I,))
	result = materialize(bc)
end;

using SnoopCompile
fg = flamegraph(tinf)
using ProfileView
ProfileView.view(fg)

took a while (~1 hour) and resulted in:

image

I suppose the issue here is code gen? So, I'm not sure if the same tricks in #2284 will help us here. Need to think about this..

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant