Try inlining matrix field `multiply_matrix_at_index` #2311

charleskawczynski · 2025-04-24T20:44:08Z

It occurred to me that if CUDA cannot "always_inline" through our matrix field operations, any duplicate memory reads in `multiply_matrix_at_index` may cost us 2x what they should, because the compiler may be unable to hoist those reads when they are hidden behind function calls.
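
To make the concern concrete, here is a minimal sketch in plain Julia on the CPU, not ClimaCore or CUDA code. The names `read_inlined`, `read_opaque`, `sum_inlined` and `sum_opaque` are hypothetical. When the accessor is inlined, the optimizer sees two identical loads and is free to merge them; when the accessor stays an opaque call, each read generally remains a separate call.

```julia
# Hypothetical stand-ins for a matrix-field entry read; not ClimaCore API.
@inline read_inlined(data, i) = @inbounds data[i]
@noinline read_opaque(data, i) = @inbounds data[i]

# The same entry is read twice. With the inlined accessor both loads are
# visible to the optimizer, which may merge them into a single load.
sum_inlined(data, i) = read_inlined(data, i) + read_inlined(data, i)

# With the opaque accessor each read stays behind a function call.
sum_opaque(data, i) = read_opaque(data, i) + read_opaque(data, i)

data = [1.0, 2.0, 3.0]
sum_inlined(data, 2), sum_opaque(data, 2)

# To compare the generated code (output depends on the Julia version):
# using InteractiveUtils
# @code_llvm sum_inlined(data, 2)
# @code_llvm sum_opaque(data, 2)
```

Both functions return the same value; the difference, if any, only shows up in the generated code, which is why inspecting the LLVM or PTX output is the real test.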

Ultimately, we should look at the PTX / LLVM to see what's going on. At the very least, if this method is not inlined, its bounds checks will remain.
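
For context on the bounds-check point: in Julia, a caller's `@inbounds` only removes a callee's `@boundscheck` block when the callee is inlined into it, and `Base.@propagate_inbounds` both requests inlining and forwards the caller's inbounds context. The sketch below uses a hypothetical `Column` type and `entry` accessor, not the ClimaCore types.

```julia
# Hypothetical wrapper type; stands in for a field-like container.
struct Column{T}
    data::Vector{T}
end

# @propagate_inbounds asks for inlining and lets a caller's @inbounds
# elide the @boundscheck block below.
Base.@propagate_inbounds function entry(col::Column, i::Int)
    @boundscheck checkbounds(col.data, i)
    return @inbounds col.data[i]
end

function sum_entries(col::Column)
    total = zero(eltype(col.data))
    for i in eachindex(col.data)
        # Indices come from eachindex, so skipping the check is safe here.
        total += @inbounds entry(col, i)
    end
    return total
end

col = Column([1.0, 2.0, 3.0])
sum_entries(col)

# Without @inbounds at the call site, the check still runs and throws.
threw = try
    entry(col, 4)
    false
catch err
    err isa BoundsError
end
```

Note that running with `--check-bounds=yes`, as in the script further down, makes Julia ignore `@inbounds` entirely, so the checks are always present in that mode.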

Also, I've moved all of the dont_limit pieces to the end of ClimaCore.jl. IIRC, the dont_limit business must come after the last method definition for it to work properly.
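
One plausible reason for the ordering requirement, assuming `dont_limit` is applied by looping over `methods(f)` and setting each method's internal `recursion_relation` field, is that such a loop only visits methods that exist when it runs. The sketch below illustrates this with a hypothetical function `apply`; it is not the ClimaCore code.

```julia
# Assumed shape of the pattern: a callback that never limits recursion.
dont_limit = (args...) -> true

# Hypothetical function with one method defined before the loop.
apply(x::Int) = x + 1

touched = Method[]
# `recursion_relation` is an internal field of `Method`, so guard on it.
if hasfield(Method, :recursion_relation)
    for m in methods(apply)
        m.recursion_relation = dont_limit
        push!(touched, m)
    end
end

# A method defined after the loop was never visited by it.
apply(x::Float64) = x + 1.0

late = [m for m in methods(apply) if !(m in touched)]
```

Here `late` contains the `Float64` method, which was never given the `dont_limit` relation. Placing the loop after the last method definition avoids that gap.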

I've also annotated a few getidx calls with the return type from getidx_return_type, to see if I can help out the compiler.
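
As a rough illustration of that kind of annotation, the sketch below uses hypothetical `getval` and `getval_return_type` functions in place of `getidx` and `getidx_return_type`, whose actual signatures are not shown here. The `::T` type assertion gives inference a known result type even when the call itself is hard to infer, and throws a `TypeError` if the value does not match.

```julia
# Hypothetical stand-ins for getidx / getidx_return_type.
getval(data::Vector{T}, i) where {T} = data[i]
getval_return_type(::Vector{T}) where {T} = T

function scaled(data, i)
    # Assert the result type so downstream code is inferred concretely.
    x = getval(data, i)::getval_return_type(data)
    return 2 * x
end

scaled([1.5, 2.5], 2)
```

The assertion costs a runtime check when inference cannot prove it, so it helps most where the call would otherwise infer as `Any` or a wide union.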

Try using prop inbounds in mat field

charleskawczynski · 2025-04-28T17:47:39Z

Ouuuuch

#=
git checkout ck/inline2
julia --check-bounds=yes --project
=#

import ClimaCore
# Load the test utilities (expected to define the matrix fields and macros used below).
include(
    joinpath(
        pkgdir(ClimaCore),
        "test/MatrixFields/matrix_fields_broadcasting/test_scalar_utils.jl",
    ),
)

# Record type-inference timings while building and materializing the broadcast.
using SnoopCompileCore
tinf = @snoop_inference begin
    bc = @lazy @. (2 * ᶠᶜmat * ᶜᶜmat * ᶜᶠmat + ᶠᶠmat * ᶠᶠmat / 3 - (4I,)) *
             (ᶠᶜmat * ᶜᶜmat * ᶜᶠmat * 2 - (ᶠᶠmat / 3) * ᶠᶠmat + (4I,))
    result = materialize(bc)
end;

# Visualize the inference profile as a flame graph.
using SnoopCompile
fg = flamegraph(tinf)
using ProfileView
ProfileView.view(fg)

This took a while (~1 hour) and resulted in:

[flame graph image]

I suppose the issue here is code gen? If so, I'm not sure the same tricks from #2284 will help us here. Need to think about this.

charleskawczynski added the performance label Apr 24, 2025

charleskawczynski force-pushed the ck/inline2 branch 2 times, most recently from 8a08332 to ebc079c Compare April 24, 2025 20:55

Move all dont-inline to end of ClimaCore

273fab6

Try using prop inbounds in mat field

charleskawczynski force-pushed the ck/inline2 branch from ebc079c to 273fab6 Compare April 24, 2025 21:00

Increase mem requests

65a1d93
