Skip to content

Zonal improvements without geometry lookup #956

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

asinghvi17
Copy link
Collaborator

@asinghvi17 asinghvi17 commented Apr 27, 2025

  • spatialslices = false: if true, run f on the slices of the datacube in X, Y. You can also pass a tuple of dimensions to specify which dims to slice over. If false, run zonal on the whole cube cropped to each geom. If spatialslices is not false, then a data cube will be returned, with a Geometry(1:ngeom) axis.
  • bylayer = true: if true, run f on each layer of the stack individually, then recombine results into a NamedTuple. If false, then run f on the whole stack.
  • missingval = missingval(x): specify the missing value that is to be pushed into the resulting raster.

Explanatory script

using Rasters, RasterDataSources, ArchGDAL
using NaturalEarth
import Proj

precip = cat((Raster(WorldClim{Climate}, :prec; month = i) for i in 1:12)...; dims = Ti)
all_countries = naturalearth("admin_0_countries", 10)

precip_cellarea_stack = RasterStack((; precip = precip, cellarea = cellarea(precip)))

# First, we have standard zonal.  This returns what it always used to return.
function f(ras)
    sum(ras)
end

@time zonal(f, precip_cellarea_stack[Ti(1)]; of = all_countries, skipmissing = true, threaded = false)

# Then, we have zonal with `bylayer = false`.  This gives the zonal function the entire stack,
# which means that it can operate on all layers of the stack at once.  This allows you to e.g.
# compute statistics which require the values of multiple layers, simultaneously.
# TODO: for best use, this still needs a good implementation of skipmissing on stacks.
function f(stack)
    isempty(stack) || all(ismissing, stack.precip) && return missing
    prec_net = sum(skipmissing(stack.precip .* stack.cellarea))
    avg_precip = prec_net ./ sum(skipmissing(stack.cellarea))
    return avg_precip
end

@time zonal(f, precip_cellarea_stack[Ti(1)]; of = all_countries, bylayer = false, skipmissing = false, threaded = false)

# Next, we have `spatialslices = true`.  This allows you to pass a data cube (not just a 2d raster),
# and have zonal automatically slice the cube into 2D (X, Y) slices, such that your function only needs 
# to know how to operate on 2D slices.
zonal(f, precip_cellarea_stack; of = all_countries, bylayer = false, spatialslices = true, skipmissing = false, threaded = false)

# Finally, if you return a named tuple from your zonal function, zonal will automatically
# construct a RasterStack for you.
@time zonal(precip_cellarea_stack; of = all_countries, bylayer = false, spatialslices = true, skipmissing = false, threaded = false) do stack
    return (; precip = sum(skipmissing(stack.precip)), cellarea = sum(skipmissing(stack.cellarea)))
end

f1(stack) = (; precip = sum(skipmissing(stack.precip)), cellarea = sum(skipmissing(stack.cellarea)))
s1 = @time zonal(f1, precip_cellarea_stack; of = all_countries, bylayer = false, spatialslices = true, skipmissing = false, threaded = true)
s2 = @time zonal(sum  skipmissing, precip_cellarea_stack; of = all_countries, bylayer = true, spatialslices = true, skipmissing = false, threaded = true)

all(map(==, s1, s2)) # true - => bylayer is not changing the result

@rafaqz
Copy link
Owner

rafaqz commented Apr 27, 2025

If we use spatial slices and bylayer together does it return a RasterStack?

@asinghvi17
Copy link
Collaborator Author

That's the plan! It's almost there but there are some indexing order issues.

@asinghvi17
Copy link
Collaborator Author

ah fixed it, was just overcomplicating it in my head!

@time zonal(precip_cellarea_stack; of = all_countries, bylayer = true, spatialslices = true, skipmissing = false, threaded = false) do stack
           return sum(skipmissing(stack))
       end
Applying #107 to each geometry... 100%|██████████████████████████████████████████████████| Time: 0:00:00
  0.755803 seconds (2.27 M allocations: 336.739 MiB, 5.28% gc time, 57.64% compilation time: 6% of which was recompilation)
┌ 12×258 RasterStack ┐
├────────────────────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────── dims ┐
   Ti      ,
   Geometry Sampled{Int64} Base.OneTo(258) ForwardOrdered Regular Points
├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── layers ┤
  :precip   eltype: Int64 dims: Ti, Geometry size: 12×258
  :cellarea eltype: Float64 dims: Geometry size: 258
├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── raster ┤
  missingval: (precip = missing, cellarea = nothing)
  extent: Extent(Ti = (1, 12), Geometry = (1, 258))
  filename: (precip = nothing, cellarea = nothing)
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

@lazarusA
Copy link
Collaborator

Btw, make sure 'show progress ' false works. It is a pain in CI to deal with those prints.

@asinghvi17
Copy link
Collaborator Author

Yeah, I'm actually thinking maybe we should have that be a preferences.jl thing you can configure, but that's a later thing

@asinghvi17
Copy link
Collaborator Author

But there's a lot of recompilation time here (50%) and it's probably pretty slow in general, this could use a pass by you @rafaqz to clean up the code and get this fast.

@asinghvi17
Copy link
Collaborator Author

asinghvi17 commented Apr 27, 2025

here's a test script I wrote btw, some of the things still don't work but this is the idea

using Rasters, RasterDataSources, ArchGDAL
using NaturalEarth
import Proj

precip = cat((Raster(WorldClim{Climate}, :prec; month = i) for i in 1:12)...; dims = Ti)
all_countries = naturalearth("admin_0_countries", 10)

precip_cellarea_stack = RasterStack((; precip = precip, cellarea = cellarea(precip)))

zonal(precip_cellarea_stack[Ti(1)]; of = all_countries, bylayer = false, skipmissing = false, threaded = false) do stack
    isempty(stack) || all(ismissing, stack.precip) && return missing
    prec_net = sum(skipmissing(stack.precip .* stack.cellarea))
    avg_precip = prec_net ./ sum(skipmissing(stack.cellarea))
    return avg_precip
end

zonal(precip_cellarea_stack; of = all_countries, bylayer = false, spatialslices = true, skipmissing = false, threaded = false) do stack
    isempty(stack) || all(ismissing, stack.precip) && return missing
    prec_net = sum(skipmissing(stack.precip .* stack.cellarea))
    avg_precip = prec_net ./ sum(skipmissing(stack.cellarea))
    return avg_precip
end


@time zonal(precip_cellarea_stack; of = all_countries, bylayer = false, spatialslices = true, skipmissing = false, threaded = false) do stack
    return (; precip = sum(skipmissing(stack.precip)), cellarea = sum(skipmissing(stack.cellarea)))
end


s1 = @time zonal(precip_cellarea_stack; of = all_countries, bylayer = false, spatialslices = true, skipmissing = false, threaded = true) do stack
    return (; precip = sum(skipmissing(stack.precip)), cellarea = sum(skipmissing(stack.cellarea)))
end

s2 = @time zonal(precip_cellarea_stack; of = all_countries, bylayer = true, spatialslices = true, skipmissing = false, threaded = true) do stack
    return sum(skipmissing(stack))
end

@asinghvi17
Copy link
Collaborator Author

asinghvi17 commented Apr 27, 2025

Added a new test / example script to the top comment. But I'm worried runtimes will have dropped significantly with all this complexity.

@asinghvi17
Copy link
Collaborator Author

asinghvi17 commented Apr 28, 2025

Some interesting benchmarks and timer runs:

@be zonal($sum, $(precip); of = $all_countries, spatialslices = $false, progress = $false, threaded = $false) seconds=5
──────────────────────────────────────────────────────────────────────────────────────────────
                                                     Time                    Allocations      
                                            ───────────────────────   ────────────────────────
             Tot / % measured:                   7.95s /  60.1%           4.52GiB /  97.4%    

Section                             ncalls     time    %tot     avg     alloc    %tot      avg
──────────────────────────────────────────────────────────────────────────────────────────────
crop + mask                          6.19k    4.17s   87.2%   673μs   4.40GiB  100.0%   744KiB
skipmissing + spatialsliceify + f    6.19k    613ms   12.8%  99.0μs   1.22MiB    0.0%     207B
build result                            24    351μs    0.0%  14.6μs   9.75KiB    0.0%     416B
──────────────────────────────────────────────────────────────────────────────────────────────
@be zonal($sum, $(precip); of = $all_countries, spatialslices = $true, progress = $false, threaded = $false) seconds=5
──────────────────────────────────────────────────────────────────────────────────────────────
                                                     Time                    Allocations      
                                            ───────────────────────   ────────────────────────
             Tot / % measured:                   12.3s /  39.2%           4.38GiB /  97.4%    

Section                             ncalls     time    %tot     avg     alloc    %tot      avg
──────────────────────────────────────────────────────────────────────────────────────────────
crop + mask                          5.93k    4.05s   84.1%   682μs   4.21GiB   98.8%   744KiB
skipmissing + spatialsliceify + f    5.93k    760ms   15.8%   128μs   47.7MiB    1.1%  8.24KiB
build result                            23   5.03ms    0.1%   219μs   4.19MiB    0.1%   186KiB
──────────────────────────────────────────────────────────────────────────────────────────────

so it seems that spatialslices does add some (but really quite negligible) amount of time to the final answer. I'm not surprised that it allocates more during spatialsliceify though, since it is constructing an array (there is "more" output).

So I'm satisfied enough with spatialslices.

@rafaqz
Copy link
Owner

rafaqz commented Apr 28, 2025

Yeah, larger allocations will add time. How does the base case compare to before this PR?

@asinghvi17
Copy link
Collaborator Author

Same timing, I checked


function _determine_missingval(x, bylayer)
raw_missingval = isnothing(missingval(x)) ? missing : missingval(x)
if isfalse(bylayer) && x isa AbstractRasterStack
Copy link
Owner

@rafaqz rafaqz Apr 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is all of this for? It seems type unstable

Is raw_missingval a NamedTuple?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, that's why I am trying to extract it.

The missingval on stacks can be like (; layer1=missing, layer2=nothing, layer3=NaN)

which I can't necessarily use as the missing value - I need a single thing, whether that's missing or nan.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thats not really possible... Each layer can have its own missing value and it will depend on the type. It could be 0xFF for UInt8.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue at hand is that if bylayer = true, ok fine, I know that I can use each. But what do I use if bylayer = false? Missing is the only option barring user input.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, is this the correct way to do it? If I do

julia> r1 = Raster(1:2, (X(1:2),),; missingval = 1)
┌ 2-element Raster{Int64, 1} ┐
├────────────────────────────┴─────────────────────────────────────────────────────────────────────────────────────────────────────── dims ┐
   X Sampled{Int64} 1:2 ForwardOrdered Regular Points
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── raster ┤
  missingval: 1
  extent: Extent(X = (1, 2),)
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
 1  1
 2  2

then none of them are actually missing...

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need to map over all values and missingvals and check if any match. There should be a helper function somewhere that does this.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But yeah, I guess missing is what to use?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants