-
Notifications
You must be signed in to change notification settings - Fork 38
Zonal improvements without geometry lookup #956
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
this will be quite useful later.
If we use spatial slices and bylayer together does it return a RasterStack? |
That's the plan! It's almost there but there are some indexing order issues. |
ah fixed it, was just overcomplicating it in my head! @time zonal(precip_cellarea_stack; of = all_countries, bylayer = true, spatialslices = true, skipmissing = false, threaded = false) do stack
return sum(skipmissing(stack))
end
Applying #107 to each geometry... 100%|██████████████████████████████████████████████████| Time: 0:00:00
0.755803 seconds (2.27 M allocations: 336.739 MiB, 5.28% gc time, 57.64% compilation time: 6% of which was recompilation)
┌ 12×258 RasterStack ┐
├────────────────────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────── dims ┐
↓ Ti ,
→ Geometry Sampled{Int64} Base.OneTo(258) ForwardOrdered Regular Points
├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── layers ┤
:precip eltype: Int64 dims: Ti, Geometry size: 12×258
:cellarea eltype: Float64 dims: Geometry size: 258
├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── raster ┤
missingval: (precip = missing, cellarea = nothing)
extent: Extent(Ti = (1, 12), Geometry = (1, 258))
filename: (precip = nothing, cellarea = nothing)
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
|
Btw, make sure 'show progress ' false works. It is a pain in CI to deal with those prints. |
Yeah, I'm actually thinking maybe we should have that be a preferences.jl thing you can configure, but that's a later thing |
But there's a lot of recompilation time here (50%) and it's probably pretty slow in general, this could use a pass by you @rafaqz to clean up the code and get this fast. |
here's a test script I wrote btw, some of the things still don't work but this is the idea using Rasters, RasterDataSources, ArchGDAL
using NaturalEarth
import Proj
precip = cat((Raster(WorldClim{Climate}, :prec; month = i) for i in 1:12)...; dims = Ti)
all_countries = naturalearth("admin_0_countries", 10)
precip_cellarea_stack = RasterStack((; precip = precip, cellarea = cellarea(precip)))
zonal(precip_cellarea_stack[Ti(1)]; of = all_countries, bylayer = false, skipmissing = false, threaded = false) do stack
isempty(stack) || all(ismissing, stack.precip) && return missing
prec_net = sum(skipmissing(stack.precip .* stack.cellarea))
avg_precip = prec_net ./ sum(skipmissing(stack.cellarea))
return avg_precip
end
zonal(precip_cellarea_stack; of = all_countries, bylayer = false, spatialslices = true, skipmissing = false, threaded = false) do stack
isempty(stack) || all(ismissing, stack.precip) && return missing
prec_net = sum(skipmissing(stack.precip .* stack.cellarea))
avg_precip = prec_net ./ sum(skipmissing(stack.cellarea))
return avg_precip
end
@time zonal(precip_cellarea_stack; of = all_countries, bylayer = false, spatialslices = true, skipmissing = false, threaded = false) do stack
return (; precip = sum(skipmissing(stack.precip)), cellarea = sum(skipmissing(stack.cellarea)))
end
s1 = @time zonal(precip_cellarea_stack; of = all_countries, bylayer = false, spatialslices = true, skipmissing = false, threaded = true) do stack
return (; precip = sum(skipmissing(stack.precip)), cellarea = sum(skipmissing(stack.cellarea)))
end
s2 = @time zonal(precip_cellarea_stack; of = all_countries, bylayer = true, spatialslices = true, skipmissing = false, threaded = true) do stack
return sum(skipmissing(stack))
end |
Added a new test / example script to the top comment. But I'm worried runtimes will have dropped significantly with all this complexity. |
Some interesting benchmarks and timer runs: @be zonal($sum, $(precip); of = $all_countries, spatialslices = $false, progress = $false, threaded = $false) seconds=5
──────────────────────────────────────────────────────────────────────────────────────────────
Time Allocations
─────────────────────── ────────────────────────
Tot / % measured: 7.95s / 60.1% 4.52GiB / 97.4%
Section ncalls time %tot avg alloc %tot avg
──────────────────────────────────────────────────────────────────────────────────────────────
crop + mask 6.19k 4.17s 87.2% 673μs 4.40GiB 100.0% 744KiB
skipmissing + spatialsliceify + f 6.19k 613ms 12.8% 99.0μs 1.22MiB 0.0% 207B
build result 24 351μs 0.0% 14.6μs 9.75KiB 0.0% 416B
────────────────────────────────────────────────────────────────────────────────────────────── @be zonal($sum, $(precip); of = $all_countries, spatialslices = $true, progress = $false, threaded = $false) seconds=5
──────────────────────────────────────────────────────────────────────────────────────────────
Time Allocations
─────────────────────── ────────────────────────
Tot / % measured: 12.3s / 39.2% 4.38GiB / 97.4%
Section ncalls time %tot avg alloc %tot avg
──────────────────────────────────────────────────────────────────────────────────────────────
crop + mask 5.93k 4.05s 84.1% 682μs 4.21GiB 98.8% 744KiB
skipmissing + spatialsliceify + f 5.93k 760ms 15.8% 128μs 47.7MiB 1.1% 8.24KiB
build result 23 5.03ms 0.1% 219μs 4.19MiB 0.1% 186KiB
────────────────────────────────────────────────────────────────────────────────────────────── so it seems that spatialslices does add some (but really quite negligible) amount of time to the final answer. I'm not surprised that it allocates more during spatialsliceify though, since it is constructing an array (there is "more" output). So I'm satisfied enough with spatialslices. |
Yeah, larger allocations will add time. How does the base case compare to before this PR? |
Same timing, I checked |
|
||
function _determine_missingval(x, bylayer) | ||
raw_missingval = isnothing(missingval(x)) ? missing : missingval(x) | ||
if isfalse(bylayer) && x isa AbstractRasterStack |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is all of this for? It seems type unstable
Is raw_missingval
a NamedTuple?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, that's why I am trying to extract it.
The missingval on stacks can be like (; layer1=missing, layer2=nothing, layer3=NaN)
which I can't necessarily use as the missing value - I need a single thing, whether that's missing or nan.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thats not really possible... Each layer can have its own missing value and it will depend on the type. It could be 0xFF
for UInt8.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue at hand is that if bylayer = true, ok fine, I know that I can use each. But what do I use if bylayer = false? Missing is the only option barring user input.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, is this the correct way to do it? If I do
julia> r1 = Raster(1:2, (X(1:2),),; missingval = 1)
┌ 2-element Raster{Int64, 1} ┐
├────────────────────────────┴─────────────────────────────────────────────────────────────────────────────────────────────────────── dims ┐
↓ X Sampled{Int64} 1:2 ForwardOrdered Regular Points
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── raster ┤
missingval: 1
extent: Extent(X = (1, 2),)
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
1 1
2 2
then none of them are actually missing...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you need to map over all values and missingvals and check if any match. There should be a helper function somewhere that does this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But yeah, I guess missing
is what to use?
spatialslices = false
: if true, runf
on the slices of the datacube inX, Y
. You can also pass a tuple of dimensions to specify which dims to slice over. If false, run zonal on the whole cube cropped to each geom. If spatialslices is not false, then a data cube will be returned, with aGeometry(1:ngeom)
axis.bylayer = true
: if true, runf
on each layer of the stack individually, then recombine results into aNamedTuple
. If false, then runf
on the whole stack.missingval = missingval(x)
: specify the missing value that is to be pushed into the resulting raster.Explanatory script