-
Notifications
You must be signed in to change notification settings - Fork 169
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add resample distributions function for probability matching #390
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #390 +/- ##
==========================================
+ Coverage 83.67% 83.81% +0.13%
==========================================
Files 159 160 +1
Lines 12649 12783 +134
==========================================
+ Hits 10584 10714 +130
- Misses 2065 2069 +4
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looking good! do we have any example result? is this working?
Apologies for the ugly plots, but this was the quick and dirty way to test three versions: This is a blended forecast up to 12 hours ahead (1 member) for 16 July 2024 in the Netherlands. The left column shows a forecast without the resampling and also no smoothing as introduced by @sidekock yesterday. The middle column does both the smoothing and resampling prior to the probability matching and the right column is a check with only smoothing and no resampling. Overall, I would say that the smoothing already makes a differences on the edges of the radar domain (@sidekock gave a better example in his PR), but the resampling clearly results in less suppression of the extremes in the NWP (you kind of see a dissapation for the longer lead times in the left and right columns, but not in the middle one). I think this is what we are looking for, although one example is of course a bit limited. |
I think the PR is ready to go, but definitely feel free to give it a last thorough review. :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added some changes directly to the code and left few comments that I think it'd be better to clarify before we merge
Looks good like this to me, @dnerini, thanks for the help! |
@ladc @RubenImhoff @sidekock
|
Isn't that what we were looking for? I think the problem in left example (60, 90 minutes) is that the mismatching precipitation bring lower precipitation values than the actual distributions either from NWC or NWP. And these resampling guarantees a similar distribution. In the right example, the values look similar in all the lead-times whereas in the left it is clear that the smoothing (lost of high values) is happening in the middle lead-times where I am assuming the weight it is closer to 0,5. Just let me know what you think 😄 |
totally agree with you, just wanted to make it clear that this has a significant impact on the output. |
Thanks for sharing, @dnerini! Based on the input radar data (see below), I'd agree that the changes seem to improve the distribution. Nevertheless, would be nice to see some more examples at some point (I'm sure some will come up from both KNMI and RMI soon). |
Yes, I agree that this is actually better since you don't lose the extremes as much. Thanks for the examples. It might also fix a problem I had with the control member but I still need to run some tests, sorry for the delay. |
Hi @ladc, Something else that could play a role is the following in the
We fill up the 'outside radar domain' extrapolation values with the NWP values to have no nans (which I initially thought was a good idea). But maybe that skews the distribution in the wrong direction? So maybe we should change that back to actually allowing for nans in the extrapolation component? |
@ladc, would the above be worth the test for your case? |
Thanks @RubenImhoff. I didn't turn on the smoothing, this was on purpose. |
I think just the resampling for now. This would at least keep the radar distribution lower in a case like this (because you push the distribution up by filling the nans with the values of the NWP forecast outside the radar domain). |
Hi! Sorry to jump late for this discussion. I have a small question; how is it looking before doing the resampling/cdf? Because I am not sure if the problem is coming from the resample or from the merging of the cascades with different weights in different areas? @ladc , I did run your example data; maybe you could update or send me the snippet for recreating the figure so I can have a more detail look. :) |
Hi @aitaten you can find the plot_nwc.py and plot_radar.py script here - apologies for the messy code: |
I tried not filling the nan values (or rather not including them in the pmatching) - but I was a bit rushed so it's possible that the code contains errors: Warning: I will be fully offline in the coming week and still on vacation until 14 August. The code is in my branch mentioned in #370 |
For me it is not working either ... I had to change a couple of things in the merged code (#370 ) because |
@ladc, your second attempt looks already better to me! Should I change the code (back) to accepting nans outside the radar domain? Then we're good to go, I hope. |
And seeing the vacation of @ladc, what do the others think? |
one remaining issue might be that using the weights from level 2 gives too much credit to NWP already from the first lead time, while you would expect a more progressive transition from the radar distribution... |
@dnerini, any idea on how to nicely fix this? Right now the nans outside the radar domain are filled with the NWP values, which changes the distribution of the extrapolation cascade data. We could go back to our earlier version where nans were allowed, or maybe we can think of a more elegant solution to handle the nans outside the radar domain without changing the distribution? |
I wonder if we could simply "blend in" the NWP values outside the radar domain, also improving the seamlessness at t0. In practice we could try replacing the radar mask with zeros and see if that is enough. Alternatively, one could treat the part outside the radar domain separately and use a simple blending function to introduce the NWP values. Both approaches should avoid the jump at t0 while the resampling approach can stay the same. What do you think? |
That would be worth the try. Would we introduce any statistical problems in the resampling if we introduce zeros (or |
First attempt to add a resample distributions function for probability matching, see also issue #353.
postprocessing/probmatching.py
.R_pm_blended
variable inblending/steps.py
.probmatching_method
iscdf
ormean
inblending/steps.py
, add resampling prior to using the probability matching as an optional functionality.