add support for monthly-resolution predictions #172
The key here is that So, with datetime objects, this line: Will try to subtract 1 month but instead subtracts 1 nanosecond. So by forcing the following: We actually subtract off the correct amount of time. [M] for month, [Y] for year. This line then has to be changed to accommodate adding the appropriate time as well: The "real time" approach is great for this. However, I suggest then that we require all objects to have We could add a |
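The nanosecond-versus-month behaviour described above can be reproduced with plain NumPy. This is a minimal sketch, not the code from the PR: the date and the variable names are illustrative assumptions, and only the `[M]` unit comes from the comment.

```python
import numpy as np

# A timestamp at nanosecond resolution (the unit pandas/xarray use by default).
t_ns = np.datetime64("2000-03-01T00:00:00", "ns")

# An offset of "1" in the array's own unit is one nanosecond, not one month.
one_ns_back = t_ns - np.timedelta64(1, "ns")

# Casting to month resolution first makes the offset a calendar month.
t_month = t_ns.astype("datetime64[M]")
one_month_back = t_month - np.timedelta64(1, "M")

print(one_ns_back)     # 2000-02-29T23:59:59.999999999
print(one_month_back)  # 2000-02
```

The same pattern with `[Y]` would step by calendar years.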
Note -- I plan to just implement monthly resolution with this PR. I think daily would actually be easier than seasonal here. But I want to get it working for monthly, then we can adapt for daily/seasonal? |
@aaronspring, can you test |
Another issue is that DPLE monthly has a first lead time of zero, truly. Since initialization occurs November 1st, the first lead is the month of November. I.e., a lead of zero in the eyes of this package. This works fine for
|
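To make the lead-zero convention above concrete, here is a small standard-library sketch that maps an initialization date and a monthly lead to the month being forecast. The helper name `valid_month` and the example year are assumptions for illustration only; they are not part of the package.

```python
from datetime import date


def valid_month(init: date, lead: int) -> date:
    """Return the first day of the month that lies `lead` months after `init`."""
    # Count months since year 0, add the lead, then split back into year/month.
    total = init.year * 12 + (init.month - 1) + lead
    year, month0 = divmod(total, 12)
    return date(year, month0 + 1, 1)


init = date(1954, 11, 1)     # hypothetical November 1st initialization
print(valid_month(init, 0))  # 1954-11-01: lead 0 is November itself
print(valid_month(init, 2))  # 1955-01-01
```

With this convention, a dataset whose first lead is 1 would describe December, not November, which is the ambiguity discussed above.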
Regarding the initial comment: when computing monthly skill, what kind of inputs do we require? I naively thought of these: Hindcast: lead int, init int or datetime While it might be trickier to implement, I think ints are rather easy for the decadal prediction user to provide. When users have to provide datetime, they have to think about whether to put the November init date there and so on, even if it's only the decadal prediction thing. @bradyrx, do you think it's useful to allow both ints and datetime? Or should we rather require one in the calculation, with a conversion beforehand if ints are involved? |
Can you please show the annual skill in your example also? Thanks |
Regarding time format. My perfect model runs are in the year 3000. That is changeable but some control runs might be 1000 years. For that purpose I suggest we use cftime. It has all the groupby features as datetime. |
|
    Returns:
        skill (xarray object): Predictability with main dimension `lag`.
    """
    nlags = hind.lead.size
    if resolution not in ['Y', 'M']:
        raise ValueError(
Maybe NotImplementedError
|
    Returns:
        skill (xarray object): Predictability with main dimension `lag`.
    """
    nlags = hind.lead.size
    if resolution not in ['Y', 'M']:
Maybe when constants.py is added in my PR, add TEMPORAL_RES = ['Y', 'M']
Also add this if ...
to checks.py since I see it used in multiple places
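The two review suggestions above (a shared `TEMPORAL_RES` constant and a reusable check) could look roughly like this. It is only a sketch: the function name `check_resolution` and the error message are hypothetical, and `ValueError` is kept from the diff even though `NotImplementedError` was also proposed.

```python
# Would live in constants.py (name taken from the review comment).
TEMPORAL_RES = ["Y", "M"]


# Would live in checks.py; the function name is a hypothetical placeholder.
def check_resolution(resolution):
    """Raise if `resolution` is not one of the supported temporal resolutions."""
    if resolution not in TEMPORAL_RES:
        raise ValueError(
            f"resolution must be one of {TEMPORAL_RES}, got {resolution!r}"
        )
    return True


check_resolution("M")      # passes silently
try:
    check_resolution("D")  # daily is not supported yet
except ValueError as err:
    print(err)
```

Adding daily or seasonal support later would then only require extending the constant.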
monthly data added: https://github.com/bradyrx/climpred-data
control_mm
<xarray.Dataset>
Dimensions: (area: 2, time: 3612)
Coordinates:
* area (area) object 'global' 'North_Atlantic'
* time (time) object 3000-01-31 00:00:00 ... 3300-12-31 00:00:00
Data variables:
tos (area, time) float32 ...
sos (area, time) float32 ...
ds_mm
<xarray.Dataset>
Dimensions: (area: 2, init: 12, lead: 252, member: 10)
Coordinates:
* area (area) object 'global' 'North_Atlantic'
* lead (lead) int64 1 2 3 4 5 6 7 8 9 ... 245 246 247 248 249 250 251 252
* init (init) object 3014-01-31 00:00:00 ... 3257-01-31 00:00:00
* member (member) int64 0 1 2 3 4 5 6 7 8 9
Now I realize that init should maybe be one month earlier. Well, that's what the discussion is about, but this is at least data to test with. We will probably have to define more clearly how the data needs to be formatted. |
Will get back to the other comments, but on this:
I agree. I think we can just force all xarray objects to have cftime to only have to anticipate one type of datetime. I'll look into this module. |
It'd be good to test this with NMME as well once ready.
…On Sun, Jun 9, 2019, 6:52 PM Riley Brady ***@***.***> wrote:
Will get back to the other comments, but on this:
Regarding time format. My perfect model runs are in the year 3000. That is
changeable but some control runs might be 1000 years. For that purpose I
suggest we use cftime. It has all the groupby features as datetime.
I agree. I think we can just force all xarray objects to have cftime to
only have to anticipate one type of datetime. I'll look into this module.
|
I am glad to have If initialization took place at the turn of the year (Jan 1st, xxxx-01-01 00:00:00), then let's use this as init, with mid-January (xxxx-01-15 00:00:00) matching the first lead? Or rather the end of the month, because output is usually formatted that way? It would be better if we could somehow make this independent of timescales shorter than a month when calculating monthly skill, so we can worry less. It's difficult to find a setup that works best for us internally but is also consistent and useful for the user. |
using cftime in xarray
So I loaded in FOSI (spans monthly from 1948-2015) and can add a cftime object by the following: You could also imagine setting up initializations on every November 1st: cftime for
|
I agree with all of this. Inits should be exactly when initialization happened. if we put I don't care whether we choose the end of the month or the middle of the month as the reference timestep for the internal comparison. 2015-11-15 is probably easier because it's static (month end might be tricky for leap years). So in principle I am in favor of your suggestion:
Are there functions like |
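The point about month-end stamps being trickier than mid-month stamps can be illustrated with the standard library alone. This sketch assumes the proleptic Gregorian calendar of Python's `datetime`, not a cftime model calendar, and the helper names are hypothetical.

```python
import calendar
from datetime import date


def month_end(year: int, month: int) -> date:
    """Last calendar day of the month; varies with month length and leap years."""
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, last_day)


def month_mid(year: int, month: int) -> date:
    """Fixed mid-month stamp; the day is always the 15th."""
    return date(year, month, 15)


print(month_end(2015, 2), month_end(2016, 2))  # 2015-02-28 2016-02-29
print(month_mid(2015, 2), month_mid(2016, 2))  # 2015-02-15 2016-02-15
```

A no-leap model calendar would give a different February end again, which is one more argument for a static reference day.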
I have to focus on writing a manuscript over the next couple weeks. I've been putting it off. I won't have time to work on this til then. So if either of you have ideas, feel free to push to this branch and play around with |
Woops, missed your message @aaronspring, but pandas is amazing for timeseries (and already a requirement of xarray)
Unfortunately, it doesn't support cftime. |
)
# Some lags force this into a numpy array for some reason.
imax = xr.DataArray(imax).rename('time')
forecast = forecast.where(forecast.time <= imax, drop=True)
I wonder if you could do:
forecast.sel(time=slice(imin, imax))
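A quick way to see that the suggested `sel` call matches the `where(..., drop=True)` line is to compare both on toy data. The array below is made up for illustration, with integer years standing in for the real time axis; `imin` is assumed to be the first available time.

```python
import numpy as np
import xarray as xr

# Toy forecast with an integer "time" coordinate (hypothetical data).
forecast = xr.DataArray(
    np.arange(10.0),
    dims="time",
    coords={"time": np.arange(2000, 2010)},
)
imin, imax = 2000, 2005

# Approach in the diff: mask everything after imax and drop it.
masked = forecast.where(forecast.time <= imax, drop=True)

# Suggested approach: label-based slice, inclusive on both ends.
sliced = forecast.sel(time=slice(imin, imax))

print(masked.equals(sliced))
```

The slice avoids building a boolean mask and does not need `imax` to be wrapped in a `DataArray` first.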
Great point @ahuang11. I've played with this in the past and those addoffset functions are great. |
I think we should force everything to use cftime. Some decorator check to see if integers are provided; if so, convert to cftime. Once we use cftime, the temporal resolution (daily/monthly/yearly) should become irrelevant. |
Closing for now since we will address this as a primary goal of v2. |
When going for monthly, I developed a function that generates uninitialized ensembles from a control run, starting from a specified month, for use in my perfect-model framework:

import numpy as np
import xarray as xr


def bootstrap_uninit_pm_ensemble_from_control_mm(ds, control, init_month):
    """
    Create a pseudo-ensemble from control run.

    Note:
        Needed for block bootstrapping confidence intervals of a metric in perfect
        model framework. Randomly takes segments of the length of the ensemble
        dataset from control and rearranges them into ensemble and member
        dimensions.

    Args:
        ds (xarray object): ensemble simulation.
        control (xarray object): control simulation.
        init_month (int): calendar month (1-12) in which each segment starts.

    Returns:
        ds_e (xarray object): pseudo-ensemble generated from control run.
    """
    nens = ds.init.size
    nmember = ds.member.size
    length = ds.lead.size  # number of monthly leads
    c_start = int(control.time.dt.year[0])
    c_end = int(control.time.dt.year[-1])
    lead_time = ds['lead']

    def sel_years(control, year_s, length):
        # Select a window starting in `init_month` of `year_s` that is long
        # enough to hold `length` months, then trim it to exactly that length.
        new = control.sel(time=slice(str(year_s)+'-'+str(init_month).zfill(2),
                                     str(year_s + length//12 + 1)+'-'+str(init_month).zfill(2)))
        new = new.rename({'time': 'lead'}).isel(lead=slice(0, lead_time.size))
        new['lead'] = lead_time
        return new

    def create_pseudo_members(control):
        # `length` is in months while the start years are in years, so convert
        # before bounding the latest possible start year.
        startlist = np.random.randint(c_start, c_end - length//12 - 1, nmember)
        return xr.concat(
            (sel_years(control, start, length)
             for start in startlist), 'member'
        )

    # One set of pseudo-members per initialization.
    return xr.concat(
        (create_pseudo_members(control) for _ in range(nens)), 'init'
    ) |
Description
This PR adds support for prediction ensembles at monthly resolution.
Fixes https://github.com/bradyrx/climpred/issues/44
Type of change
Please delete options that are not relevant.
How Has This Been Tested?
Checklist (while developing)
pytest, if necessary.
Pre-Merge Checklist (final steps)
treon.
pytest runs without breaking.
Todo
classes
compute_hindcast
datetime module?