Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Time limit in process_custom_climate_data #157

Closed
anoukvlug opened this issue Jan 25, 2017 · 7 comments
Closed

Time limit in process_custom_climate_data #157

anoukvlug opened this issue Jan 25, 2017 · 7 comments

Comments

@anoukvlug
Copy link
Collaborator

When my climate file starts in 1600 (or earlier) the years are not recognized as years any more.

yrs = nc_ts.time.year
AttributeError: 'Index' object has no attribute 'year'

I'm planning to make some changes in my fork, to solve this problem.
Would it make sense to make a pull request of these chances later on?

@fmaussion
Copy link
Member

Hi Anouk,

This is getting quite interesting, I'd like to know more about your plans ;-) But first, to your question:

I expect that solving this problem won't be trivial, as there is an underlying problem in pandas and xarray (see e.g. pydata/xarray#789 for xarray, which relies on pandas: pandas-dev/pandas#7307 ).

OGGM does not rely heavily on pandas datetimes so I think we could solve this problem without getting burned in a hell of changes, but on the top of my head I can't really make a prognostic about how much change this would imply.

In fact, the dynamical model of OGGM already relies on a very simple time system (https://github.com/OGGM/oggm/blob/master/oggm/utils.py#L675-L718), which could be meliorated and extended (based on other peoples systems, as I hope we don't have to reinvent the wheel here).

I know this is not very helpful, but yes a change that would make OGGM work on longer timescales is necessary (I just didn't have that in mind when I started :-( )

@anoukvlug
Copy link
Collaborator Author

Hello Fabien,
Thank you for your quick response. My first idea is to change

yrs = nc_ts.time.year
y0, y1 = yrs[0], yrs[-1]

into

time1 = nc_ts.variables['time']
time1 = netCDF4.num2date(time1[:], time1.units)
y0=int(time1[0].strftime('%Y'))
y1=int(time1[-1].strftime('%Y'))

to avoid using pandas.

I don't know yet what to do with these lines:
nc_ts.set_period(t0='{}-10-01'.format(y0), t1='{}-09-01'.format(y1))
gdir.write_monthly_climate_file(time, iprcp, itemp, igrad, ihgt, ref_pix_lon, ref_pix_lat)

So far these are the only lines that has give me problems with the starting date. So hopefully this requires just a small change. I will let you know once I know more about what I what to do with those lines.

Shortly summarised I would like to run OGGM with over the last ~1000 years with different climate simulations of the Community Earth System Model Last Millennium Ensemble (CESM-LME), to look at the influence of natural climate variability on glacier in the Canadian Arctic. Later on in my PhD I would like to use OGGM to make runs over the second half of the Holocene.

@anoukvlug
Copy link
Collaborator Author

gdir.write_monthly_climate_file(time, iprcp, itemp, igrad, ihgt, ref_pix_lon, ref_pix_lat)

Doesn't give a problem with the proper input file. I thought you could set the date of 'days since' yourself, but if you use just 'days since 1801-01-01' than it all works fine.

@fmaussion
Copy link
Member

Shortly summarised I would like to run OGGM with over the last ~1000 years with different climate simulations of the Community Earth System Model Last Millennium Ensemble (CESM-LME)

Right, I see. So there are good news and bad news about that ;-)

  • the good news is, you don't have to change anything in the existing code. Indeed, OGGM will follow Marzeion et al (2012) in how the GCM runs will be done: the calibration will still happen with observational data (CRU, or whatever is best), and the runs will be done with "downscaled" GCM data (in a first step, using the delta method). This is important because some pre-processing steps (like the bed inversion) require realistic climate data first
  • the bad news is, there is no mechanism to apply the delta method to GCM data in OGGM yet. I expect this to be quite easy to implement though, and this would be a very good improvement to the codebase

@anoukvlug
Copy link
Collaborator Author

Here follows an update on the conversation we had on Slack:

I made a 'process_gcm_data' task in climate.py, which is based on a copy of the 'process_cru_data' task.

` @entity_task(log, writes=['gcm_data'])
def process_gcm_data(gdir):

# read the climatology high resolution CL2 climatologies
clfile = utils.get_cru_cl_file()
ncclim = salem.GeoNetcdf(clfile)

# GCM temperature and precipitation data
if not (('gcm_temp_file' in cfg.PATHS) and \
                os.path.exists(cfg.PATHS['gcm_temp_file'])):
    raise IOError('GCM temp file not found')

if not (('gcm_precc_file' in cfg.PATHS) and \
                os.path.exists(cfg.PATHS['gcm_precc_file'])):
    raise IOError('GCM precc file not found')

if not (('gcm_precl_file' in cfg.PATHS) and \
                os.path.exists(cfg.PATHS['gcm_precl_file'])):
    raise IOError('GCM precl file not found')


# read the file
fpathTemp = cfg.PATHS['gcm_temp_file']
fpathPrecc = cfg.PATHS['gcm_precc_file']
fpathPrecl = cfg.PATHS['gcm_precl_file']
temp = salem.open_xr_dataset(fpathTemp)
temp = temp.TREFHT
temp.time 
temp = temp[9:-3, :, :] # from normal years to hydrological years
precpC = xr.open_dataset(fpathPrecc)
precpL = xr.open_dataset(fpathPrecl)
precp = precpC.PRECC+precpL.PRECL
precp = precp[9:-3, :, :] # from normal years to hydrological years

precp.time 
temp['time'] = pd.period_range('850-10', '2005-9', freq='M') 
precp['time'] = pd.period_range('850-10', '2005-9', freq='M')

# Use the year and month for selection
year = np.array([t.year for t in temp.time.values])  
month = np.array([t.month for t in temp.time.values]) 

year = np.array([t.year for t in precp.time.values]) 
month = np.array([t.month for t in precp.time.values])  
time= temp.time

y0=year[0]
y1=year[-1]

ny, r = divmod(len(temp.time), 12)
assert r == 0

# gradient default params
use_grad = cfg.PARAMS['temp_use_local_gradient']
def_grad = cfg.PARAMS['temp_default_gradient']
g_minmax = cfg.PARAMS['temp_local_gradient_bounds']

lon = gdir.cenlon
lat = gdir.cenlat

# This is guaranteed to work because I prepared the file (I hope)
ncclim.set_subset(corners=((lon, lat), (lon, lat)), margin=1)

# get climatology data
loc_hgt = ncclim.get_vardata('elev')
loc_tmp = ncclim.get_vardata('temp')
loc_pre = ncclim.get_vardata('prcp')
loc_lon = ncclim.get_vardata('lon')
loc_lat = ncclim.get_vardata('lat')

# see if the center is ok
if not np.isfinite(loc_hgt[1, 1]):
    # take another candidate where finite
    isok = np.isfinite(loc_hgt)

    # wait: some areas are entirely NaNs, make the subset larger
    _margin = 1
    while not np.any(isok):
        _margin += 1
        ncclim.set_subset(corners=((lon, lat), (lon, lat)), margin=_margin)
        loc_hgt = ncclim.get_vardata('elev')
        isok = np.isfinite(loc_hgt)
    if _margin > 1:
        log.debug('%s: I had to look up for far climate pixels: %s',
                  gdir.rgi_id, _margin)

    # Take the first candidate (doesn't matter which)
    lon, lat = ncclim.grid.ll_coordinates
    lon = lon[isok][0]
    lat = lat[isok][0]
    # Resubset
    ncclim.set_subset()
    ncclim.set_subset(corners=((lon, lat), (lon, lat)), margin=1)
    loc_hgt = ncclim.get_vardata('elev')
    loc_tmp = ncclim.get_vardata('temp')
    loc_pre = ncclim.get_vardata('prcp')
    loc_lon = ncclim.get_vardata('lon')
    loc_lat = ncclim.get_vardata('lat')

assert np.isfinite(loc_hgt[1, 1])
isok = np.isfinite(loc_hgt)
hgt_f = loc_hgt[isok].flatten()
assert len(hgt_f) > 0.
ts_grad = np.zeros(12) + def_grad
if use_grad and len(hgt_f) >= 5:
    for i in range(12):
        loc_tmp_mth = loc_tmp[i, ...][isok].flatten()
        slope, _, _, p_val, _ = stats.linregress(hgt_f, loc_tmp_mth)
        ts_grad[i] = slope if (p_val < 0.01) else def_grad
# ... but dont exaggerate too much
ts_grad = np.clip(ts_grad, g_minmax[0], g_minmax[1])
# convert to timeserie and hydroyears
ts_grad = ts_grad.tolist()
ts_grad = ts_grad[9:] + ts_grad[0:9]
ts_grad = np.asarray(ts_grad * ny)

latdif=abs((temp.lat)-lat)
latmin=min(latdif)
latsel=np.where(latdif==latmin)
latsel=np.asarray(latsel)
latsel=latsel.squeeze()
temp = temp.isel(lat=((latsel-1), latsel, (latsel+1))) 
precp = precp.isel(lat=((latsel-1), latsel, (latsel+1)))

longi = np.hstack((temp.lon[:73], (temp.lon[73:]-360)))
temp['lon'] = longi
precp['lon'] = longi
londif = abs(temp.lon - lon)
lonmin=min(londif)
lonsel=np.where(londif==lonmin)
lonsel=np.asarray(lonsel)
lonsel=lonsel.squeeze()
temp=temp.isel(lon=((lonsel-1), lonsel, (lonsel+1))) 
precp = precp.isel(lon=((lonsel-1), lonsel, (lonsel+1)))

Ndays = np.tile([31, 30, 31, 31, 28, 31, 30, 31, 30, 31, 31, 30], (y1 - y0))
for i in range(0,3):
    for j in range(0,3):
        precp[: ,i, j] = precp[: ,i, j]* (60 * 60 * 24 * 1000) * Ndays # from [m/s] to [kg/m^2 per month]

temp.salem.grid
precp.salem.grid

# compute monthly anomalies
# of temp
tmp = temp
ts_tmp_avg = tmp.isel(time= (year >= 1961) & (year <= 1990))  #time line makes time based selection from the netcdf file.
ts_tmp_avg=ts_tmp_avg.groupby('time.month').mean(dim='time')
ts_tmp = temp.groupby('time.month')-ts_tmp_avg

# of precip
ts_pre_avg = precp.isel(time= (year >= 1961) & (year <= 1990))  #time line makes time based selection from the netcdf file.
ts_pre_avg= ts_pre_avg.groupby('time.month').mean(dim='time')
ts_pre = precp.groupby('time.month')-ts_pre_avg

# interpolate to HR grid
if np.any(~np.isfinite(ts_tmp[:, 1, 1])):
    # Extreme case, middle pix is not valid
    # take any valid pix from the 3*3 (and hope there's one)
    found_it = False
    for idi in range(2):
        for idj in range(2):
            if np.all(np.isfinite(ts_tmp[:, idj, idi])):
                ts_tmp[:, 1, 1] = ts_tmp[:, idj, idi]
                ts_pre[:, 1, 1] = ts_pre[:, idj, idi]
                found_it = True
    if not found_it:
        msg = '{}: OMG there is no climate data'.format(gdir.rgi_id)
        raise RuntimeError(msg)
elif np.any(~np.isfinite(ts_tmp)):
    # maybe the side is nan, but we can do nearest
    ts_tmp = ncclim.grid.map_gridded_data(ts_tmp.values, temp.salem.grid,
                                          interp='nearest')
    ts_pre = ncclim.grid.map_gridded_data(ts_pre.values, precp.salem.grid,
                                          interp='nearest')
else:
    # We can do bilinear
    ts_tmp = ncclim.grid.map_gridded_data(ts_tmp.values, temp.salem.grid, interp='linear')

    ts_pre = ncclim.grid.map_gridded_data(ts_pre.values, precp.salem.grid,
                                          interp='nearest')

# take the center pixel and add it to the CRU CL clim
# for temp
loc_tmp = xr.DataArray(loc_tmp[:, 1, 1], dims=['month'],
                       coords={'month':ts_tmp_avg.month})
ts_tmp = xr.DataArray(ts_tmp[:, 1, 1], dims=['time'],
                       coords={'time':time})
ts_tmp = ts_tmp.groupby('time.month') + loc_tmp
# for prcp
loc_pre = xr.DataArray(loc_pre[:, 1, 1], dims=['month'],
                       coords={'month':ts_pre_avg.month})
ts_pre = xr.DataArray(ts_pre[:, 1, 1], dims=['time'],
                       coords={'time':time})
ts_pre = ts_pre.groupby('time.month') + loc_pre

# load dates in right format to save
Tindex=salem.GeoNetcdf(fpathTemp, monthbegin=True)
time1 = Tindex.variables['time']
time2 = time1[9:-3] - Ndays # from normal years to hydrological years
print('time1.units', time1.units)
time2 = netCDF4.num2date(time2, time1.units, calendar='noleap')

# done
loc_hgt = loc_hgt[1, 1]
loc_lon = loc_lon[1]
loc_lat = loc_lat[1]
assert np.isfinite(loc_hgt)
assert np.all(np.isfinite(ts_pre.values))
assert np.all(np.isfinite(ts_tmp.values))
assert np.all(np.isfinite(ts_grad))
gdir.write_gcm_data_file(time2, ts_pre.values, ts_tmp.values,
                                ts_grad, loc_hgt, loc_lon, loc_lat)

ncclim._nc.close()
Tindex._nc.close()

# metadata
out = {'climate_source': 'GCM data', 'hydro_yr_0': y0 + 1, 'hydro_yr_1': y1}
gdir.write_pickle(out, 'climate_info')`

It is still really unflexible, but I thought it is a good moment to share it anyway. To test if it was working I let is overwrite the monthly_climate.nc file, instead of creating a gcm_data.nc file. With the function:

def write_gcm_data_file(self, time, prcp, temp, grad, ref_pix_hgt,
                               ref_pix_lon, ref_pix_lat):

    # overwrite as default
    # fpath = self.get_filepath('gcm_data')
    fpath = self.get_filepath('climate_monthly')
    if os.path.exists(fpath):
        os.remove(fpath)

    with netCDF4.Dataset(fpath, 'w', format='NETCDF4') as nc:
        nc.ref_hgt = ref_pix_hgt
        nc.ref_pix_lon = ref_pix_lon
        nc.ref_pix_lat = ref_pix_lat
        nc.ref_pix_dis = haversine(self.cenlon, self.cenlat,
                                   ref_pix_lon, ref_pix_lat)

        dtime = nc.createDimension('time', None)

        nc.author = 'OGGM'
        nc.author_info = 'Open Global Glacier Model'

        timev = nc.createVariable('time','i4',('time',))
        timev.setncatts({'units':'days since 0850-01-01 00:00:00'})
        timev[:] = netCDF4.date2num([t for t in time],
                                    'days since 0850-01-01 00:00:00')

        v = nc.createVariable('prcp', 'f4', ('time',), zlib=True)
        v.units = 'kg m-2'
        v.long_name = 'total monthly precipitation amount'
        v[:] = prcp

        v = nc.createVariable('temp', 'f4', ('time',), zlib=True)
        v.units = 'degC'
        v.long_name = '2m temperature at height ref_hgt'
        v[:] = temp

        v = nc.createVariable('grad', 'f4', ('time',), zlib=True)
        v.units = 'degC m-1'
        v.long_name = 'temperature gradient'
        v[:] = grad``

Which is almost a complete copy of 'write_monthly_climate_file', and there for I put it just below that function in utils.py.

In order to be able to save the file as gcm_data.nc, I added 2 line in the cfg.py:

_doc = 'The monthly GCM climate timeseries for this glacier, stored in a netCDF ' \ 'file.' BASENAMES['gcm_data'] = ('gcm_data.nc', _doc)

And for convenience I added the following line in the task.py:

from oggm.core.preprocessing.climate import process_gcm_data

So this is the status for now. For it to make sense to include it in the code base I should make it more flexible I think. Next to that there might be some other things that need to be changed. I don't know if it would be better to discuss this further here or leave it for the oggm workshop, both is fine for me.

@fmaussion
Copy link
Member

fmaussion commented Mar 10, 2017

Thanks for your work! Here an answer to your question:

For it to make sense to include it in the code base I should make it more flexible I think.

Not necessarily. Currently this function would be used by you and only you, and there is a rule in programing: don't try to solve problem you don't know about (yet). (this goes together with "if it's not broken, don't fix it") ;-)

So I would welcome a PR where we can discuss your code in more detail. Whether or not we will manage to merge this before the workshop I don't know, but at least we can get started.

@anoukvlug
Copy link
Collaborator Author

That is a funny rule, but it indeed makes sense. I will make a PR on Monday :)

Have a nice weekend!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants