Matrix Index is tilted using combine_by_coords #5760

Open
anonymousForPeer opened this issue Sep 2, 2021 · 2 comments
Labels
topic-combine combine/concat/merge

Comments

@anonymousForPeer

My calculations return a strange tilted index. Why does this happen?

What happened:
I combined several user-defined chunks of netCDF data (900 chunks) into one dataset. For this I used the default combine='by_coords' in xr.open_mfdataset(). The result had a tilted grid index: the upper left corner was at i=0, j=1167.

Beforehand I calculated some indices on these chunks and combined them with the default combine='by_coords' in xr.open_mfdataset(), but I also tested combine='nested' separately.

The ones where I used default combine='by_coords' for all functions returned the tilted index.
The ones where I used combine='nested' beforehand and then default combine='by_coords' returned the correct index.

What you expected to happen:
No tilted index.

Minimal Complete Verifiable Example:

import xarray as xr

# pathtofile and chunknumber are not defined in this snippet

## returning wrong index
# calculate some climatic indices on numbered chunks and combine them into one ds per chunk
with xr.open_mfdataset(pathtofile+'annual*'+chunknumber+'.nc', chunks=-1, parallel=True, engine='h5netcdf') as ds:
    ds.to_netcdf(pathtofile, format="NETCDF4_CLASSIC", engine="netcdf4")

## combining all
with xr.open_mfdataset(pathtofile+'climateAnnual*.nc', chunks=-1, parallel=True, engine='h5netcdf') as ds:
    ds.to_netcdf(pathtofile, format="NETCDF4_CLASSIC", engine="netcdf4")

##################################################

## returning correct index
# calculate some climatic indices on numbered chunks and combine them into one ds per chunk
with xr.open_mfdataset(pathtofile+'annual*'+chunknumber+'.nc', chunks=-1, parallel=True, engine='h5netcdf', combine='nested') as ds:
    ds.to_netcdf(pathtofile, format="NETCDF4_CLASSIC", engine="netcdf4")

## combining all
with xr.open_mfdataset(pathtofile+'climateAnnual*.nc', chunks=-1, parallel=True, engine='h5netcdf') as ds:
    ds.to_netcdf(pathtofile, format="NETCDF4_CLASSIC", engine="netcdf4")
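
One thing that may be worth ruling out with patterns such as 'annual*'+chunknumber+'.nc': a glob wildcard can select more chunk files than intended, because `*` also absorbs leading digits. Whether this played a role in the report is not known. The sketch below uses only the standard library, and the file names in it are hypothetical, not taken from the report.

```python
from fnmatch import fnmatch

# Hypothetical file names: two variables, three numbered chunks each.
files = [f"annual{var}_{n}.nc" for var in ("Tas", "Pr") for n in (1, 11, 21)]


def matches(pattern, names):
    """Return the names that a glob-style pattern would select."""
    return [name for name in names if fnmatch(name, pattern)]


chunknumber = "1"

# Pattern built the same way as in the example above: '*' can swallow digits,
# so chunks 11 and 21 are selected together with chunk 1.
loose = matches("annual*" + chunknumber + ".nc", files)

# Anchoring the chunk number on a separator (an assumed naming convention)
# restricts the match to the files of chunk 1.
strict = matches("annual*_" + chunknumber + ".nc", files)

print(loose)
print(strict)
```

Printing the list of files that a pattern resolves to before passing it to xr.open_mfdataset() is a cheap way to confirm that each call sees only the chunk it is meant to see.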

Anything else we need to know?:

Environment:
Python 3.7.4

Output of xr.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.7.4 (default, Jun 3 2020, 14:52:58)
[GCC 8.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-1160.15.2.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: de_DE.UTF-8
LOCALE: ('de_DE', 'UTF-8')
libhdf5: 1.12.0
libnetcdf: 4.7.4

xarray: 0.18.2
pandas: 0.25.3
numpy: 1.17.3
scipy: 1.3.1
netCDF4: 1.5.7
pydap: installed
h5netcdf: 0.11.0
h5py: 3.3.0
Nio: None
zarr: 2.8.3
cftime: 1.5.0
nc_time_axis: 1.3.1
PseudoNetCDF: None
rasterio: 1.2.6
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2021.06.2
distributed: 2021.06.2
matplotlib: 3.4.2
cartopy: None
seaborn: 0.11.1
numbagg: 0.2.1
pint: 0.17
setuptools: 57.0.0
pip: 21.1.3
conda: None
pytest: None
IPython: None
sphinx: None

@TomNicholas
Member

Hi @areichmuth - I would love to help but I would need some more information from you first.

What is a "tilted grid index"? Do you mean that the files have not been combined in the order you expected them to be?

It's very hard to debug problems unless I can reproduce them locally. Do you have some example data files you could upload that this problem occurs with? Or even better some small code snippet that generates an example which shows the same issue?
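
A minimal sketch of what such a self-contained snippet could look like, assuming synthetic in-memory data rather than the reporter's files. The names "lat", "lon" and "test1" and the grid size are placeholders chosen for illustration.

```python
import numpy as np
import xarray as xr

# Synthetic 4 x 6 grid; variable and coordinate names are placeholders.
full = xr.Dataset(
    {"test1": (("lat", "lon"), np.arange(24.0).reshape(4, 6))},
    coords={
        "lat": [10.0, 20.0, 30.0, 40.0],
        "lon": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
    },
)

# Cut the grid into 2 x 2 tiles as a stand-in for the per-chunk files.
tiles = [
    full.isel(lat=lat_sel, lon=lon_sel)
    for lat_sel in (slice(0, 2), slice(2, 4))
    for lon_sel in (slice(0, 3), slice(3, 6))
]

# Pass the tiles in reverse order: combine_by_coords arranges them by their
# coordinate values, not by their position in the list.
combined = xr.combine_by_coords(tiles[::-1])

print(combined["test1"].values)
print(combined["lat"].values)
```

If the combined coordinates or values differ from the original grid on one machine but not on another, the same script gives both sides something concrete to compare.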

@dcherian added the topic-combine combine/concat/merge label Sep 2, 2021

@anonymousForPeer
Author

Thank you @TomNicholas - strangely, I can't reproduce it on my local machine; it all happened on our Slurm cluster. Locally, the result is correct according to the input file index.
In my case I calculated annual and seasonal climate variables on the same input files, but the matrix indices i, j were different: one output had its upper left corner at (0,0) and the other at (0,1167), as shown in ncview.
Nevertheless, here is what I did. You can test it with https://www.unidata.ucar.edu/software/netcdf/examples/sresa1b_ncar_ccsm3-example.nc:

import numpy as np
import xarray as xr

chunks = 4

lonrange = 256
latrange = 128

## creating the chunks - our Slurm cluster can't handle dask_jobqueue and dask chunking wasn't possible either
x = [x.tolist() for x in np.array_split(range(lonrange), chunks)]
xextend = [[sublist[0], sublist[-1]] for sublist in x]

y = [y.tolist() for y in np.array_split(range(latrange), chunks)]
yextend = [[sublist[0], sublist[-1]] for sublist in y]

# building every combination of x and y chunk extents
allChunks = [[x, y] for x in xextend for y in yextend]

for k in range(0, chunks * chunks):

    inter = str(k)

    # the example file names its dimensions lon and lat;
    # slice() excludes its end index, so the last column/row of each chunk is left out
    tas = xr.open_dataset('~/pathToFile/sresa1b_ncar_ccsm3-example.nc').isel(lon=slice(min(allChunks[k][0]), max(allChunks[k][0])), lat=slice(min(allChunks[k][1]), max(allChunks[k][1])))
    ## instead of my climate calculations
    tas.rename({'tas': 'test1'}).to_netcdf('~/pathToFile/climateCalculation1_'+inter+'.nc')
    tas.rename({'tas': 'test2'}).to_netcdf('~/pathToFile/climateCalculation2_'+inter+'.nc')

    # combining the single data arrays per chunk
    ## combine using nested
    with xr.open_mfdataset('~/pathToFile/climateCalculation*'+inter+'.nc', chunks=-1, parallel=True, engine='h5netcdf', combine='nested') as ds:
        ds.to_netcdf('~/pathToFile/nestedClimateAnnualCalculations_'+inter+'.nc')
    ## combine using default by_coords
    with xr.open_mfdataset('~/pathToFile/climateCalculation*'+inter+'.nc', chunks=-1, parallel=True, engine='h5netcdf') as ds:
        ds.to_netcdf('~/pathToFile/climateAnnualCalculations_'+inter+'.nc')

## combining all chunks to one final file
## nested input (read from the directory the per-chunk files were written to)
with xr.open_mfdataset('~/pathToFile/nestedClimateAnnualCalculations_*', chunks=-1, parallel=True, engine='h5netcdf') as ds:
    ds.to_netcdf('~/pathToFile/climateAnnualCalculationsCombinedNested.nc')

## by_coords input
with xr.open_mfdataset('~/pathToFile/climateAnnualCalculations_*', chunks=-1, parallel=True, engine='h5netcdf') as ds:
    ds.to_netcdf('~/pathToFile/climateAnnualCalculationsCombined.nc')
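
One possible way to compare the two combined files is to check whether a coordinate axis runs in the same direction in both. This assumes that the "tilted" index corresponds to a reversed axis, which the thread does not settle. The helper name axis_direction and the sample latitude values are hypothetical.

```python
import numpy as np


def axis_direction(values):
    """Classify a 1-D coordinate as 'ascending', 'descending' or 'unsorted'."""
    steps = np.diff(np.asarray(values, dtype=float))
    if np.all(steps > 0):
        return "ascending"
    if np.all(steps < 0):
        return "descending"
    return "unsorted"


# Hypothetical latitude axes standing in for the two combined files.
lat_a = np.linspace(-90.0, 90.0, 5)
lat_b = lat_a[::-1]

print(axis_direction(lat_a), axis_direction(lat_b))
```

With real files, the same helper could be applied to the coordinate values read from each output, for example the "lat" and "lon" arrays of the two datasets, to see whether the nested and by_coords results disagree on axis direction.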
        
