Segmentation fault when writing to netcdf with dask-enabled xarray dataset #1172

Closed
hdail opened this issue Dec 19, 2016 · 6 comments

hdail commented Dec 19, 2016

I have a 4 GB netCDF file and am running on a machine with 32 GB of memory. The following works just fine, without error, on this large-memory machine:

import xarray

ds = xarray.open_dataset('input.nc')
ds.to_netcdf('output.nc')

This dask + pynio approach also works correctly:

ds = xarray.open_dataset('input.nc', chunks={'a': 25, 'b': 25}, engine='pynio')
ds.to_netcdf('output.nc')

But the following dask + default engine (netcdf4, probably?) approach slowly consumes all the system memory, writes out a file twice as large as it should be, with variable values that are extremely large, and then fails with a seg fault, bus error, or other low-level system errors we'd rather not be seeing in Python!

ds = xarray.open_dataset('input.nc', chunks={'a': 25, 'b': 25})
ds.to_netcdf('output.nc')

Adding lock=True to the open_dataset call does not help.
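Concretely, that variant looked roughly like this (lock passed straight through as a keyword to open_dataset):

ds = xarray.open_dataset('input.nc', chunks={'a': 25, 'b': 25}, lock=True)
ds.to_netcdf('output.nc')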

I have two workable solutions to my problem (run without dask, since I have a lot of memory available, or use engine='pynio'), but this error was hard to track down, so I thought you would want to know. I'd also be glad to hear if I missed something in the docs and the all-too-common user error is to blame =)

hdail commented Dec 19, 2016

An important addition -- the following also causes low-level system errors (bus error or seg fault, I can't remember which), so the problem does not originate in to_netcdf per se, but rather in the chunking / loading of the dataset.

ds = xarray.open_dataset('input.nc', chunks={'a': 25, 'b': 25})
ds.load()

shoyer commented Dec 19, 2016

@hdail thanks for the report! Which version of xarray are you using?

We fixed something that sounds pretty similar (#936) in v0.8.2.

hdail commented Dec 19, 2016

I'm using 0.8.2. Thanks for the issue link; I had read through that, but since I am not using open_mfdataset and lock=True did not fix my issue, I figured my problem was subtly different. Perhaps some incompatibility / race condition when using netcdf4 and dask together? This might be a tricky problem to track down, as my code did complete without seg faulting when my dimensions were subtly different (about 10% smaller in space and 20% smaller in time), even on a server with half as much memory. Bleck.

shoyer commented Dec 19, 2016

Yes, this is different.

I think this is a bug in how we write netCDF files. Currently, we always use a new thread lock in ArrayWriter.sync(). To avoid possible concurrency issues with the HDF5 API, we really should be reusing the same _default_lock that we use for reading netCDF files.
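A rough sketch of the idea (illustrative only, not the actual ArrayWriter code; shared_hdf5_lock stands in for the _default_lock mentioned above), using dask.array.store with an explicit lock object:

import threading

import dask.array as da
import numpy as np

# One lock shared by every operation that goes through the HDF5 library.
shared_hdf5_lock = threading.Lock()

source = da.from_array(np.arange(100.0).reshape(10, 10), chunks=(5, 5))
target = np.empty((10, 10))

# Writing with the *same* lock object used for reads serializes all HDF5
# access, instead of creating a fresh threading.Lock() for each write.
da.store(source, target, lock=shared_hdf5_lock)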

hdail commented Dec 19, 2016

Thanks for the info! Given this potential bug, is the engine='pynio' solution acceptable, or is it just working for me for now and might fail with some subtly different configuration / data size?

Another possible solution, suggested by a colleague: add the following at the top to enforce single-threaded reads and writes.

dask.set_options(get=dask.async.get_sync)
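In context, that workaround would look roughly like this (assuming the dask of that era; dask.async was later renamed to dask.local, so the exact spelling below only applies to older dask versions):

import dask
import dask.async  # home of the synchronous scheduler in dask at the time
import xarray

# Run every dask graph on the single-threaded synchronous scheduler, so
# netCDF4/HDF5 calls never execute concurrently.
dask.set_options(get=dask.async.get_sync)

ds = xarray.open_dataset('input.nc', chunks={'a': 25, 'b': 25})
ds.to_netcdf('output.nc')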

shoyer commented Dec 19, 2016

It is possible that pynio is linking to an independent HDF5 installation, which should eliminate the need for a shared lock. But if that's not the case, then you probably just got lucky.

shoyer added the bug label on Dec 20, 2016
shoyer added a commit to shoyer/xarray that referenced this issue on Dec 22, 2016:
Switch to shared Lock (SerializableLock if possible) for reading and writing

Fixes pydata#1172

The serializable lock will be useful for dask.distributed or multi-processing
(xref pydata#798, pydata#1173, among others).
shoyer added a commit that referenced this issue on Jan 4, 2017:
Switch to shared Lock (SerializableLock if possible) for reading and writing (#1179)

* Switch to shared Lock (SerializableLock if possible) for reading and writing

Fixes #1172

The serializable lock will be useful for dask.distributed or multi-processing
(xref #798, #1173, among others).

* Test serializable lock

* Use conda-forge for builds

* remove broken/fragile .test_lock