@ESMValGroup/technical-lead-development-team I need your (rather quick) input here, please: we have seen that the new netCDF4=1.6.1 is causing frequent segfaults in our CI tests, and we have pinned it to !=1.6.1 to brush the problem under the carpet for now. However, the good folk at Unidata/netCDF4 are scratching their heads wondering what on earth is going on, so let's try and help them figure it out; even a somewhat narrowed-down picture is still helpful. For that, I have opened:
- Very frequent segfaults with the new netCDF4=1.6.1 (Unidata/netcdf4-python#1192)
- New netCDF4=1.6.1 (most probably) causing a fairly large number of Segmentation Faults (conda-forge/netcdf4-feedstock#141)
(have a read through the discussions there; it's a lot of paint thrown at a white wall) and I have managed to isolate our side of the problem to the sample data testing. Simplified, the toy model looks like this:
```python
import pickle
import platform

import iris
import numpy as np
import pytest

TEST_REVISION = 1


def get_cache_key(value):
    """Get a cache key that is hopefully unique enough for unpickling.

    If this doesn't avoid problems with unpickling the cached data,
    manually clean the pytest cache with the command `pytest --cache-clear`.
    """
    py_version = platform.python_version()
    return (f'{value}_iris-{iris.__version__}_'
            f'numpy-{np.__version__}_python-{py_version}_'
            f'rev-{TEST_REVISION}')


@pytest.fixture(scope="module")
def timeseries_cubes_month(request):
    """Load representative timeseries data."""
    # cache the cubes to save about 30-60 seconds on repeat use
    cache_key = get_cache_key("sample_data/monthly")
    data = request.config.cache.get(cache_key, None)
    if data is None:
        # toy model only reads from the cache; the real fixture loads the
        # sample data on a cache miss
        pytest.skip("no cached sample data available")
    cubes = pickle.loads(data.encode('latin1'))
    return cubes
```
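For context on the `.encode('latin1')` above: the pytest cache is JSON-backed, so it can only store text, and pickle bytes have to round-trip through a latin1 str (every byte value maps to a latin1 code point). This is a sketch of the set side, which is elided from the toy model; the helper names here are hypothetical, not from the real fixture:

```python
import pickle


def to_cache_value(obj):
    """Encode an object as a JSON-storable str via pickle + latin1 (sketch)."""
    return pickle.dumps(obj).decode('latin1')


def from_cache_value(text):
    """Invert to_cache_value, as the fixture above does on retrieval."""
    return pickle.loads(text.encode('latin1'))


# verify that arbitrary pickle bytes survive the latin1 round trip
payload = {"cube": [1.5, 2.5, 3.5]}
assert from_cache_value(to_cache_value(payload)) == payload
```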
```python
# @pytest.mark.skip
def test_io_1(timeseries_cubes_month):
    cubes = timeseries_cubes_month
    _ = [c.data for c in cubes]  # this produces SegFaults


@pytest.mark.skip
def test_io_2(timeseries_cubes_month):
    cubes = timeseries_cubes_month
    loaded_cubes = []
    for i, c in enumerate(cubes):
        iris.save(c, str(i) + ".nc")
        lc = iris.load_cube(str(i) + ".nc")
        loaded_cubes.append(lc)
    _ = [c.data for c in loaded_cubes]  # this doesn't produce SegFaults
```

From my tests I found out that test_io_1 has a tendency to produce segfaults at that list comprehension step (tested with -n 0 or -n 2; it doesn't really matter), whereas test_io_2 doesn't. Can we gauge anything from that without digging into the actual IO/threading (that is not our plot of land anyway)? Hive mind, folks! 🐝
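One way to read the difference between the two tests (a hypothesis, not a confirmed diagnosis): the unpickled cubes may still carry lazy data pointing back at netCDF files, while the iris.save/iris.load_cube round trip in test_io_2 hands each test a freshly opened file. If so, realizing the data before caching would keep netCDF4 out of the unpickle path entirely. A sketch of that idea, with a stand-in class instead of a real iris Cube so it runs standalone (the helper is hypothetical, not part of the toy model):

```python
def realize_data(cubes):
    """Hypothetical mitigation: force each cube's data into memory so the
    pickled payload holds plain arrays rather than lazy references into
    netCDF files. Iris cubes realize lazy data when .data is accessed.
    """
    for cube in cubes:
        if cube.has_lazy_data():
            cube.data  # accessing .data realizes the array in-place
    return cubes


class FakeCube:
    """Minimal stand-in for iris.cube.Cube, just to exercise the helper."""

    def __init__(self):
        self._lazy = True

    def has_lazy_data(self):
        return self._lazy

    @property
    def data(self):
        self._lazy = False  # realizing drops the lazy reference
        return [1.0, 2.0]


cubes = realize_data([FakeCube(), FakeCube()])
assert all(not c.has_lazy_data() for c in cubes)
```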
UPDATE as of 20-Oct-2022 #1727 (comment)