
SegFaults in the sample data tests with netCDF4=1.6.1 #1727

@valeriupredoi

Description


@ESMValGroup/technical-lead-development-team I need your (rather quick) input here please: we have seen that the new netCDF4=1.6.1 is causing frequent segfaults in our CI tests, and we have pinned it to !=1.6.1 to sweep the problem under the carpet for us. However, the good folk at Unidata/netCDF4 are scratching their heads and wondering what the heck's going on, so let's try and help them figure that out; even if we can only provide a narrowed-down picture, it's still helpful. For that, I have opened an issue upstream

(have a read through the discussion there; it's a lot of paint thrown at a white wall)

and I have managed to isolate our side of the problem to the sample data tests. Simplified, this is how the toy model looks:

import iris
import numpy as np
import pickle
import platform
import pytest

TEST_REVISION = 1

def get_cache_key(value):
    """Get a cache key that is hopefully unique enough for unpickling.

    If this doesn't avoid problems with unpickling the cached data,
    manually clean the pytest cache with the command `pytest --cache-clear`.
    """
    py_version = platform.python_version()
    return (f'{value}_iris-{iris.__version__}_'
            f'numpy-{np.__version__}_python-{py_version}_'
            f'rev-{TEST_REVISION}')


@pytest.fixture(scope="module")
def timeseries_cubes_month(request):
    """Load representative timeseries data."""
    # Cache the cubes to save about 30-60 seconds on repeat use.
    cache_key = get_cache_key("sample_data/monthly")
    data = request.config.cache.get(cache_key, None)
    if data is None:
        # The real fixture loads the sample data and populates the cache
        # here; the toy model assumes a warm cache.
        pytest.skip("pytest cache is empty, populate it first")
    # The cubes are stored in the cache as a latin1-decoded pickle string.
    cubes = pickle.loads(data.encode('latin1'))
    return cubes


# @pytest.mark.skip
def test_io_1(timeseries_cubes_month):
    cubes = timeseries_cubes_month
    _ = [c.data for c in cubes]  # this produces SegFaults


@pytest.mark.skip
def test_io_2(timeseries_cubes_month):
    cubes = timeseries_cubes_month
    loaded_cubes = []
    for i, c in enumerate(cubes):
        iris.save(c, str(i) + ".nc")
        lc = iris.load_cube(str(i) + ".nc")
        loaded_cubes.append(lc)
    _ = [c.data for c in loaded_cubes]  # this doesn't produce SegFaults

From my tests I found that test_io_1 has a tendency to segfault at that list-comprehension step (tested with -n 0 or -n 2; it doesn't really matter), whereas test_io_2 doesn't. Can we gauge anything from that without digging into the actual IO/threading internals (that's not our plot of land anyway)? Hive mind, folks! 🐝
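
One way to narrow it down further, a sketch assuming the crash comes from realizing lazy, netCDF-backed data on the unpickled cubes: realize the data before caching, so the pickle stores plain numpy arrays instead of lazy arrays that point back at the netCDF files. The helper below is hypothetical (the name and the caching call are mine, mirroring the fixture above); Cube.has_lazy_data() is standard iris API.

import pickle


def cache_realized_cubes(request, cache_key, cubes):
    """Cache cubes with realized data (hypothetical helper, untested)."""
    for cube in cubes:
        if cube.has_lazy_data():
            _ = cube.data  # touch the data so the pickle holds numpy arrays
    # Store the pickle the same way the fixture reads it back (latin1 round-trip).
    request.config.cache.set(cache_key, pickle.dumps(cubes).decode('latin1'))

If test_io_1 stops segfaulting with realized data in the cache, that would point at the lazy-loading path (reopening the files through netCDF4 at compute time) rather than at the data itself, which would also fit with test_io_2 being fine after its save/load round-trip through freshly written files.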

UPDATE as of 20-Oct-2022: #1727 (comment)
