
DOC: Streamline debugging of user hydrodynamic dataset metadata #2224

@VeckoTheGecko

Description

xref #2003

Users come to Parcels with forcing datasets of varying quality. It turns out that it is easy to dump and regenerate xarray datasets if you're not interested in the data itself, which lets users share this output in issue descriptions so that we can recreate the dataset on our end.

e.g.,

import numpy as np
import xarray as xr

from parcels._datasets.structured.generic import datasets

def load_dataset():
    return datasets["2d_left_rotated"][["data_g"]]

# util in Parcels test suite
def fill_with_dummy_data(d: dict) -> dict:
    """Recursively replace dtype/shape metadata entries with zero-filled arrays."""
    assert isinstance(d, dict)
    if "dtype" in d:
        d["data"] = np.zeros(d["shape"], dtype=d["dtype"])
        del d["dtype"]
        del d["shape"]

    for k in d:
        if isinstance(d[k], dict):
            d[k] = fill_with_dummy_data(d[k])

    return d


ds = load_dataset()
d = ds.to_dict(data=False)  # metadata only: dims, attrs, dtype, shape - no data values
# users paste the output dict above into the issue alongside a description


# during debugging, a maintainer can then do...
ds = xr.Dataset.from_dict(fill_with_dummy_data(d))

A shame I only found this now; it would have made creating dummy datasets much easier.
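For illustration, here is a self-contained round-trip check of the dummy-fill logic. It uses a hand-written toy dict mimicking the shape of `xr.Dataset.to_dict(data=False)` output (the `meta` dict below is an assumption for demonstration, not real Parcels output), so it runs without Parcels installed:

```python
import numpy as np

def fill_with_dummy_data(d: dict) -> dict:
    # Replace {"dtype": ..., "shape": ...} entries, as emitted by
    # xr.Dataset.to_dict(data=False), with zero-filled arrays.
    assert isinstance(d, dict)
    if "dtype" in d:
        d["data"] = np.zeros(d["shape"], dtype=d["dtype"])
        del d["dtype"]
        del d["shape"]

    for k in d:
        if isinstance(d[k], dict):
            d[k] = fill_with_dummy_data(d[k])

    return d

# Toy metadata dict in the structure produced by to_dict(data=False).
meta = {
    "dims": {"x": 3},
    "data_vars": {
        "u": {"dims": ("x",), "attrs": {}, "dtype": "float64", "shape": (3,)},
    },
    "coords": {},
    "attrs": {},
}

filled = fill_with_dummy_data(meta)
print(filled["data_vars"]["u"]["data"])  # zeros of the recorded shape and dtype
```

The filled dict can then be passed to `xr.Dataset.from_dict` to rebuild a dataset with the original structure but placeholder values.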

TODO:

  • Add tooling
  • Update issue template/contributing guidelines. As part of this, we should also update our issue template so that "having trouble with fieldset loading" gets a dedicated issue type, where we can give instructions on how users can fix their metadata.
    • Updating of issue template should be done after v4 is released and main becomes the default branch again
