map_over_datasets, a way to compute over datatrees, currently seems to try to operate even on nodes which contain no datasets, and consequently raises an error.
This seems to be a new issue, and was not a problem when this function was called map_over_subtree, which was part of the experimental datatree versions.
An example to reproduce this problem is below:
import numpy as np
import xarray as xr

## Generate datatree, using example from documentation
def time_stamps(n_samples, T):
    """Create an array of evenly-spaced time stamps"""
    return xr.DataArray(
        data=np.linspace(0, 2 * np.pi * T, n_samples), dims=["time"]
    )


def signal_generator(t, f, A, phase):
    """Generate an example electrical-like waveform"""
    return A * np.sin(f * t.data + phase)


time_stamps1 = time_stamps(n_samples=15, T=1.5)
time_stamps2 = time_stamps(n_samples=10, T=1.0)

# Only the two child nodes are given data; the root node is left empty
voltages = xr.DataTree.from_dict(
    {
        "/oscilloscope1": xr.Dataset(
            {
                "potential": (
                    "time",
                    signal_generator(time_stamps1, f=2, A=1.2, phase=0.5),
                ),
                "current": (
                    "time",
                    signal_generator(time_stamps1, f=2, A=1.2, phase=1),
                ),
            },
            coords={"time": time_stamps1},
        ),
        "/oscilloscope2": xr.Dataset(
            {
                "potential": (
                    "time",
                    signal_generator(time_stamps2, f=1.6, A=1.6, phase=0.2),
                ),
                "current": (
                    "time",
                    signal_generator(time_stamps2, f=1.6, A=1.6, phase=0.7),
                ),
            },
            coords={"time": time_stamps2},
        ),
    }
)
## Write some function to add resistance
def add_resistance_only_do(dtree):
    def calculate_resistance(ds):
        # Assumes every node has 'potential' and 'current'
        ds_new = ds.copy()
        ds_new['resistance'] = ds_new['potential'] / ds_new['current']
        return ds_new

    dtree = dtree.map_over_datasets(calculate_resistance)
    return dtree


def add_resistance_try(dtree):
    def calculate_resistance(ds):
        ds_new = ds.copy()
        try:
            ds_new['resistance'] = ds_new['potential'] / ds_new['current']
            return ds_new
        except KeyError:
            # Node lacks the required variables; return it unchanged
            return ds_new

    dtree = dtree.map_over_datasets(calculate_resistance)
    return dtree
Calling voltages = add_resistance_only_do(voltages) raises the error:
KeyError: "No variable named 'potential'. Variables on the dataset include []"
Raised whilst mapping function over node with path '.'
The error can be easily avoided by putting try statements in (e.g. voltages = add_resistance_try(voltages)), but we know that Yoda would not recommend try (right, @TomNicholas?).
Can this be built in as a default feature of map_over_datasets? Many examples of datatree will have nodes without datasets.