map_over_datasets throws error on nodes without datasets #9693

Open
@dhruvbalwada

map_over_datasets, a way to compute over datatrees, currently seems to try to operate even on nodes that contain no datasets, and consequently raises an error.
This seems to be a new issue; it was not a problem when this function was called map_over_subtree, in the experimental datatree versions.

An example to reproduce this problem is below:

import numpy as np
import xarray as xr

## Generate datatree, using example from documentation
def time_stamps(n_samples, T):
    """Create an array of evenly-spaced time stamps"""
    return xr.DataArray(
        data=np.linspace(0, 2 * np.pi * T, n_samples), dims=["time"]
    )


def signal_generator(t, f, A, phase):
    """Generate an example electrical-like waveform"""
    return A * np.sin(f * t.data + phase)


time_stamps1 = time_stamps(n_samples=15, T=1.5)

time_stamps2 = time_stamps(n_samples=10, T=1.0)

voltages = xr.DataTree.from_dict(
    {
        "/oscilloscope1": xr.Dataset(
            {
                "potential": (
                    "time",
                    signal_generator(time_stamps1, f=2, A=1.2, phase=0.5),
                ),
                "current": (
                    "time",
                    signal_generator(time_stamps1, f=2, A=1.2, phase=1),
                ),
            },
            coords={"time": time_stamps1},
        ),
        "/oscilloscope2": xr.Dataset(
            {
                "potential": (
                    "time",
                    signal_generator(time_stamps2, f=1.6, A=1.6, phase=0.2),
                ),
                "current": (
                    "time",
                    signal_generator(time_stamps2, f=1.6, A=1.6, phase=0.7),
                ),
            },
            coords={"time": time_stamps2},
        ),
    }
)

## Write some function to add resistance
def add_resistance_only_do(dtree): 
    def calculate_resistance(ds):
        ds_new = ds.copy()
        
        ds_new['resistance'] = ds_new['potential']/ds_new['current']
        return ds_new 
        
    dtree = dtree.map_over_datasets(calculate_resistance)
    
    return dtree
    
def add_resistance_try(dtree): 
    def calculate_resistance(ds):
        ds_new = ds.copy()
        try:
            ds_new['resistance'] = ds_new['potential']/ds_new['current']
            return ds_new 
        except KeyError:  # node lacks 'potential' or 'current' (e.g. an empty node)
            return ds_new

    dtree = dtree.map_over_datasets(calculate_resistance)
    
    return dtree

Calling voltages = add_resistance_only_do(voltages) raises the error:

KeyError: "No variable named 'potential'. Variables on the dataset include []"
Raised whilst mapping function over node with path '.'

This is easily worked around by wrapping the computation in a try statement (e.g. voltages = add_resistance_try(voltages)), but we know that Yoda would not recommend try (right, @TomNicholas?).

Could this be built in as the default behavior of map_over_datasets? Many datatrees will have nodes without datasets.
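
In the meantime, a workaround that avoids try altogether is to check for the required variables explicitly and leave other nodes alone. The following is only a minimal sketch: the name calculate_resistance_if_present is hypothetical, and it assumes the object passed in supports `in`, `.copy()` and item assignment the way xr.Dataset does.

```python
def calculate_resistance_if_present(ds):
    """Add 'resistance' when both inputs exist; otherwise return an unchanged copy."""
    required = ("potential", "current")  # variable names taken from the example above
    ds_new = ds.copy()
    # Nodes without these variables (such as the empty root) are passed through
    if not all(name in ds_new for name in required):
        return ds_new
    ds_new["resistance"] = ds_new["potential"] / ds_new["current"]
    return ds_new
```

With the tree above, voltages.map_over_datasets(calculate_resistance_if_present) should then add resistance under /oscilloscope1 and /oscilloscope2 and pass the empty root through, without hiding unrelated errors the way a broad except would.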

Labels: topic-DataTree (Related to the implementation of a DataTree class)