bool(ds) should raise a "the truth value of a Dataset is ambiguous" error

Throwing this out there - happy to be shot down if people are opposed. 

### Current behavior / griping

Currently, coercing a dataset to a boolean invokes [`ds.__bool__`](https://github.com/pydata/xarray/blob/main/xarray/core/dataset.py#L1451-L1452), which in turn calls `bool(ds.data_vars)`:

```python
class Dataset(DataWithCoords, DatasetArithmetic, Mapping):
    ...
    def __bool__(self) -> bool:
        return bool(self.data_vars)
```

This has the unfortunate property of returning True as long as there is at least one data_variable, regardless of the contents.

Currently, the behavior of `Dataset.__bool__` is, at least as far as I've seen, never helpful but frequently unhelpful. I've seen (and written) tests written for DataArrays being passed a Dataset and suddenly the tests are meaningless _so many times_. Conversely, I've never found a legitimate use case for `bool(ds)`. As far as I can tell, this is essentially the same as `len(ds.data_vars) > 0`.

In fact, while testing out my proposed changes below on a fork, I found two tests in the xarray test suite that had succumbed to this issue: see https://github.com/pydata/xarray/pull/6122 and https://github.com/pydata/xarray/pull/6123.

This *has* been discussed before - see https://github.com/pydata/xarray/issues/4290. This discussion focused on the question "should ` bool(xr.Dataset({'a': False}))` return False?". I agree that it's not clear when it should be false and picking a behavior which deviates from Mapping feels arbitrary and gross.

### Proposed behavior

I'm proposing that the API be changed, so that ` bool(xr.Dataset({'a': False}))` raise an error, similar to the implementation in [`pd.Series`](https://github.com/pandas-dev/pandas/blob/v1.3.5/pandas/core/generic.py#L1535-L1542).

In this implementation in pandas, attempting to evaluate even a single-element series as a boolean raises an error:

```python
In [14]: bool(pd.Series([False]))
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-14-b0ad7f4d9277> in <module>
----> 1 bool(pd.Series([False]))

~/miniconda3/envs/rhodium-env/lib/python3.9/site-packages/pandas/core/generic.py in __nonzero__(self)
   1532     @final
   1533     def __nonzero__(self):
-> 1534         raise ValueError(
   1535             f"The truth value of a {type(self).__name__} is ambiguous. "
   1536             "Use a.empty, a.bool(), a.item(), a.any() or a.all()."

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
```

I understand hesitancy around changing the core API. That said, if anyone can find an important, correct use of `bool(ds)` in the wild I'll eat my hat :)

### Implementation

This could be as simple as raising an error on `ds.__bool__`, something like:

```python
class Dataset(DataWithCoords, DatasetArithmetic, Mapping):
    ...
    def __bool__(self) -> bool:
        raise ValueError(
            "The truth value of a Dataset is ambiguous. Reduce the data "
            "variables to a scalar value with any(ds.values()) or "
            "all(ds.values())."
        )
```

The only other change that would be needed is an assertion that directly calls `bool(ds)` in [test_dataset::TestDataset.test_properties](https://github.com/pydata/xarray/blob/main/xarray/tests/test_dataset.py#L547), which checks for the exact behavior I'm changing:

```python
        assert bool(ds)
```

This would need to be changed to:

```python
        with pytest.raises(ValueError):
            bool(ds)
```

If this sounds good, I can submit a PR with these changes.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

bool(ds) should raise a "the truth value of a Dataset is ambiguous" error #6124

Current behavior / griping

Proposed behavior

Implementation

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

bool(ds) should raise a "the truth value of a Dataset is ambiguous" error #6124

Description

Current behavior / griping

Proposed behavior

Implementation

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions