Check for zeros in the data #159

Open
peterdudfield opened this issue Jun 27, 2024 · 1 comment
Labels
good first issue Good for newcomers

Comments

@peterdudfield
Contributor

Detailed Description

It would be great to have a check in place for zeros in the data. A large number of zeros normally indicates an error.

Context

  • good to catch data problems early and fail hard

Possible Implementation

  • add a check: if the number of zeros is >20% of the data, raise an error
  • @devsjc can you suggest a place in the code for this check?
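
A minimal sketch of the rule proposed above, assuming the data can be viewed as a NumPy array. The names `zero_fraction`, `check_zeros` and `MAX_ZERO_FRACTION` are hypothetical and not part of the project; only the 20% threshold comes from the suggestion above.

```python
import numpy as np

# Hypothetical constant: the 20% figure suggested in this issue.
MAX_ZERO_FRACTION = 0.2


def zero_fraction(values: np.ndarray) -> float:
    """Return the fraction of elements that are exactly zero."""
    if values.size == 0:
        return 0.0
    # NaN == 0 is False, so NaNs are not counted as zeros here.
    return float(np.count_nonzero(values == 0)) / values.size


def check_zeros(values: np.ndarray, max_fraction: float = MAX_ZERO_FRACTION) -> None:
    """Raise ValueError if more than max_fraction of the values are zero."""
    fraction = zero_fraction(values)
    if fraction > max_fraction:
        raise ValueError(
            f"{fraction:.1%} of values are zero (limit {max_fraction:.0%})"
        )
```

Raising an exception matches the "fail hard" intent; whether some variables legitimately contain many zeros and need a different limit is left open here.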

peterdudfield added the good first issue label Jun 27, 2024

@devsjc
Collaborator

devsjc commented Jul 1, 2024

Yes, I would add it to the filter here:

    def _dataQualityFilter(ds: xr.Dataset) -> bool:
        """Filter out data that is not of sufficient quality."""
        if ds == xr.Dataset():
            return False

        # Carry out a basic data quality check
        for data_var in ds.data_vars:
            if ds[f"{data_var}"].isnull().any():
                log.warn(
                    event=f"Dataset has NaNs in variable {data_var}",
                    initTime=str(ds.coords["init_time"].values[0])[:16],
                    variable=data_var,
                )

        return True
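
One way the zero check could sit alongside the NaN check in the filter above is sketched here. This is only an illustration under assumptions: the helper `has_too_many_zeros` and the constant `ZERO_FRACTION_LIMIT` are hypothetical names, and the 20% limit is the figure suggested earlier in this issue.

```python
import numpy as np
import xarray as xr

# Hypothetical limit, taken from the >20% suggestion in this issue.
ZERO_FRACTION_LIMIT = 0.2


def has_too_many_zeros(ds: xr.Dataset, limit: float = ZERO_FRACTION_LIMIT) -> bool:
    """Return True if any data variable has more than `limit` of its values equal to zero."""
    for name, da in ds.data_vars.items():
        if da.size == 0:
            continue
        # The mean of a boolean mask is the fraction of True entries.
        fraction = float((da == 0).mean())
        if fraction > limit:
            return True
    return False
```

Inside `_dataQualityFilter`, the result could be used either to log and return `False` or to raise an error; the issue asks to "fail hard", so raising may be the closer fit, but that choice is not settled in this thread.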
