Is there an existing issue for this?
Problem statement
DQX currently supports dataset-level checks, for example aggregate checks over the whole DataFrame. However, when a dataset-level check fails, the resulting _errors or _warnings are attached to every row in the input DataFrame.
This is technically understandable because the whole dataset violates the rule, but it creates a lot of noise when using quarantine workflows. For example, if I define a dataset-level check such as “the percentage of rows with target_label = 1 must be above a threshold,” and the threshold is not met, every row is emitted with the same warning/error. That makes the quarantine table look like every individual row is bad, even though the failure is really a table-level metric failure.
It would be useful to have first-class dataset-level result handling, where dataset-level checks can produce a single check result per run/check instead of attaching the result to every row.
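To make the request concrete, here is a minimal sketch in plain Python of what a single result per run/check could look like. It is not DQX's API: `DatasetCheckResult`, `check_min_positive_ratio`, and the `run_id` value are hypothetical names used only for illustration, and a list of dicts stands in for the DataFrame.

```python
from dataclasses import dataclass
from typing import Any, Iterable, Mapping


@dataclass(frozen=True)
class DatasetCheckResult:
    """Hypothetical record: one per dataset-level check per run."""
    check_name: str
    run_id: str
    passed: bool
    observed: float
    threshold: float
    message: str


def check_min_positive_ratio(
    rows: Iterable[Mapping[str, Any]],
    column: str,
    threshold: float,
    run_id: str,
) -> DatasetCheckResult:
    """Evaluate 'share of rows with column == 1 must be above threshold'."""
    materialized = list(rows)
    total = len(materialized)
    positives = sum(1 for row in materialized if row.get(column) == 1)
    # An empty dataset is treated as a ratio of 0.0 (an assumption of this sketch).
    ratio = positives / total if total else 0.0
    passed = ratio > threshold
    message = (
        f"ratio of {column} = 1 is {ratio:.2f}, "
        f"expected above {threshold:.2f}"
    )
    return DatasetCheckResult(
        check_name=f"min_positive_ratio({column})",
        run_id=run_id,
        passed=passed,
        observed=ratio,
        threshold=threshold,
        message=message,
    )


# Stand-in for the input DataFrame.
rows = [
    {"id": 1, "target_label": 1},
    {"id": 2, "target_label": 0},
    {"id": 3, "target_label": 0},
    {"id": 4, "target_label": 0},
]

result = check_min_positive_ratio(rows, "target_label", threshold=0.5, run_id="run-001")

# The failure is recorded once, in a separate collection of dataset-level
# results; the input rows are left untouched and are not quarantined.
dataset_results = [result]
```

With this shape, the quarantine table would only contain rows that fail row-level checks, while the table-level metric failure is reported once alongside the run.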
Proposed Solution
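Provide a separate output for dataset-level checks, with one result per run/check, instead of attaching the result to every row.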
Additional Context
No response