Add variant distribution constraint #136
Codecov Report
@@           Coverage Diff           @@
##             main     #136   +/-   ##
=======================================
  Coverage   36.44%   36.44%
=======================================
  Files          15       15
  Lines        1723     1723
=======================================
  Hits          628      628
  Misses       1095     1095
I like this a lot! 🚀 To give a concrete example that might highlight how this could be used: With this test, we could quantize all timestamps to their respective year, resulting in a categorical column with values in
Really great work!
self,
ref: DataReference,
distribution: Dict[T, Tuple[float, float]],
default_bounds: Tuple[float, float] = (0, 0),
What would you think of a relative violation tolerance parameter? E.g. it could say:
A test succeeds iff
#observations outside of the specified ranges / #observations <= tolerance_parameter
I don't consider it a must - we've simply fared well with tolerances historically.
If 'A' is expected to have a target share ranging from 5% to 15%, but its actual share is 16%, would you consider the 16% to be a violation of the target range or merely 1% above the upper limit?
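To make the two readings of a relative tolerance concrete, here is a minimal sketch. The function names, the plain-dict inputs, and the choice to sum over all variants are assumptions for illustration only, not what the PR implements.

```python
from typing import Dict, Hashable, Tuple

Bounds = Tuple[float, float]


def violating_share(
    shares: Dict[Hashable, float],
    distribution: Dict[Hashable, Bounds],
    default_bounds: Bounds = (0.0, 0.0),
) -> float:
    """Reading 1 (hypothetical): the whole share of an out-of-range variant counts."""
    total = 0.0
    for variant in set(shares) | set(distribution):
        share = shares.get(variant, 0.0)
        lower, upper = distribution.get(variant, default_bounds)
        if not lower <= share <= upper:
            total += share
    return total


def excess_share(
    shares: Dict[Hashable, float],
    distribution: Dict[Hashable, Bounds],
    default_bounds: Bounds = (0.0, 0.0),
) -> float:
    """Reading 2 (hypothetical): only the distance to the nearest bound counts."""
    total = 0.0
    for variant in set(shares) | set(distribution):
        share = shares.get(variant, 0.0)
        lower, upper = distribution.get(variant, default_bounds)
        # Positive only if the share lies below the lower or above the upper bound.
        total += max(lower - share, share - upper, 0.0)
    return total


# 'A' is expected between 5% and 15% but is observed at 16%.
observed = {"A": 0.16, "B": 0.84}
expected = {"A": (0.05, 0.15), "B": (0.0, 1.0)}
tolerance = 0.05

passes_reading_1 = violating_share(observed, expected) <= tolerance
passes_reading_2 = excess_share(observed, expected) <= tolerance
```

With a tolerance of 5%, the first reading fails the test because all 16% of 'A' counts as violating, while the second passes because only about one percentage point lies above the upper limit.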
I added this feature
Looks great - thanks a bunch! :)
This PR adds a VariantDistributionConstraint, which checks whether the share of each value in a column falls within its specified minimum and maximum bounds.
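As a rough illustration of the idea, the sketch below runs the same kind of check on an in-memory list. The function name share_violations is hypothetical, and it assumes that default_bounds applies to values not listed in distribution. The actual constraint takes a DataReference and may compute shares differently.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Tuple

Bounds = Tuple[float, float]


def share_violations(
    values: Iterable[Hashable],
    distribution: Dict[Hashable, Bounds],
    default_bounds: Bounds = (0, 0),
) -> Dict[Hashable, float]:
    """Return each variant whose observed share lies outside its bounds."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return {}
    violations = {}
    # Check observed variants as well as expected ones that never occur.
    for variant in set(counts) | set(distribution):
        share = counts.get(variant, 0) / total
        # Assumption: variants missing from `distribution` use `default_bounds`.
        lower, upper = distribution.get(variant, default_bounds)
        if not lower <= share <= upper:
            violations[variant] = share
    return violations


column = ["A"] * 16 + ["B"] * 84
bounds = {"A": (0.05, 0.15), "B": (0.5, 0.9)}
result = share_violations(column, bounds)  # only 'A' is out of range
```

With the default bounds of (0, 0), any value that appears in the column but is not listed in distribution is reported as a violation under this assumption.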