Problem Description
Certain constraints change or drop the input columns when using the transform handling_strategy. This means that if multiple constraints are applied to the same column and one of them does this, the constraints will not work properly, since one of them will attempt to make changes to a column that was either altered or dropped.
This problem does not occur if constraints use the reject_sampling handling_strategy.
Expected behavior
The goal of this issue is to sort constraints so that any using the reject_sampling strategy occur first. On top of that, if multiple constraints are using transform and they touch the same column, we should raise an error saying that those constraints might not be enforceable.
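As a rough illustration of the expected behavior, the sketch below sorts constraints so that reject_sampling ones come first and raises an error when two transform constraints touch the same column. The `Constraint` dataclass, the `MultipleConstraintsError` exception and the `sort_and_validate` function are hypothetical stand-ins invented for this example, not the library's actual classes; only the `handling_strategy` and `constraint_columns` names come from this issue.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Constraint:
    """Hypothetical stand-in for a constraint object (not the real class)."""
    name: str
    handling_strategy: str  # 'transform' or 'reject_sampling'
    constraint_columns: Tuple[str, ...]


class MultipleConstraintsError(Exception):
    """Hypothetical error raised when constraints might not be enforceable."""


def sort_and_validate(constraints: List[Constraint]) -> List[Constraint]:
    # sorted() is stable, so the relative order inside each group is kept.
    # The key is False for reject_sampling, which sorts before True.
    ordered = sorted(
        constraints,
        key=lambda c: c.handling_strategy != 'reject_sampling',
    )

    # Remember which transform constraint first touched each column.
    seen: Dict[str, str] = {}
    for constraint in ordered:
        if constraint.handling_strategy != 'transform':
            continue
        for column in constraint.constraint_columns:
            if column in seen:
                raise MultipleConstraintsError(
                    f"Constraints '{seen[column]}' and '{constraint.name}' both "
                    f"transform column '{column}' and might not be enforceable."
                )
            seen[column] = constraint.name

    return ordered
```

With this sketch, a list such as `[transform on x, reject_sampling on x, transform on y]` is reordered so the reject_sampling constraint runs first, while two transform constraints sharing column `x` raise the error.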
amontanez24 changed the title from "Raise error when nested constraints can't be enforced" to "Raise error when multiple constraints can't be enforced" on Aug 5, 2021
It seems like we could expand the validation to allow more than one constraint using the transform strategy to affect the same columns, as long as a few rules are met.
Here is some context to consider before jumping to conclusions:
Problems occur whenever:
a. A constraint modifies a column during transform that will be used later on by another constraint down the list
b. A constraint rebuilds a column during reverse_transform that was previously used or rebuilt by another constraint down the list.
A few generic considerations can be taken into account:
a. A constraint transform method cannot modify something that will not be rebuilt during reverse_transform.
b. A constraint transform cannot modify a column it does not use at all.
c. A constraint reverse_transform cannot rebuild or use something that transform did not use at all.
Conclusion
The implementation of this rule could be based on:
Adding one new attribute, rebuild_columns, to each constraint, indicating which columns will be modified during the reverse_transform call, according to the given __init__ arguments.
Making sure that all the constraints populate their constraint_columns attribute during __init__, and that the list contains all the columns that will be used or modified by the constraint at any point.
When validating or sorting constraints, make sure that the columns seen in the rebuild_columns attribute of each constraint do not show up in the constraint_columns of any of the constraints that are processed later on down the list.
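The check described above could be sketched roughly as follows. This is a minimal illustration under stated assumptions: `ConstraintSpec` and `find_conflicts` are hypothetical names, and the example column assignments are made up; only the `rebuild_columns` and `constraint_columns` attribute names come from the proposal.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ConstraintSpec:
    """Hypothetical stand-in carrying the two attributes from the proposal."""
    name: str
    constraint_columns: Tuple[str, ...]      # every column used or modified
    rebuild_columns: Tuple[str, ...] = ()    # columns rewritten by reverse_transform


def find_conflicts(
    constraints: List[ConstraintSpec],
) -> List[Tuple[str, str, List[str]]]:
    """Return (earlier, later, columns) triples that violate the rule.

    A conflict exists when a column in an earlier constraint's
    rebuild_columns appears in the constraint_columns of any
    constraint further down the list.
    """
    conflicts = []
    for index, earlier in enumerate(constraints):
        rebuilt = set(earlier.rebuild_columns)
        for later in constraints[index + 1:]:
            overlap = rebuilt & set(later.constraint_columns)
            if overlap:
                conflicts.append((earlier.name, later.name, sorted(overlap)))
    return conflicts


# Made-up example: the first constraint rebuilds 'high', which the second uses.
first = ConstraintSpec('first', ('low', 'high'), rebuild_columns=('high',))
second = ConstraintSpec('second', ('high', 'region'))

print(find_conflicts([first, second]))
print(find_conflicts([second, first]))
```

Returning the list of conflicts rather than raising immediately leaves room either to raise an error or to try a different ordering; in this made-up example, swapping the two constraints removes the conflict.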