Raise error when multiple constraints can't be enforced #541

Closed
amontanez24 opened this issue Aug 3, 2021 · 1 comment · Fixed by #548

@amontanez24
Contributor

Problem Description

Certain constraints change or drop their input columns when using the transform handling_strategy. If several constraints are applied to the same column and one of them does this, the constraints will not work properly, because a later one will attempt to operate on a column that has already been altered or dropped.

This problem does not occur if constraints use the reject_sampling handling_strategy.
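
A toy illustration of the failure mode described above. It uses plain dicts as tables and two hypothetical transform functions; none of these names come from SDV, and the real constraints behave in more complex ways.

```python
# Toy illustration only: these are NOT SDV's constraint classes.
# A "table" is a plain dict mapping column name -> list of values.

def drop_high_transform(table):
    """Hypothetical transform: replaces 'high' with 'diff' = high - low."""
    out = dict(table)
    high = out.pop("high")  # the input column is dropped here
    out["diff"] = [h - l for h, l in zip(high, out["low"])]
    return out


def positive_high_transform(table):
    """Hypothetical transform that expects the 'high' column to exist."""
    out = dict(table)
    out["high"] = [abs(v) for v in out["high"]]  # KeyError if 'high' is gone
    return out


table = {"low": [1, 2], "high": [5, 7]}
transformed = drop_high_transform(table)

try:
    positive_high_transform(transformed)
    failed = False
except KeyError:
    # The second constraint cannot find the column the first one dropped.
    failed = True
```

With reject sampling, neither function would need to alter the table, which is why the conflict only appears under the transform strategy.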

Expected behavior

The goal of this issue is to sort constraints so that those using the reject_sampling strategy are applied first. In addition, if multiple constraints use transform and touch the same column, we should raise an error saying that those constraints might not be enforceable.
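
A minimal sketch of the expected behavior, under stated assumptions: `Constraint` is a hypothetical stand-in with only the attributes named in this issue, and `MultipleConstraintsError` and `sort_and_validate` are illustrative names, not the ones used in the actual fix.

```python
from collections import namedtuple

# Hypothetical stand-in for a constraint; field names mirror the issue text.
Constraint = namedtuple(
    "Constraint", ["name", "handling_strategy", "constraint_columns"]
)


class MultipleConstraintsError(Exception):
    """Hypothetical error type for constraints that may not be enforceable."""


def sort_and_validate(constraints):
    # sorted() is stable and False sorts before True, so reject_sampling
    # constraints move to the front while keeping their relative order.
    ordered = sorted(
        constraints, key=lambda c: c.handling_strategy != "reject_sampling"
    )

    seen = {}  # column name -> name of the transform constraint that uses it
    for constraint in ordered:
        if constraint.handling_strategy != "transform":
            continue
        for column in constraint.constraint_columns:
            if column in seen:
                raise MultipleConstraintsError(
                    f"Constraints {seen[column]!r} and {constraint.name!r} both "
                    f"transform column {column!r} and might not be enforceable."
                )
            seen[column] = constraint.name
    return ordered
```

Two reject_sampling constraints on the same column pass this check, since only overlapping transform constraints are treated as a conflict.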

@amontanez24 amontanez24 added the "feature request" (Request for a new feature) and "pending review" labels Aug 3, 2021
@amontanez24 amontanez24 changed the title from "Raise error when nested constraints can't be enforced" to "Raise error when multiple constraints can't be enforced" Aug 5, 2021
@csala
Contributor

csala commented Aug 5, 2021

It seems like we could expand the validation to allow more than one constraint using the transform strategy to affect the same columns, as long as a few rules are met.

Here is some context to consider before jumping to conclusions:

  1. Problems occur whenever:
    a. A constraint modifies a column during transform that will be used later on by another constraint down the list.
    b. A constraint rebuilds a column during reverse_transform that was previously used or rebuilt by another constraint down the list.
  2. A few generic considerations can be taken into account:
    a. A constraint transform method cannot modify something that will not be rebuilt during reverse_transform.
    b. A constraint transform cannot modify a column it does not use at all.
    c. A constraint reverse_transform cannot rebuild or use something that transform did not use at all.

Conclusion

The implementation of this rule could be based on:

  1. Adding a new attribute called rebuild_columns to each constraint, which indicates which columns will be modified during the reverse_transform call, according to the given __init__ arguments.
  2. Making sure that all the constraints populate their constraint_columns attribute during __init__, and that the list contains all the columns that will be used or modified by the constraint at any point.
  3. When validating or sorting constraints, make sure that the columns seen in the rebuild_columns attribute of each constraint do not show up in the constraint_columns of any of the constraints that are processed later on down the list.
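
The check in step 3 could be sketched roughly as follows. This is only an illustration: `ConstraintSpec` and `find_conflicts` are hypothetical names, and the two attributes are modeled as plain lists of column names rather than being derived from real __init__ arguments.

```python
from collections import namedtuple

# Hypothetical stand-in carrying only the two attributes discussed above.
ConstraintSpec = namedtuple(
    "ConstraintSpec", ["name", "constraint_columns", "rebuild_columns"]
)


def find_conflicts(constraints):
    """Return (earlier, later, columns) for every violation of the rule.

    A violation occurs when a column in the rebuild_columns of one
    constraint appears in the constraint_columns of a later constraint.
    """
    conflicts = []
    for index, earlier in enumerate(constraints):
        rebuilt = set(earlier.rebuild_columns)
        for later in constraints[index + 1:]:
            overlap = rebuilt & set(later.constraint_columns)
            if overlap:
                conflicts.append((earlier.name, later.name, sorted(overlap)))
    return conflicts


first = ConstraintSpec("first", ["x", "y"], ["y"])
second = ConstraintSpec("second", ["y", "z"], ["z"])
third = ConstraintSpec("third", ["w"], ["w"])

conflicts = find_conflicts([first, second, third])
```

Returning the list of conflicts instead of raising immediately leaves the caller free to either raise an error or try a different ordering of the constraints.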
