Adding Amazon Glue Data Quality Service #39923
vincbeck
left a comment
Some comments but it is solid overall!
@vincbeck Thank you for all the suggestions. I have made all the changes, please review. Is the docstring sufficient for `log_results`? Please let me know, I am happy to refine it. 😄
Yep, it looks good, it definitely helps to understand the function!
Pandas is used only when the user opts into advanced output processing by passing `show_results=True` (default is False) to GlueDataQualityRuleSetEvaluationRunOperator or GlueDataQualityRuleSetEvaluationRunSensor. However, the original PR (apache#39923) that added these operators and sensors did not include Pandas as a dependency of the Amazon Provider Package. I assume this is because Pandas is quite a heavy dependency that we don't want all users to have to install just for this very small use case. So this commit catches the exception and logs a message to the user rather than failing catastrophically as it does now.
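The optional-dependency handling described in this commit message can be sketched roughly as follows. This is a minimal illustration and not the provider's actual code: the function name `log_results_sketch`, its signature, its return values, and the log wording are all hypothetical. It only shows deferring the Pandas import until `show_results` is requested and logging a warning if the import fails.

```python
import logging

log = logging.getLogger(__name__)


def log_results_sketch(rule_results, show_results=False):
    """Hypothetical helper: pretty-print rule results only when asked to.

    Returns a short status string so the behaviour is easy to check.
    """
    if not show_results:
        # Default path: Pandas is never imported, so it is not required.
        return "skipped"

    try:
        # Import lazily so users without Pandas are unaffected otherwise.
        import pandas as pd
    except ImportError:
        # Log and carry on instead of failing the whole task.
        log.warning(
            "Pandas is not installed, so detailed results cannot be shown. "
            "Install pandas or set show_results=False."
        )
        return "pandas-missing"

    # Tabular rendering of the results (column names are illustrative).
    frame = pd.DataFrame(rule_results)
    log.info("Rule results:\n%s", frame.to_string(index=False))
    return "logged"
```

With this shape, a task that passes `show_results=True` in an environment without Pandas still completes; it just loses the formatted table and gets a warning in the log.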
Adding Amazon Glue Data Quality Service. Doc, Hook, Operator, Sensor, Trigger, Waiter, Unit Test, System Test.
GlueDataQualityOperator: Create ruleset or update ruleset
GlueDataQualityRuleSetEvaluationRunOperator: Execute evaluations on multiple rulesets.
Sample DAG for creating a ruleset and executing an evaluation: