BigQueryColumnCheckOperator doesn't actually implement use_legacy_sql kwarg #28343
Comments
Thanks for opening your first issue here! Be sure to follow the issue template!
Not entirely sure what's going on here, but it looks like it should be fine with the
Thanks @denimalpaca, I spent some more time digging around the source and noticed that in the

Do you think maybe this is a composer-specific issue, then? Or something else? Because we're definitely seeing an error suggesting we have

I'll try to see if I can reproduce the error locally with the setup & dependencies above to rule out any composer-specific issues. Any ideas for other possible debugging routes?
The local reproduction would be really useful; if you provide some steps for that, I can try it on my end as well. Unfortunately I don't have any other debugging ideas. These operators are a bit obfuscated by the BigQuery API, so I'm not really sure how that

@lwyszomi or @turbaszek might have more insight, they were recent (and maybe relevant) contributors according to the git blame!
Maybe also @bhirsz or @VladaZakharova might help.
Hi Team :)
cc @vchiapaikeo maybe you can also take a look?
I'll let @VladaZakharova take a stab at it first. Going back to work tomorrow so might not have time for the next few days. Please let me know if I can help further though, @VladaZakharova!
Thanks all! Testing this locally turns out to be a bit of a pain, given the integration with BigQuery and the fact that my org has our projects pretty locked-down, but it could also just be my own lack of experience with airflow. Please let me know if I can help along the way -- in the meantime, we're using the
Not entirely related, but there seems to be a bug on main here:

```python
failed_tests(
    f"Column: {col}\n\tCheck: {check},\n\tCheck Values: {check_values}\n"
    for col, checks in self.column_mapping.items()
    for check, check_values in checks.items()
    if not check_values["success"]
)
```
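For context, `failed_tests` in that snippet is a list, so calling it like a function raises a `TypeError`. Below is a minimal sketch of the apparent intent using `list.extend`; the `column_mapping` data is hypothetical, shaped like the operator's per-check results, and is not from the thread:

```python
# Hypothetical stand-in data shaped like the operator's column_mapping
# after checks have run; real values would come from BigQuery.
column_mapping = {
    "col1": {"min": {"success": False, "greater_than": 0}},
    "col2": {"max": {"success": True, "less_than": 100}},
}

failed_tests = []
# failed_tests(...) would raise TypeError: 'list' object is not callable.
# extend() consumes the same generator expression and appends each
# failure message to the list.
failed_tests.extend(
    f"Column: {col}\n\tCheck: {check},\n\tCheck Values: {check_values}\n"
    for col, checks in column_mapping.items()
    for check, check_values in checks.items()
    if not check_values["success"]
)

print(len(failed_tests))  # → 1 (only col1's check failed)
```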
This was introduced here (@denimalpaca). After reverting this line to the previous working call, and with `use_legacy_sql=False`, things work as expected.

DAG I used to test:

```python
from airflow import DAG
from airflow.providers.google.cloud.operators.bigquery import BigQueryColumnCheckOperator

DEFAULT_TASK_ARGS = {
    "owner": "gcp-data-platform",
    "retries": 1,
    "retry_delay": 10,
    "start_date": "2022-08-01",
}

with DAG(
    max_active_runs=1,
    concurrency=2,
    catchup=False,
    schedule_interval="@daily",
    dag_id="test_bigquery_column_check",
    default_args=DEFAULT_TASK_ARGS,
) as dag:
    basic_column_quality_checks = BigQueryColumnCheckOperator(
        task_id="check_columns",
        table="my-project.vchiapaikeo.test1",
        use_legacy_sql=False,
        column_mapping={
            "col1": {"min": {"greater_than": 0}},
        },
    )
```
Hi Team :)

@tmccall8829 Can we also check how you added data to your dataset? Did you use an already existing dataset, or did you use some operator to upload it? Is this data transferred from another storage like GCS?
Created this PR to fix the runtime / type error: #28796
@vchiapaikeo Thank you for the quick PR for this fix! |
Closed by #28796 |
Apache Airflow Provider(s)
google
Versions of Apache Airflow Providers
apache-airflow-providers-google==8.6.0
Apache Airflow version
v2.3.4
Operating System
Composer (so I think debian, probably?)
Deployment
Composer
Deployment details
We use a standard/vanilla composer deployment setup -- nothing fancy.
What happened

Using the following BigQueryColumnCheckOperator block:

results in the following error in airflow:

This occurs even though `use_legacy_sql` is set to `False`.

What you think should happen instead

Not sure if the problem here is that the error message is wrong/outdated, or if the `use_legacy_sql` flag isn't actually being passed through to `BigQueryHook()` when it gets called in `BigQueryColumnCheckOperator._submit_job()`. Either way, we should not be using legacy SQL in this case, since we explicitly specify `use_legacy_sql=False`.

How to reproduce

Run a simple DAG with just a single task (the operator I pasted above) on airflow v2.3.4 with `apache-airflow-providers-google==8.6.0`. The BQ table in question should have a column `updated_at` that is a DATETIME type.

Anything else

Digging into source, I think the problem is that the returned hook in `BigQueryColumnCheckOperator._submit_job()` does not get passed `self.use_legacy_sql`, so the hook obj just uses the default (which is set to `True`). If it's really that simple, happy to submit a PR to fix!

FWIW, it actually seems like this occurs in other BigQuery operators defined here as well... it could definitely be that I'm misreading the source, though, so apologies if that's the case!
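To make the suspected root cause concrete, here is a minimal runnable sketch using hypothetical stand-in classes (not the real Airflow classes, which live in `airflow.providers.google.cloud`): if `_submit_job()` builds the hook without forwarding the operator's flag, the hook's default of `True` silently wins.

```python
# Hypothetical stand-in for BigQueryHook, to illustrate the issue
# without an Airflow installation. The key detail mirrored from the
# report: the hook's use_legacy_sql default is True.
class FakeBigQueryHook:
    def __init__(self, gcp_conn_id="google_cloud_default", use_legacy_sql=True):
        self.use_legacy_sql = use_legacy_sql

class FakeColumnCheckOperator:
    def __init__(self, use_legacy_sql=False):
        self.use_legacy_sql = use_legacy_sql

    def _submit_job(self):
        # Suspected bug (sketch): the flag was dropped here, i.e.
        #   hook = FakeBigQueryHook(gcp_conn_id=...)
        # which leaves the hook at its default use_legacy_sql=True.
        # Suspected fix (sketch): forward the operator's flag explicitly.
        return FakeBigQueryHook(use_legacy_sql=self.use_legacy_sql)

op = FakeColumnCheckOperator(use_legacy_sql=False)
hook = op._submit_job()
print(hook.use_legacy_sql)  # → False once the flag is forwarded
```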
Are you willing to submit PR?
Code of Conduct