BigqueryToGCS Operator Failing #24364

Closed
1 of 2 tasks
adityaprakash-bobby opened this issue Jun 10, 2022 · 10 comments
Labels
area:providers, duplicate, kind:bug

Comments

@adityaprakash-bobby

Apache Airflow Provider(s)

google

Versions of Apache Airflow Providers

apache-airflow-providers-google==2022.5.18+composer

Apache Airflow version

2.2.3

Operating System

Managed

Deployment

Composer

Deployment details

Environment Configuration:

  • Composer version: composer-2.0.14
  • Airflow version: airflow-2.2.3
  • Image version: composer-2.0.14-airflow-2.2.3

Workload configuration:

  • Scheduler - 2 vCPUs, 4 GB memory, 5 GB storage
  • Number of schedulers - 1
  • Web server - 1 vCPU, 4 GB memory, 5 GB storage
  • Worker - 2 vCPUs, 7.5 GB memory, 10 GB storage
  • Number of workers - Autoscaling between 1 and 3 workers

Core Infrastructure:

  • Environment Size: Medium

Configuration Overrides:

No Airflow configuration overrides

What happened

Previously we were on apache-airflow-providers-google==6.4.0 (Composer 2.0.8, Airflow 2.2.3), where we used the BigQueryToGCSOperator in our DAGs as follows:

from airflow.providers.google.cloud.transfers import bigquery_to_gcs

###
###
###
###

bq_to_gcs_task = bigquery_to_gcs.BigQueryToGCSOperator(
  task_id='BQ_TO_GCS',
  source_project_dataset_table=bq_table,
  destination_cloud_storage_uris=f"gs://{bucket_name}//{folder_name}//file*.csv",
  export_format='CSV'
)

task_1 >> bq_to_gcs_task  >> .. >> ..

This worked until we switched to apache-airflow-providers-google==2022.5.18+composer (Composer 2.0.14, Airflow 2.2.3).
Now, every time the operator runs, the task ends up in the failed state in Airflow, even though the CSV files are created as expected. According to the logs, the operator fails because it cannot find the BigQuery job it just submitted. The task logs are as follows:

[2022-06-05, 06:23:02 UTC] {bigquery_to_gcs.py:120} INFO - Executing extract of <project_id>.<dataset_name>.<table_name> into: gs://<bucket_name>//<folder_name>//file*.csv
[2022-06-05, 06:23:03 UTC] {warnings.py:109} WARNING - /opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/cloud/hooks/bigquery.py:1942: DeprecationWarning: This method is deprecated. Please use `BigQueryHook.insert_job` method.
  warnings.warn(

[2022-06-05, 06:23:05 UTC] {taskinstance.py:1702} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/opt/python3.8/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1330, in _run_raw_task
    self._execute_task_with_callbacks(context)
  File "/opt/python3.8/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1457, in _execute_task_with_callbacks
    result = self._execute_task(context, self.task)
  File "/opt/python3.8/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1513, in _execute_task
    result = execute_callable(context=context)
  File "/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/cloud/transfers/bigquery_to_gcs.py", line 141, in execute
    job = hook.get_job(job_id=job_id).to_api_repr()
  File "/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/common/hooks/base_google.py", line 439, in inner_wrapper
    return func(self, *args, **kwargs)
  File "/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/cloud/hooks/bigquery.py", line 1492, in get_job
    job = client.get_job(job_id=job_id, project=project_id, location=location)
  File "/opt/python3.8/lib/python3.8/site-packages/google/cloud/bigquery/client.py", line 2066, in get_job
    resource = self._call_api(
  File "/opt/python3.8/lib/python3.8/site-packages/google/cloud/bigquery/client.py", line 782, in _call_api
    return call()
  File "/opt/python3.8/lib/python3.8/site-packages/google/api_core/retry.py", line 283, in retry_wrapped_func
    return retry_target(
  File "/opt/python3.8/lib/python3.8/site-packages/google/api_core/retry.py", line 190, in retry_target
    return target()
  File "/opt/python3.8/lib/python3.8/site-packages/google/cloud/_http/__init__.py", line 494, in api_request
    raise exceptions.from_http_response(response)
google.api_core.exceptions.NotFound: 404 GET https://bigquery.googleapis.com/bigquery/v2/projects/<project_id>/jobs/<job_id>?projection=full&prettyPrint=false: Not found: Job <project_id>:<job_id>
[2022-06-05, 06:23:05 UTC] {standard_task_runner.py:89} ERROR - Failed to execute job 50257 for task BQ_TO_GCS
Traceback (most recent call last):
  File "/opt/python3.8/lib/python3.8/site-packages/airflow/task/task_runner/standard_task_runner.py", line 85, in _start_by_fork
    args.func(args, dag=self.dag)
  File "/opt/python3.8/lib/python3.8/site-packages/airflow/cli/cli_parser.py", line 48, in command
    return func(*args, **kwargs)
  File "/opt/python3.8/lib/python3.8/site-packages/airflow/utils/cli.py", line 94, in wrapper
    return f(*args, **kwargs)
  File "/opt/python3.8/lib/python3.8/site-packages/airflow/cli/commands/task_command.py", line 304, in task_run
    _run_task_by_selected_method(args, dag, ti)
  File "/opt/python3.8/lib/python3.8/site-packages/airflow/cli/commands/task_command.py", line 109, in _run_task_by_selected_method
    _run_raw_task(args, ti)
  File "/opt/python3.8/lib/python3.8/site-packages/airflow/cli/commands/task_command.py", line 182, in _run_raw_task
    ti._run_raw_task(
  File "/opt/python3.8/lib/python3.8/site-packages/airflow/utils/session.py", line 70, in wrapper
    return func(*args, session=session, **kwargs)
  File "/opt/python3.8/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1330, in _run_raw_task
    self._execute_task_with_callbacks(context)
  File "/opt/python3.8/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1457, in _execute_task_with_callbacks
    result = self._execute_task(context, self.task)
  File "/opt/python3.8/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1513, in _execute_task
    result = execute_callable(context=context)
  File "/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/cloud/transfers/bigquery_to_gcs.py", line 141, in execute
    job = hook.get_job(job_id=job_id).to_api_repr()
  File "/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/common/hooks/base_google.py", line 439, in inner_wrapper
    return func(self, *args, **kwargs)
  File "/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/cloud/hooks/bigquery.py", line 1492, in get_job
    job = client.get_job(job_id=job_id, project=project_id, location=location)
  File "/opt/python3.8/lib/python3.8/site-packages/google/cloud/bigquery/client.py", line 2066, in get_job
    resource = self._call_api(
  File "/opt/python3.8/lib/python3.8/site-packages/google/cloud/bigquery/client.py", line 782, in _call_api
    return call()
  File "/opt/python3.8/lib/python3.8/site-packages/google/api_core/retry.py", line 283, in retry_wrapped_func
    return retry_target(
  File "/opt/python3.8/lib/python3.8/site-packages/google/api_core/retry.py", line 190, in retry_target
    return target()
  File "/opt/python3.8/lib/python3.8/site-packages/google/cloud/_http/__init__.py", line 494, in api_request
    raise exceptions.from_http_response(response)
google.api_core.exceptions.NotFound: 404 GET https://bigquery.googleapis.com/bigquery/v2/projects/<project_id>/jobs/<job_id>?projection=full&prettyPrint=false: Not found: Job <project_id>:<job_id>
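
The request URL in the 404 above carries only the `projection` and `prettyPrint` query parameters; no `location` parameter appears. The BigQuery `jobs.get` endpoint does accept a `location` query parameter, so this may be worth checking, but whether its absence is what causes the failure here is an assumption, not something the log proves. A minimal standard-library sketch that pulls the project, job ID and query parameters out of such a log line (`describe_job_lookup` is a hypothetical helper, not part of Airflow or google-cloud-bigquery):

```python
import re
from urllib.parse import parse_qs, urlsplit

# Final line of the traceback above, placeholders kept exactly as posted.
LOG_LINE = (
    "google.api_core.exceptions.NotFound: 404 GET "
    "https://bigquery.googleapis.com/bigquery/v2/projects/<project_id>/jobs/<job_id>"
    "?projection=full&prettyPrint=false: Not found: Job <project_id>:<job_id>"
)


def describe_job_lookup(log_line):
    """Return project, job ID and query parameter names of the failed lookup.

    Hypothetical helper for reading the log; it makes no API calls.
    """
    match = re.search(r"404 GET (https://\S+?): Not found", log_line)
    if match is None:
        raise ValueError("no BigQuery 404 request found in log line")
    url = urlsplit(match.group(1))
    path = re.fullmatch(r"/bigquery/v2/projects/([^/]+)/jobs/([^/]+)", url.path)
    if path is None:
        raise ValueError("URL is not a job lookup request")
    params = parse_qs(url.query)
    return {
        "project": path.group(1),
        "job_id": path.group(2),
        "params": sorted(params),
        # True only if the request explicitly named a job location.
        "has_location": "location" in params,
    }


info = describe_job_lookup(LOG_LINE)
print(info)
```

Running the same helper against the unredacted log line from a failing task would show the real project and job ID, which can then be compared with the job list in the BigQuery console.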

I am also unable to install apache-airflow-providers-google==2022.5.18+composer locally: pip cannot find this version, and I do not see it among the published releases.
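
A note on the version string: the `+composer` suffix is a PEP 440 local version label, and PyPI does not accept uploads that carry one, which would be consistent with pip not finding this version. That the package is a Composer-specific build is an assumption based on the label alone. A minimal sketch for splitting such a version and for checking what is installed in a given environment (the helper names are hypothetical; only `importlib.metadata` is a real API here):

```python
from importlib import metadata


def split_local_version(version):
    """Split a PEP 440 version into (public, local); local is None if absent."""
    public, sep, local = version.partition("+")
    return public, (local if sep else None)


def installed_version(distribution):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


print(split_local_version("2022.5.18+composer"))  # ('2022.5.18', 'composer')
print(split_local_version("6.4.0"))               # ('6.4.0', None)

# Result depends on the environment this runs in (for example a Composer worker).
print(installed_version("apache-airflow-providers-google"))
```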

What you think should happen instead

The task should complete with a success status, given that the query is valid and executes.

How to reproduce

No response

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

adityaprakash-bobby added the area:providers and kind:bug labels Jun 10, 2022

boring-cyborg bot commented Jun 10, 2022

Thanks for opening your first issue here! Be sure to follow the issue template!

Author

adityaprakash-bobby commented Jun 10, 2022

Thank you @gmyrianthous, this helps. In our case we do not manage the libraries in Composer ourselves; we let Composer manage them. I hope the fix is brought into Composer 2.0.14; otherwise we will have to specify the package explicitly and take extra care when upgrading our Composer environment.

EDIT: We upgraded from 2.0.8 to 2.0.14 because 2.0.8 had some known issues that were affecting our production and would only be fixed at the end of this quarter.


gmyrianthous commented Jun 10, 2022

@adityaprakash-bobby This isn't fixed in Composer 2.0.14 (in fact, no Composer version that uses apache-airflow-providers-google==7.0.0 will work), since no fix has been merged yet.
For the time being, you can follow one of the workarounds mentioned here.

Contributor

eladkal commented Jun 10, 2022

The fix is in apache-airflow-providers-google==8.0.0rc2; install that version to pick it up.
#24330
Please test it and report in #24289 if the fix doesn't work

eladkal closed this as completed Jun 10, 2022
eladkal added the duplicate label Jun 10, 2022
@adityaprakash-bobby
Author

@eladkal Is it wise to override the Composer-managed Airflow Python packages? I see that Composer 2.0.14 through the latest, 2.0.16, all ship the apache-airflow-providers-google==2022.5.18+composer package.

Contributor

eladkal commented Jun 11, 2022

That is a question for Composer support.

@adityaprakash-bobby
Author

Sure, I have left it open with Composer support. Thanks, Elad!

potiuk reopened this Jun 14, 2022
Contributor

lihan commented Jun 15, 2022

Related
#24461

Member

potiuk commented Jun 19, 2022

Closing with #24461

potiuk closed this as completed Jun 19, 2022