Impersonation logic missing in BigQuery Async operators #34727

@hkc-8010

Description

Apache Airflow version

main (development)

What happened

The BigQuery async operators write credentials to a file using GoogleBaseHook.provide_gcp_credential_file_as_context(), which handles key_path and keyfile_dict but has no logic for impersonation_chain.

When an impersonation chain is used, the operator goes into the deferred state and the task then fails with a 403 Access Denied error.

What you think should happen instead

When the operator goes into the deferred state, the triggerer should generate credentials using the impersonated service account instead of the default service account.
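
The difference between the reported and the expected behavior can be illustrated with a toy model. This is not Airflow's code: `ConnectionConfig`, `identity_reported`, and `identity_expected` are hypothetical names, the account emails are made up, and identities are plain strings rather than real credentials. The sketch assumes that the last account in an impersonation chain is the one requests should run as.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Union


@dataclass
class ConnectionConfig:
    """Hypothetical stand-in for the auth settings of a GCP connection."""

    default_account: str  # identity picked up from the environment
    key_path: Optional[str] = None
    keyfile_dict: Optional[dict] = None
    impersonation_chain: Union[str, Sequence[str], None] = None


def _base_identity(cfg: ConnectionConfig) -> str:
    # Covers the two cases the issue says are handled: key_path and keyfile_dict.
    if cfg.key_path:
        return f"key file {cfg.key_path}"
    if cfg.keyfile_dict:
        return cfg.keyfile_dict["client_email"]
    return cfg.default_account


def identity_reported(cfg: ConnectionConfig) -> str:
    """Reported behavior: the impersonation chain is never consulted."""
    return _base_identity(cfg)


def identity_expected(cfg: ConnectionConfig) -> str:
    """Expected behavior: act as the final account in the impersonation chain."""
    chain = cfg.impersonation_chain
    if not chain:
        return _base_identity(cfg)
    if isinstance(chain, str):
        return chain
    return list(chain)[-1]


cfg = ConnectionConfig(
    default_account="namespace-sa@example.iam.gserviceaccount.com",  # made up
    impersonation_chain="team-sa@example.iam.gserviceaccount.com",  # made up
)
print("reported:", identity_reported(cfg))
print("expected:", identity_expected(cfg))
```

With an impersonation chain configured, the two functions disagree, which is the mismatch described in this report. Without a chain they return the same identity.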

How to reproduce

We have set up an impersonation chain for authentication to BigQuery. Here's how it works:

We assign a Service Account to the Kubernetes namespace.
This namespace-level Service Account impersonates our team's Service Account, which has the necessary roles to access BigQuery.
When the operator runs, the worker inserts a job into BigQuery using the team's Service Account. After that, it defers itself and starts executing get_job to check the job's status.

However, during this process, we encountered a "403 Access Denied" error. After some debugging, we discovered that the Triggerer is checking the job status using the namespace-level Service Account, rather than the team's Service Account. To confirm this, we granted the necessary role to the namespace-level Service Account for checking job status, and after that, the task succeeded.
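
The sequence described here can be replayed with a small in-memory simulation. It is a sketch only: `FakeBigQuery`, `AccessDenied`, and `run_deferred_task` are hypothetical names, the service account emails are invented, and the permission strings are used as illustrative labels rather than a statement of the exact roles involved.

```python
class AccessDenied(Exception):
    """Stand-in for a 403 Access Denied response."""


class FakeBigQuery:
    """In-memory stand-in for BigQuery with a per-account permission table."""

    def __init__(self, grants):
        self.grants = {account: set(perms) for account, perms in grants.items()}
        self.jobs = {}

    def _check(self, account, permission):
        if permission not in self.grants.get(account, set()):
            raise AccessDenied(f"403 Access Denied: {account} lacks {permission}")

    def insert_job(self, account):
        self._check(account, "bigquery.jobs.create")
        job_id = f"job_{len(self.jobs) + 1}"
        self.jobs[job_id] = "DONE"  # the toy job finishes immediately
        return job_id

    def get_job(self, account, job_id):
        self._check(account, "bigquery.jobs.get")
        return self.jobs[job_id]


def run_deferred_task(bq, worker_account, triggerer_account):
    """The worker inserts the job, then the triggerer checks its status."""
    job_id = bq.insert_job(worker_account)
    return bq.get_job(triggerer_account, job_id)


NAMESPACE_SA = "namespace-sa@example.iam.gserviceaccount.com"  # made up
TEAM_SA = "team-sa@example.iam.gserviceaccount.com"  # made up

bq = FakeBigQuery({TEAM_SA: ["bigquery.jobs.create", "bigquery.jobs.get"]})

# Reported behavior: the triggerer acts as the namespace-level account.
try:
    run_deferred_task(bq, worker_account=TEAM_SA, triggerer_account=NAMESPACE_SA)
except AccessDenied as err:
    print(err)

# The confirmation step from the report: grant the status-check permission
# to the namespace-level account, after which the task succeeds.
bq.grants.setdefault(NAMESPACE_SA, set()).add("bigquery.jobs.get")
print(run_deferred_task(bq, worker_account=TEAM_SA, triggerer_account=NAMESPACE_SA))
```

Passing `TEAM_SA` as both `worker_account` and `triggerer_account` models the expected behavior, where no extra grant on the namespace-level account is needed.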

To set up impersonation_chain, refer to this documentation.

Operating System

Debian GNU/Linux 11 (bullseye)

Versions of Apache Airflow Providers

apache-airflow-providers-google==10.9.0

Deployment

Astronomer

Deployment details

Airflow is deployed on Kubernetes so that the Airflow worker service account can be annotated with the gcloud caller service account, which impersonates a privileged service account that has BigQuery permissions in order to generate short-lived credentials.

Anything else

This problem occurs every time the BigQuery operators are used in async mode.

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
