GlueJobOperator throws error after migration to newest version of Airflow #29423

@vgutkovsk

Apache Airflow version

2.5.1

What happened

We were using GlueJobOperator with Airflow 2.3.3 (official Docker image) and it worked well. We didn't specify the script file location because it was inferred from the job name. After migrating to 2.5.1 (official Docker image), the operator fails if s3_bucket and script_location are not specified. This is the error I see:

Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/providers/amazon/aws/operators/glue.py", line 146, in execute
    glue_job_run = glue_job.initialize_job(self.script_args, self.run_job_kwargs)
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/providers/amazon/aws/hooks/glue.py", line 155, in initialize_job
    job_name = self.create_or_update_glue_job()
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/providers/amazon/aws/hooks/glue.py", line 300, in create_or_update_glue_job
    config = self.create_glue_job_config()
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/providers/amazon/aws/hooks/glue.py", line 97, in create_glue_job_config
    raise ValueError("Could not initialize glue job, error: Specify Parameter `s3_bucket`")
ValueError: Could not initialize glue job, error: Specify Parameter `s3_bucket`
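
To make the failure mode in the traceback easier to follow, here is a simplified stand-in written with the standard library only. It is not the provider's code: only the function names, the call order, and the error message are taken from the traceback. The function signatures and the returned dict are hypothetical, and the idea that the config is built before any check for an existing job is an assumption.

```python
from typing import Optional


def create_glue_job_config(
    job_name: str,
    s3_bucket: Optional[str] = None,
    script_location: Optional[str] = None,
) -> dict:
    """Simplified stand-in for the config builder named in the traceback.

    Only the s3_bucket check and its message come from the traceback;
    the shape of the returned dict is illustrative.
    """
    if s3_bucket is None:
        raise ValueError(
            "Could not initialize glue job, error: Specify Parameter `s3_bucket`"
        )
    return {
        "job_name": job_name,
        "s3_bucket": s3_bucket,
        "script_location": script_location,
    }


def create_or_update_glue_job(
    job_name: str,
    s3_bucket: Optional[str] = None,
    script_location: Optional[str] = None,
) -> str:
    """Build the config first, as the call order in the traceback suggests.

    Assumption: the config is built unconditionally, so the s3_bucket check
    runs even for a job that already exists in Glue.
    """
    create_glue_job_config(job_name, s3_bucket, script_location)
    return job_name
```

In this model, calling `create_or_update_glue_job("my_job")` raises the same `ValueError` as above, while passing both `s3_bucket` and `script_location` returns the job name. That matches what I see with the real operator: it only works once both arguments are supplied.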

What you think should happen instead

I was expecting that after migration the operator would work the same way.

How to reproduce

Create a DAG with a GlueJobOperator task and do not pass the s3_bucket or script_location arguments.

Operating System

Linux

Versions of Apache Airflow Providers

apache-airflow-providers-amazon==7.1.0

Deployment

Docker-Compose

Deployment details

apache/airflow:2.5.1-python3.10 Docker image and official docker compose

Anything else

I believe it was commit #27893 by @romibuzi that introduced this behaviour.

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
