Description
Apache Airflow version
2.7.3
What happened
I'm running a DAG like:
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.empty import EmptyOperator
from airflow.providers.amazon.aws.operators.glue import GlueJobOperator
from airflow.providers.amazon.aws.operators.s3 import S3CreateBucketOperator

with DAG(
    dag_id="my_dag",
    default_args={
        "depends_on_past": False,
        "email_on_failure": False,
        "email_on_retry": False,
        "retries": 0,
        "retry_delay": timedelta(minutes=5),
    },
    start_date=datetime(2023, 12, 6),
    catchup=False,
) as dag:
    start = EmptyOperator(task_id="start")

    etl_scripts_bucket = "my-etl-scripts"
    create_etl_script_bucket = S3CreateBucketOperator(
        task_id="create_bucket",
        bucket_name=etl_scripts_bucket,
    )

    etl_job = GlueJobOperator(
        task_id="get_stuff",
        job_name="get_stuff",
        script_location="s3://my-airflow-bucket/dags/my_dag/test_etl.py",
        s3_bucket=etl_scripts_bucket,
        iam_role_arn="a-valid-iam-role-arn",  # Consider this line
        create_job_kwargs={"GlueVersion": "4.0", "NumberOfWorkers": 2, "WorkerType": "G.1X"},
    )

    end = EmptyOperator(task_id="end")

    start >> create_etl_script_bucket >> etl_job >> end
If a-valid-iam-role-arn is indeed valid, the DAG runs fine and creates a Glue job named get_stuff. Now suppose that iam_role_arn changes; maybe I even break it on purpose, e.g. by replacing the line with:
iam_role_arn="BREAK_ON_PURPOSE" # Consider this line
I would expect the DAG to update the Glue job get_stuff, or at least try to. If the role is completely bogus, the run should fail loudly; if the role ARN is valid but has insufficient permissions, the change should still go through. Instead, after committing the change, re-running the DAG succeeds, and the AWS console still shows the job's IAM role as a-valid-iam-role-arn, i.e. the job was not updated.
So while the DAG code does get updated, the Glue job does not. I observe the same behaviour with a completely bogus (invalid) IAM role ARN, with an existing but semantically wrong ARN, and with a correct ARN that exists and has the permissions to run whatever is needed.
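The mismatch can also be confirmed without the console. The following is a minimal sketch, not part of Airflow: glue_job_role and role_matches are hypothetical helper names, and it assumes boto3 is installed with credentials for the same account and region. The only Glue API call it relies on is get_job.

```python
from typing import Any, Optional


def glue_job_role(job_name: str, client: Optional[Any] = None) -> str:
    """Return the IAM role currently stored on a Glue job definition."""
    if client is None:
        # Imported lazily so the helper can be exercised with a stub client.
        import boto3

        client = boto3.client("glue")
    return client.get_job(JobName=job_name)["Job"]["Role"]


def role_matches(job_name: str, expected_role: str, client: Optional[Any] = None) -> bool:
    """True if the deployed job uses the role the DAG asks for.

    This is a plain string comparison against whatever Glue stored,
    which may be a role name or a full ARN.
    """
    return glue_job_role(job_name, client) == expected_role
```

With the behaviour described above, role_matches("get_stuff", "BREAK_ON_PURPOSE") would be expected to return False after the second run, because the job still carries the original role.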
What you think should happen instead
The Glue job should update with the new IAM role ARN.
How to reproduce
Have a valid default aws_default connection. Use the following DAG:
from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator
from airflow.providers.amazon.aws.operators.glue import GlueJobOperator
from airflow.providers.amazon.aws.operators.s3 import S3CreateBucketOperator

with DAG(
    dag_id="bug_dag",
    start_date=datetime(2023, 12, 6),
    catchup=False,
) as dag:
    start = EmptyOperator(task_id="start")

    etl_scripts_bucket = "my-dag-get-etl-scripts"
    create_etl_script_bucket = S3CreateBucketOperator(
        task_id="create_bucket",
        bucket_name=etl_scripts_bucket,
    )

    # Perform these steps:
    # 1. Manually create a bucket with any name, say BUCKET_NAME.
    # 2. Add a file called "test_etl.py" with the contents: print('hello')
    # 3. Create an IAM role or use an existing one. Doesn't matter what permissions it has. Let the role ARN be ROLE_ARN.
    BUCKET_NAME = "REPLACE-ME"
    ROLE_ARN = "REPLACE-ME"

    etl_job = GlueJobOperator(
        task_id="do_etl",
        job_name="do_etl",
        script_location=f"s3://{BUCKET_NAME}/test_etl.py",
        s3_bucket=etl_scripts_bucket,
        iam_role_arn=ROLE_ARN,
        create_job_kwargs={"GlueVersion": "4.0", "NumberOfWorkers": 2, "WorkerType": "G.1X"},
    )

    end = EmptyOperator(task_id="end")

    start >> create_etl_script_bucket >> etl_job >> end
Follow the three manual steps above (create the bucket, add the mock file, pick or create an IAM role) and run the DAG. Then change iam_role_arn to a different value and run the DAG again. Notice that the Glue job created by the first run has not been updated.
Operating System
Amazon Linux 2023
Versions of Apache Airflow Providers
apache-airflow-providers-amazon==8.11.0
Deployment
Virtualenv installation
Deployment details
No response
Anything else
Looking at the code, the issue seems to be within GlueJobHook. The culprit may be the create_or_update_glue_job method, which in turn uses create_glue_job_config, but I can't see anything obviously wrong there. This might be related to #27893.
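For reference, the behaviour I would expect from a create-or-update path looks roughly like the sketch below. This is not the provider's code: create_or_update is a hypothetical function, config stands in for the kind of dictionary a method like create_glue_job_config would build, and the only assumptions about AWS are the boto3 Glue client calls get_job, create_job and update_job.

```python
from typing import Any, Dict


def create_or_update(client: Any, config: Dict[str, Any]) -> str:
    """Create the Glue job if it is missing, otherwise push the new config.

    Returns "created", "updated" or "unchanged".
    """
    name = config["Name"]
    try:
        current = client.get_job(JobName=name)["Job"]
    except client.exceptions.EntityNotFoundException:
        client.create_job(**config)
        return "created"

    # Compare only the keys the caller specified (Role, Command, ...).
    desired = {key: value for key, value in config.items() if key != "Name"}
    if all(current.get(key) == value for key, value in desired.items()):
        return "unchanged"

    # update_job takes the name separately, and JobUpdate overwrites the
    # previous definition, so `desired` should be the full configuration.
    client.update_job(JobName=name, JobUpdate=desired)
    return "updated"
```

Under this logic, changing only the "Role" entry of config between two runs takes the "updated" branch, which is the step that does not appear to happen in the reproduction above.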
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct