Handle CrawlerRunningException gracefully in GlueCrawlerOperator #62016
vincbeck merged 4 commits into apache:main
Conversation
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contributors' Guide (https://github.com/apache/airflow/blob/main/contributing-docs/README.rst)
Force-pushed 24bf2b4 to 370736f
Please fix static checks
Force-pushed 370736f to fb398f4
vincbeck left a comment
It kind of makes sense to receive this exception, no? You're trying to update or start a new job, but it fails to do so because there is already one running. What do you think?
Good point! The exception is indeed telling us something meaningful, but I'd argue the correct response is to wait for the existing run, not to fail the task. The most common cause in production is retry-induced race conditions: boto3's built-in retry fires a second request after a timed-out first attempt that actually went through, so the "already running" crawler is often the very run this task started. Since the operator already supports waiting for completion, waiting on the existing run fits the current design. An alternative design: we could add a parameter so users can choose between failing fast and waiting.
I would personally prefer this solution so that users can decide which behavior they want
When start_crawler() or update_crawler() is called while the crawler is already running (e.g., from a retry, overlapping DAG run, or boto3 internal retry after a timeout), the Glue API raises CrawlerRunningException. Previously this propagated as an unhandled error, causing Airflow task failure despite the crawler actually succeeding. This change catches CrawlerRunningException on both update_crawler() and start_crawler() calls, logs a warning, and waits for the existing run to complete instead of failing.
Force-pushed 973cde4 to 50a2a37
Agreed. Now the operator catches CrawlerRunningException on both calls: update_crawler() skips the update with a warning since the crawler is busy, and start_crawler() waits for the existing run instead of failing.
Added tests covering both modes. Let me know if you'd like any further adjustments
Force-pushed 81e0141 to c2b2763
Force-pushed 2c067cc to b0cc60a
@vincbeck Fixed ruff failure and pushed again.
…ue_crawler.py Co-authored-by: Vincent <97131062+vincbeck@users.noreply.github.com>
Force-pushed 9fe246b to 2419757
Awesome work, congrats on your first merged pull request! You are invited to check our Issue Tracker for additional contributions.
…che#62016)

* Handle CrawlerRunningException in GlueCrawlerOperator

* Update providers/amazon/src/airflow/providers/amazon/aws/operators/glue_crawler.py

Co-authored-by: Vincent <97131062+vincbeck@users.noreply.github.com>
What
Handle CrawlerRunningException in GlueCrawlerOperator.execute() instead of letting it fail the Airflow task.

Why
When start_crawler() or update_crawler() is called while the crawler is already running (e.g., from a retry, overlapping DAG run, or boto3 internal retry after a timeout), the AWS Glue API raises CrawlerRunningException. Currently this propagates as an unhandled ClientError, causing the Airflow task to fail even though the crawler run completes successfully.

This is a common issue in production: the Glue console shows the crawler succeeded, but Airflow marks the task as failed and triggers alerts.
What Changed

providers/amazon/src/airflow/providers/amazon/aws/operators/glue_crawler.py
- Wrapped update_crawler() in try/except: catches CrawlerRunningException and logs a warning (skips the update since the crawler is busy).
- Wrapped start_crawler() in try/except: catches CrawlerRunningException and logs a warning (waits for the existing run instead of failing).
- Other ClientError codes are re-raised as before.
- Added the "from botocore.exceptions import ClientError" import.

providers/amazon/tests/unit/amazon/aws/operators/test_glue_crawler.py
- test_execute_crawler_running_on_start: verifies CrawlerRunningException on start_crawler is caught and the operator waits for the existing run.
- test_execute_crawler_running_on_update: verifies CrawlerRunningException on update_crawler is caught and start_crawler is still called.
- test_execute_other_client_error_on_start_raises: verifies non-CrawlerRunningException errors on start_crawler propagate.
- test_execute_other_client_error_on_update_raises: verifies non-CrawlerRunningException errors on update_crawler propagate.

How to Test