
Handle CrawlerRunningException gracefully in GlueCrawlerOperator #62016

Merged
vincbeck merged 4 commits into apache:main from bahram-cdt:fix/handle-crawler-running-exception
Feb 17, 2026

Conversation

@bahram-cdt
Contributor

@bahram-cdt bahram-cdt commented Feb 16, 2026

What

Handle CrawlerRunningException in GlueCrawlerOperator.execute() instead of letting it fail the Airflow task.

Why

When start_crawler() or update_crawler() is called while the crawler is already running (e.g., from a retry, overlapping DAG run, or boto3 internal retry after a timeout), the AWS Glue API raises CrawlerRunningException. Currently this propagates as an unhandled ClientError, causing the Airflow task to fail even though the crawler run completes successfully.

This is a common issue in production: the Glue console shows the crawler succeeded, but Airflow marks the task as failed and triggers alerts.

What Changed

providers/amazon/src/airflow/providers/amazon/aws/operators/glue_crawler.py

  • Wrapped update_crawler() with try/except: catches CrawlerRunningException and logs a warning (skips the update since the crawler is busy).
  • Wrapped start_crawler() with try/except: catches CrawlerRunningException and logs a warning (waits for the existing run instead of failing).
  • All other ClientError codes are re-raised as before.
  • Added from botocore.exceptions import ClientError import.

providers/amazon/tests/unit/amazon/aws/operators/test_glue_crawler.py

  • test_execute_crawler_running_on_start: verifies CrawlerRunningException on start_crawler is caught and the operator waits for the existing run.
  • test_execute_crawler_running_on_update: verifies CrawlerRunningException on update_crawler is caught and start_crawler is still called.
  • test_execute_other_client_error_on_start_raises: verifies non-CrawlerRunningException errors on start_crawler propagate.
  • test_execute_other_client_error_on_update_raises: verifies non-CrawlerRunningException errors on update_crawler propagate.

How to Test

# Simulate CrawlerRunningException
from botocore.exceptions import ClientError
error = ClientError(
    error_response={"Error": {"Code": "CrawlerRunningException", "Message": "Already running"}},
    operation_name="StartCrawler",
)
# Previously: operator.execute() raises ClientError -> task fails
# Now: operator catches it, logs warning, waits for existing run -> task succeeds

@boring-cyborg

boring-cyborg bot commented Feb 16, 2026

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contributors' Guide (https://github.com/apache/airflow/blob/main/contributing-docs/README.rst)
Here are some useful points:

  • Pay attention to the quality of your code (ruff, mypy and type annotations). Our prek-hooks will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide. Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally; it's a heavy Docker setup, but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
  • Always keep your Pull Requests rebased, otherwise your build might fail due to changes not related to your commits.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: dev@airflow.apache.org
    Slack: https://s.apache.org/airflow-slack

@boring-cyborg boring-cyborg bot added area:providers provider:amazon AWS/Amazon - related issues labels Feb 16, 2026
@bahram-cdt bahram-cdt force-pushed the fix/handle-crawler-running-exception branch from 24bf2b4 to 370736f Compare February 16, 2026 15:19
@eladkal eladkal requested a review from vincbeck February 16, 2026 15:20
@vincbeck
Contributor

Please fix static checks

@bahram-cdt bahram-cdt force-pushed the fix/handle-crawler-running-exception branch from 370736f to fb398f4 Compare February 16, 2026 17:09
Contributor

@vincbeck vincbeck left a comment


It kind of makes sense to receive this exception, no? You're trying to update or start a new job, but it fails to do so because there is already one running. What do you think?

@bahram-cdt
Contributor Author

It kind of makes sense to receive this exception, no? You're trying to update or start a new job, but it fails to do so because there is already one running. What do you think?

Good point! The exception is indeed telling us something meaningful — but I'd argue the correct response is to wait for the existing run, not to fail the task.

The most common cause in production is retry-induced race conditions: boto3's built-in retry fires start_crawler() a second time after a network timeout on the first (successful) call. The user didn't do anything wrong, and the crawler will complete successfully, but the Airflow task fails and triggers false alerts.

Since the operator already supports wait_for_completion, the natural behavior when a crawler is already running is to wait for it — the end state is identical to starting a fresh run and waiting.

For update_crawler, I agree the case is slightly weaker (we're skipping a config update), but the config rarely changes between runs, and the next successful run will pick it up. Failing the whole task seems disproportionate.

An alternative design: we could add a fail_on_already_running: bool = False parameter to make this opt-in, if the team prefers a non-breaking-default approach. Happy to adjust!

@vincbeck
Contributor

An alternative design: we could add a fail_on_already_running: bool = False parameter to make this opt-in, if the team prefers a non-breaking-default approach. Happy to adjust!

I would personally prefer this solution so that users can decide which behavior they want

When start_crawler() or update_crawler() is called while the crawler is already running (e.g., from a retry, overlapping DAG run, or boto3 internal retry after a timeout), the Glue API raises CrawlerRunningException. Previously this propagated as an unhandled error, causing Airflow task failure despite the crawler actually succeeding.

This change catches CrawlerRunningException on both update_crawler() and start_crawler() calls, logs a warning, and waits for the existing run to complete instead of failing.
@bahram-cdt bahram-cdt force-pushed the fix/handle-crawler-running-exception branch from 973cde4 to 50a2a37 Compare February 16, 2026 18:58
@bahram-cdt
Contributor Author

An alternative design: we could add a fail_on_already_running: bool = False parameter to make this opt-in, if the team prefers a non-breaking-default approach. Happy to adjust!

I would personally prefer this solution so that users can decide which behavior they want

Agreed. Now:

  • fail_on_already_running=True (default) — preserves current behavior, no change for existing users
  • fail_on_already_running=False — opt-in: catches CrawlerRunningException, logs a warning, and waits for the existing run to complete

Added tests covering both modes. Let me know if you'd like any further adjustments.
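With the opt-in design above, usage in a DAG might look like this (hypothetical sketch: the `fail_on_already_running` parameter name is taken from this discussion — check the released provider version before relying on it):

```python
from airflow.providers.amazon.aws.operators.glue_crawler import GlueCrawlerOperator

crawl = GlueCrawlerOperator(
    task_id="crawl_raw_data",
    config={"Name": "my_crawler"},
    wait_for_completion=True,
    fail_on_already_running=False,  # tolerate an in-flight run instead of failing
)
```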

@bahram-cdt bahram-cdt force-pushed the fix/handle-crawler-running-exception branch from 81e0141 to c2b2763 Compare February 16, 2026 20:24
@bahram-cdt bahram-cdt force-pushed the fix/handle-crawler-running-exception branch from 2c067cc to b0cc60a Compare February 16, 2026 22:31
@bahram-cdt
Contributor Author

@vincbeck Fixed the ruff failure and pushed again.

…ue_crawler.py

Co-authored-by: Vincent <97131062+vincbeck@users.noreply.github.com>
@bahram-cdt bahram-cdt force-pushed the fix/handle-crawler-running-exception branch from 9fe246b to 2419757 Compare February 17, 2026 07:22
@vincbeck vincbeck merged commit e9b05f9 into apache:main Feb 17, 2026
89 checks passed
@boring-cyborg

boring-cyborg bot commented Feb 17, 2026

Awesome work, congrats on your first merged pull request! You are invited to check our Issue Tracker for additional contributions.

choo121600 pushed a commit to choo121600/airflow that referenced this pull request Feb 22, 2026
…che#62016)

* Handle CrawlerRunningException in GlueCrawlerOperator

When start_crawler() or update_crawler() is called while the crawler is already running (e.g., from a retry, overlapping DAG run, or boto3 internal retry after a timeout), the Glue API raises CrawlerRunningException. Previously this propagated as an unhandled error, causing Airflow task failure despite the crawler actually succeeding.

This change catches CrawlerRunningException on both update_crawler() and start_crawler() calls, logs a warning, and waits for the existing run to complete instead of failing.

* Update providers/amazon/src/airflow/providers/amazon/aws/operators/glue_crawler.py

Co-authored-by: Vincent <97131062+vincbeck@users.noreply.github.com>

---------

Co-authored-by: Vincent <97131062+vincbeck@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:providers provider:amazon AWS/Amazon - related issues
