@potiuk
Member

@potiuk potiuk commented May 22, 2025

The celery tests hang intermittently, and the root cause is difficult to pinpoint. This PR attempts to isolate the tests and make them fail faster when the problem occurs.

After some recent refactoring, none of the tests usually runs longer than 18-19 minutes, so we can set much lower timeouts for the test job: a 30-minute "soft" timeout (SIGTERM sent to stop the container and dump logs) and a 35-minute "hard" timeout, after which the GitHub Action fails.
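
For illustration only, the "soft" then "hard" timeout idea can be sketched in Python as below. This is not the code changed in this PR (the real timeouts live in the CI configuration); `run_with_soft_timeout`, `SOFT_TIMEOUT_S`, and `GRACE_S` are hypothetical names, and the 5-minute grace period is simply the gap between the 30- and 35-minute limits mentioned above.

```python
import signal
import subprocess
import sys

# Hypothetical constants mirroring the limits described above.
SOFT_TIMEOUT_S = 30 * 60  # after this, ask the process to stop (SIGTERM)
GRACE_S = 5 * 60          # extra time to dump logs before a hard kill


def run_with_soft_timeout(cmd, soft_timeout, grace):
    """Run cmd; send SIGTERM after soft_timeout seconds, then SIGKILL
    if the process is still alive after a further grace seconds.
    Returns the process return code (negative signal number on POSIX
    when the process was terminated by a signal)."""
    proc = subprocess.Popen(cmd)
    try:
        # Normal case: the command finishes within the soft timeout.
        return proc.wait(timeout=soft_timeout)
    except subprocess.TimeoutExpired:
        # Soft timeout: give the process a chance to clean up and dump logs.
        proc.send_signal(signal.SIGTERM)
    try:
        return proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        # Hard timeout: the process ignored SIGTERM, so kill it.
        proc.kill()
        return proc.wait()


# Example call with a short-lived command (hypothetical usage):
# run_with_soft_timeout([sys.executable, "-c", "print('tests')"],
#                       SOFT_TIMEOUT_S, GRACE_S)
```

Keeping the two limits separate means a hung run is still stopped in a controlled way, with logs, before the outer job is forcibly failed.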

If we see that we are still hanging despite the isolation, we can later introduce more debug logging for just the celery container run.
(cherry picked from commit fb8c877)

Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
@potiuk
Member Author

potiuk commented May 22, 2025

Cherry-picking just in case (will be easier to cherry-pick other changes in selective checks).

@potiuk potiuk merged commit 9f3114b into apache:v3-0-test May 22, 2025
9 checks passed
@potiuk potiuk deleted the backport-fb8c877-v3-0-test branch May 22, 2025 12:50
kaxil pushed a commit that referenced this pull request Jun 3, 2025