@jhongy1994
Contributor

@jhongy1994 jhongy1994 commented Jul 21, 2024

This PR fixes the issue "XCOM_sidecar_container not started results in long running DAG".
Closes: #38115

The following operators are affected:

  • KubernetesPodOperator

Although startup_timeout_seconds sets the timeout for pod startup, KubernetesPodOperator has no parameter for a timeout on the sidecar container. I have therefore reused startup_timeout_seconds as the timeout for the XCom sidecar container as well.

Instead of the above approach, I also considered the following alternatives:

  • Adding a new parameter to KubernetesPodOperator to set the timeout for the XCOM container.
  • Raising an Exception if attempt > 1 instead of using a timeout.

If you have any better suggestions, please let me know.
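
As a rough illustration of the timeout approach described above (not the actual change in this PR), waiting for a container with a deadline could look like the sketch below. The names await_container_running, SidecarStartTimeoutError, is_running, and poll_interval are hypothetical and are not part of the KubernetesPodOperator API; only the idea of reusing startup_timeout_seconds as the deadline comes from the description.

```python
import time
from typing import Callable


class SidecarStartTimeoutError(Exception):
    """Hypothetical error: the container never reached the running state."""


def await_container_running(
    is_running: Callable[[], bool],
    timeout_seconds: float,
    poll_interval: float = 1.0,
) -> int:
    """Poll is_running() until it returns True or timeout_seconds elapse.

    Returns the number of checks made. Raises SidecarStartTimeoutError
    instead of waiting forever when the deadline passes.
    """
    # time.monotonic() is unaffected by wall-clock adjustments.
    deadline = time.monotonic() + timeout_seconds
    attempts = 0
    while True:
        attempts += 1
        if is_running():
            return attempts
        if time.monotonic() >= deadline:
            raise SidecarStartTimeoutError(
                f"container not running after {timeout_seconds}s "
                f"({attempts} checks)"
            )
        time.sleep(poll_interval)
```

In this sketch a caller would pass the operator's startup_timeout_seconds value as timeout_seconds, so a sidecar that never starts fails the task rather than leaving the DAG running indefinitely.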



@boring-cyborg boring-cyborg bot added area:providers provider:cncf-kubernetes Kubernetes (k8s) provider related issues labels Jul 21, 2024
@boring-cyborg

boring-cyborg bot commented Jul 21, 2024

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contributors' Guide (https://github.com/apache/airflow/blob/main/contributing-docs/README.rst).
Here are some useful points:

  • Pay attention to the quality of your code (ruff, mypy and type annotations). Our pre-commits will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide. Consider adding an example DAG that shows how users should use it.
  • Consider using the Breeze environment for testing locally. It's a heavy Docker setup, but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
  • Always keep your Pull Requests rebased, otherwise your build might fail due to changes not related to your commits.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: dev@airflow.apache.org
    Slack: https://s.apache.org/airflow-slack

@jhongy1994 jhongy1994 force-pushed the fix/wait-forever-XCOM_sidecar_container-start branch 4 times, most recently from 8a89262 to 9ee7042 Compare July 25, 2024 15:28
Contributor

@amoghrajesh amoghrajesh left a comment

@jhongy1994 good work on your first PR. A few review comments

Comment on lines +740 to +743
Contributor

Can we just dump the events as well? That would be a better indicator than users manually checking in k8s.

Contributor

Can we add test cases for this change? For example, call this, sleep for a few seconds, and then check that the exception is raised.

@romsharon98
Contributor

I agree with your choice to use startup_timeout_seconds 😄

Member

@hussein-awala hussein-awala left a comment

A small nit, LGTM

@jhongy1994
Contributor Author

@hussein-awala Thank you for your review!
I've applied the changes you suggested.

@github-actions

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Sep 19, 2024
@injae-kim

Gentle ping to reviewers @amoghrajesh, @romsharon98, @hussein-awala 😃 Can we merge this PR?

@github-actions github-actions bot removed the stale Stale PRs per the .github/workflows/stale.yml policy file label Sep 24, 2024
@eladkal eladkal force-pushed the fix/wait-forever-XCOM_sidecar_container-start branch from 5a2e1b5 to 0fee52f Compare September 25, 2024 23:27
@eladkal
Contributor

eladkal commented Oct 21, 2024

@jhongy1994 can you rebase and resolve conflicts?

@jhongy1994
Contributor Author

@eladkal thanks for your feedback!
I see that the issue has been resolved through another PR (#42504). I'll close this one to avoid duplication. Looking forward to contributing more in the future!

@jhongy1994 jhongy1994 closed this Oct 21, 2024
Labels

area:providers provider:cncf-kubernetes Kubernetes (k8s) provider related issues

Development

Successfully merging this pull request may close these issues.

XCOM_sidecar_container not started results in long running DAG

6 participants