
KubernetesExecutor multi_namespace_mode can use namespace list to avoid requiring cluster role #28047

Merged

Conversation

@XD-DENG (Member) commented Dec 2, 2022

Currently, KubernetesExecutor's multi_namespace_mode requires the Scheduler to have a cluster-scope role on the Kubernetes cluster, because it uses the function list_pod_for_all_namespaces().

However, in certain enterprise environments, it's not possible for users to have a cluster-scope role. For example, they may only get permissions within a namespace rather than on the whole cluster. Always allowing the Scheduler pod to have a cluster-scope role is not good from a security perspective either.

This change aims to make KubernetesExecutor's multi_namespace_mode work without a cluster-scope role.

(This was discussed on the mailing list: https://lists.apache.org/thread/xxsppw7qwvky78l6nx41vlz593gj4zqb)
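To make the distinction concrete, here is a minimal sketch (not the actual executor code) of the two listing approaches, assuming the official kubernetes Python client; the namespace names are placeholders:

```python
from kubernetes import client, config

# Load credentials: in-cluster config when running inside a pod,
# otherwise fall back to the local kubeconfig.
try:
    config.load_incluster_config()
except config.ConfigException:
    config.load_kube_config()

v1 = client.CoreV1Api()

# Before: requires a cluster-scope role, because pods are listed across the whole cluster.
all_pods = v1.list_pod_for_all_namespaces()

# After (sketch): only namespace-scoped permissions are needed, because the
# executor iterates over an explicit, configured list of namespaces.
namespaces = ["team-a", "team-b"]  # placeholder namespace list
for ns in namespaces:
    pods = v1.list_namespaced_pod(namespace=ns)
```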

I'm sure folks will have suggestions and we will need to further refine this change, but I would like to start the discussion by creating this PR first.

UPDATE:
Advantages this change brings:

  • Better fits enterprise environments
  • Better security: limits the permissions the Scheduler Pod needs, so that it doesn't hold permissions it doesn't have to have (earlier it had to have a cluster role in order to use multi_namespace_mode)

@boring-cyborg bot added the provider:cncf-kubernetes (Kubernetes provider related issues) and area:Scheduler (including HA scheduler) labels Dec 2, 2022
@XD-DENG requested review from ferruzzi and potiuk on December 2, 2022 03:04
@XD-DENG (Member, Author) commented Dec 2, 2022

Hi @potiuk and @ferruzzi, tagging you both to follow up on our earlier discussion on the mailing list.

@XD-DENG force-pushed the external_k8s-executor-for-enterprise-k8s-env branch 2 times, most recently from ae1a487 to ef75732, December 2, 2022 05:22
@ferruzzi (Contributor) left a comment

Other than the comments already made, LGTM

@XD-DENG force-pushed the external_k8s-executor-for-enterprise-k8s-env branch 2 times, most recently from d75171b to 1c1f25d, December 3, 2022 00:52
@XD-DENG (Member, Author) commented Dec 3, 2022

Hi @dstandish, I have clarified/addressed your earlier comments. Please help take another look when you get time? Thanks a lot!

Review threads on airflow/executors/kubernetes_executor.py and airflow/kubernetes/kube_config.py (all resolved)
@dstandish changed the title from "Ensure KubernetesExecutor's multi_namespace_mode work without cluster-scope role" to "KubernetesExecutor multi_namespace_mode can use namespace list" Dec 3, 2022
@dstandish (Contributor) commented Dec 8, 2022

I just noticed something coincidentally... you may want to look at the file task handler. When reading logs from a k8s task, it seems to assume a single namespace.

@XD-DENG (Member, Author) commented Dec 8, 2022

> I just noticed something coincidentally... you may want to look at the file task handler. When reading logs from a k8s task, it seems to assume a single namespace.

Thanks @dstandish, yes, I had already noticed that earlier. I plan to address it separately later, if other folks haven't taken it up by then. That part was totally missed when multi_namespace_mode was introduced (again, it makes me question how people have been using this feature so far).

@dstandish (Contributor) commented Dec 8, 2022

> I just noticed something coincidentally... you may want to look at the file task handler. When reading logs from a k8s task, it seems to assume a single namespace.

> Thanks @dstandish, yes, I had already noticed that earlier. I plan to address it separately later, if other folks haven't taken it up by then. That part was totally missed when multi_namespace_mode was introduced (again, it makes me question how people have been using this feature so far).

Sounds good.

@XD-DENG (Member, Author) commented Dec 8, 2022

Hi @dstandish, I have tried to address your earlier comments (very nice input!): either changes have been made, or I have clarified why I would prefer to do it differently. Please let me know your thoughts.

(Sorry for getting back to you a bit slowly. I only have some time for this after work.)

@dstandish (Contributor)

There is nothing slow about it at all... lemme look

@dstandish (Contributor) left a comment

Looks good to me. Maybe worth getting a second set of eyes, e.g. from @jedcunningham or @uranusjr.

Comment on lines -69 to +71
- resource_version = "0"
+ resource_version: dict[str, str] = {}
A Member commented:

We can probably improve this class further, but that can be a separate PR.
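For readers following along: this is a hypothetical, simplified sketch (not the actual Airflow class) of what tracking a watch resource version per namespace looks like, compared to the previous single string:

```python
from __future__ import annotations


class ResourceVersion:
    """Hypothetical simplified sketch: one resource version per watched namespace."""

    _instance = None
    resource_version: dict[str, str] = {}

    def __new__(cls):
        # Singleton, so every watcher shares the same bookkeeping.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance


rv = ResourceVersion()
rv.resource_version["team-a"] = "12345"        # updated by the watcher for "team-a"
rv.resource_version.setdefault("team-b", "0")  # "0" means start watching from scratch
```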

@uranusjr (Member) left a comment

I didn’t think everything entirely through, but this should be good.

@XD-DENG (Member, Author) commented Dec 8, 2022

Thanks to you both, @dstandish and @uranusjr! The whole PR has improved a lot through the review.

May I also get some input from you about the failing test? The log isn't helping much.
Is there any known issue with the test pipeline, or is something missed/wrong here? Thanks a lot.

@potiuk (Member) commented Dec 8, 2022

This looks like it's coming from one of your changes, @XD-DENG. It's hard to say precisely what it is, but one of your tests is likely exhausting all 64 GB of memory that the builders have (that's what exit code 137 means), and it happens very consistently on all your test runs while it does not seem to happen on others.

You can easily reproduce those builds locally with Breeze and see what happens: https://github.com/apache/airflow/blob/main/TESTING.rst#running-tests-using-breeze-from-the-host (it seems the "Core" test type is failing). The main reason for having Breeze is that you should be able to reproduce it locally.

If it were not memory exhaustion, you would also get instructions on how to do it. Basically, it looks like some of the core tests fail, and you can easily attempt to reproduce it; the image hash is in the CI logs:

This will get you to the shell of the image that was used in the last CI build here:

breeze shell --image-tag b64f96bc9e0cb67cd7b75baad3933638c23e4935 

And you should be able to run the tests there.

@potiuk (Member) commented Dec 9, 2022

Yep @XD-DENG, exactly as I expected:

At some point those tests start eating memory at a rate of > 1 GB per 10 seconds. Just before the crash we have only 800 MB left. Something spins out of control and causes that; this only happens on your change, not on other PRs nor on main, so it MUST be something here that triggers it. Now we just need to find out what it might be:

[screenshot: CI memory-usage graph]

@potiuk (Member) commented Dec 9, 2022

My guess is that this is a problem with cleanup. You have a LOT of parameterized tests, they are not cleaned up after each test, and some of the left-overs (threads, processes, etc.) keep running after each test.

@XD-DENG (Member, Author) commented Dec 9, 2022

Thanks a lot @potiuk for helping check and sharing the Breeze/CI tips!
I will do another check later today.

@XD-DENG (Member, Author) commented Dec 10, 2022

Thanks again @potiuk. The hint you gave was very useful. After adding cf8ce59, the CI is finally passing! (We just need to end the executor explicitly in the tests.)

There are three items we would like to follow up on later:

  • The File Task Handler cannot read logs properly from K8S tasks in multi_namespace_mode (it is designed to handle only one K8S namespace). It's a separate issue from this PR though.
  • @uranusjr suggested further refactoring the ResourceVersion class, but agreed it can be done later as a separate PR.
  • @dstandish suggested we consider using threads to manage multiple watchers (see the sketch after this comment).

But for now I believe we are good to go ahead and merge this PR itself.

I would like to have @dstandish and @potiuk as co-authors for your significant contributions to this PR, if you have no objection :-)
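As a rough illustration of the thread-per-namespace idea mentioned above (purely hypothetical; the executor's actual watcher management may differ):

```python
import threading


def watch_namespace(namespace: str) -> None:
    # Stand-in for a per-namespace pod-watcher loop.
    ...


# One watcher thread per configured namespace, instead of a single watcher
# that needs cluster-wide permissions.
namespaces = ["team-a", "team-b"]  # placeholder list
watchers = [
    threading.Thread(target=watch_namespace, args=(ns,), daemon=True, name=f"watcher-{ns}")
    for ns in namespaces
]
for t in watchers:
    t.start()
```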

@XD-DENG merged commit c739a6a into apache:main Dec 10, 2022
@XD-DENG changed the title from "KubernetesExecutor multi_namespace_mode can use namespace list" to "KubernetesExecutor multi_namespace_mode can use namespace list to avoid requiring cluster role" Dec 10, 2022
@XD-DENG deleted the external_k8s-executor-for-enterprise-k8s-env branch December 10, 2022 06:51
@XD-DENG added this to the Airflow 2.6.0 milestone Dec 10, 2022
@potiuk (Member) commented Dec 10, 2022

Actually, I think we need a bit more protection, otherwise the same situation happens if, for any reason, those asserts start to raise exceptions. Using try/finally and making sure that we always .end() after we .start() across all the k8s tests is a much more robust solution.

Follow-up here: #28281 @XD-DENG
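A minimal sketch of that pattern in a test (hypothetical test body; the real tests configure and mock the executor before starting it):

```python
from airflow.executors.kubernetes_executor import KubernetesExecutor


def test_kubernetes_executor_example():
    # Hypothetical sketch only: real tests set up config and mock the K8s API first.
    executor = KubernetesExecutor()
    executor.start()
    try:
        ...  # assertions about executor behaviour go here
    finally:
        # Always runs, even when an assertion above fails, so no watcher
        # threads/processes are left running and eating memory in CI.
        executor.end()
```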

potiuk added a commit to potiuk/airflow that referenced this pull request Dec 10, 2022
As a follow-up to apache#28047, this PR will make the test cleanup
more robust and resilient to any errors that might have caused
kubernetes_executors to be left behind.

Wrapping start()/end() in try/finally will make the tests
completely resilient to cases where the asserts start to fail;
without it, any failure in the tests would cause the same resource
leakage as we initially had when #28407 was iterated on.
potiuk added a commit that referenced this pull request Dec 10, 2022
As a follow-up to #28047, this PR will make the test cleanup
more robust and resilient to any errors that might have caused
kubernetes_executors to be left behind.

Wrapping start()/end() in try/finally will make the tests
completely resilient to cases where the asserts start to fail;
without it, any failure in the tests would cause the same resource
leakage as we initially had when #28407 was iterated on.
@XD-DENG (Member, Author) commented Dec 15, 2022

Hi @dstandish, for the File Task Handler issue we discussed earlier, I'm preparing the fix and it should be ready within this week. FYI.

potiuk added a commit to potiuk/airflow that referenced this pull request Dec 19, 2022
After apache#28047, the test_recover_from_resource_too_old test started to
fail in a flaky way. It turned out that, depending on which other tests
ran, the singleton ResourceVersion could contain not one but
two namespaces (including the default namespace).

Also, while fixing the tests it was noticed that the test
missed an assert: it did not assert that the exception was in fact
thrown, so the test could have succeeded even if the exception was
never raised (there was an assert in the "except" clause, but if the
exception was not thrown, it would never have been reached).
potiuk added a commit that referenced this pull request Dec 19, 2022
After #28047, the test_recover_from_resource_too_old test started to
fail in a flaky way. It turned out that, depending on which other tests
ran, the singleton ResourceVersion could contain not one but
two namespaces (including the default namespace).

Also, while fixing the tests it was noticed that the test
missed an assert: it did not assert that the exception was in fact
thrown, so the test could have succeeded even if the exception was
never raised (there was an assert in the "except" clause, but if the
exception was not thrown, it would never have been reached).
pierrejeambrun pushed a commit that referenced this pull request Mar 6, 2023
As a follow-up to #28047, this PR will make the test cleanup
more robust and resilient to any errors that might have caused
kubernetes_executors to be left behind.

Wrapping start()/end() in try/finally will make the tests
completely resilient to cases where the asserts start to fail;
without it, any failure in the tests would cause the same resource
leakage as we initially had when #28407 was iterated on.

(cherry picked from commit 3b203bc)
pierrejeambrun pushed a commit that referenced this pull request Mar 8, 2023
As a follow-up to #28047, this PR will make the test cleanup
more robust and resilient to any errors that might have caused
kubernetes_executors to be left behind.

Wrapping start()/end() in try/finally will make the tests
completely resilient to cases where the asserts start to fail;
without it, any failure in the tests would cause the same resource
leakage as we initially had when #28407 was iterated on.

(cherry picked from commit 3b203bc)
@ephraimbuddy ephraimbuddy added the type:new-feature Changelog: New Features label Apr 11, 2023