Block a Pod's IP packets until its NetworkPolicies are realized #5698
Conversation
Force-pushed from 6831d57 to eb57279.
Force-pushed from 67b4cb8 to 200b422.
In the previous implementation, traffic from/to a Pod may bypass the NetworkPolicies applied to the Pod in a short time window when the agent restarts, because realizing NetworkPolicies and enabling forwarding are asynchronous. However, we can't wait for all NetworkPolicies to be realized before enabling OVS forwarding, because there are cases where the former depends on the latter: for example, when proxyAll is enabled, or on a Windows Node, control-plane communication relies on OVS forwarding.

This patch takes a more fine-grained approach: block a Pod's IP packets in NetworkPolicy's entry tables until its NetworkPolicies are realized. This granularity leaves the Node's and hostNetwork Pods' traffic untouched and makes a realization issue of a Pod's NetworkPolicies affect that Pod's IP packets only.

The following changes are made to implement the approach:
1. EgressSecurityClassifierTable is now always required. (Previously it was only required for ExternalNode, not K8sNode.)
2. One low-priority flow dropping traffic from local Pods is installed in EgressSecurityClassifierTable, and one low-priority flow dropping traffic to local Pods is installed in IngressSecurityClassifierTable.
3. When a Pod's NetworkPolicies are fully realized for the first time, one normal-priority flow allowing traffic from this Pod is installed in EgressSecurityClassifierTable to override the drop action above, and one flow in IngressSecurityClassifierTable does the same for traffic to this Pod.

Signed-off-by: Quan Tian <qtian@vmware.com>
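To make the interplay between steps 2 and 3 concrete, here is a minimal, self-contained Go sketch of the intended flow layout. The `Flow` type, the table names as plain strings, and the priority values (10 and 100) are illustrative stand-ins, not Antrea's actual OpenFlow pipeline API.

```go
package main

import (
	"fmt"
	"net"
)

// Flow is a simplified stand-in for an OpenFlow rule; it only illustrates the
// priorities and matches described in the commit message.
type Flow struct {
	Table    string
	Priority int
	Match    string
	Action   string
}

// baselineDropFlows models step 2: low-priority flows that drop traffic
// from/to local Pods until their policies are realized.
func baselineDropFlows() []Flow {
	return []Flow{
		{Table: "EgressSecurityClassifier", Priority: 10, Match: "from local Pods", Action: "drop"},
		{Table: "IngressSecurityClassifier", Priority: 10, Match: "to local Pods", Action: "drop"},
	}
}

// podAllowFlows models step 3: once a Pod's NetworkPolicies have been fully
// realized, higher-priority flows override the baseline drop for that Pod only.
func podAllowFlows(podIP net.IP) []Flow {
	return []Flow{
		{Table: "EgressSecurityClassifier", Priority: 100, Match: fmt.Sprintf("nw_src=%s", podIP), Action: "goto next table"},
		{Table: "IngressSecurityClassifier", Priority: 100, Match: fmt.Sprintf("nw_dst=%s", podIP), Action: "goto next table"},
	}
}

func main() {
	for _, f := range append(baselineDropFlows(), podAllowFlows(net.ParseIP("10.10.0.5"))...) {
		fmt.Printf("%-28s prio=%-3d match=%-22s actions=%s\n", f.Table, f.Priority, f.Match, f.Action)
	}
}
```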
Force-pushed from 200b422 to f8f0271.
I haven't reviewed the code in detail yet, but I had one question.
Historically, we have claimed that Pod network setup has no dependency on the Antrea Controller. In other words, if the Antrea Controller is down, the Antrea Agent can still connect the Pod to the Pod network, and the Pod will have IP connectivity. Our architecture doc even has the following statement:
We still present this today as an advantage of the Antrea architecture when we talk about the project. So I wanted to double check that this proposed change doesn't invalidate that statement. If the core idea still holds (the Antrea Controller isn't required to provide connectivity, and NP realization is more of a local concept here), then I have no objection. If this PR changes that, maybe we should have a discussion about whether this should be configuration-based.
The situation is that the statement is already stale, and the proposed change invalidates it further. But we can make some changes to achieve the statement again.

Yes, from connectivity's perspective, antrea-controller is not technically a must-have (except in AntreaIPAM mode). However, when NetworkPolicy is in use, I think it's wrong to treat it as best-effort and declare it expected behavior that an already realized NetworkPolicy may be bypassed when the agent restarts. Transient packet loss is a nature of networking and something applications can tolerate, but degraded security is not. If this principle is agreed on, then we have to make realization of NetworkPolicies a logical precondition of connectivity.

Even though the statement is there, I feel there is no real use case that doesn't deploy antrea-controller, because the connection to antrea-controller has been a readiness check of antrea-agent since v1.0.0 (by #1946). The agents wouldn't appear ready if there were no antrea-controller.

It's still possible to achieve what the statement describes, but first we need to make NetworkPolicy explicitly optional. Users could choose to disable the whole NetworkPolicy function and not deploy antrea-controller. Then antrea-agent wouldn't initialize NetworkPolicyController, wouldn't make it a readiness check, and the flows blocking IP packets in NetworkPolicy tables wouldn't be there.
Yes, I do think that overall this is an improvement to the implementation. Can you remind me how we define "NetworkPolicies are realized"? Which set of NetworkPolicies are we considering in this case? Is there any specific guarantee or it just means that the Pod has been processed "once" by the Antrea Controller?
I tend to agree. IMO, the design was really more about having the CNI implementation (implementation of CNI ADD / DEL) independent of a centralized control-plane, not about the ability to deploy Antrea without the controller. We still have this today (CNI ADD is handled locally, and doesn't require communication with the controller), unlike ovn-kubernetes for example. Even then, it's unclear if there is any tangible benefit, besides a simpler architecture.
I don't know if it's worth it. We haven't had any user request for this, and as you said the coupling has existed since Antrea v1.0. We should update the design doc though (and any other possible reference to this in the rest of the documentation). It's still a bit of a compromise between security / correctness and robustness. This change could theoretically increase the blast radius of a bug in the controller. If a single-tenant K8s user doesn't have any NetworkPolicies, this change would still affect them in a way. A bug in the controller could mean the inability to schedule workloads.
In this PR, a Pod's NetworkPolicies are realized when all rules applied to it have been reconciled successfully. See antrea/pkg/agent/controller/networkpolicy/status_controller.go, lines 127 to 136 at f8f0271.
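As a rough illustration of that definition, the check amounts to something like the sketch below; the `rule` type and field names are placeholders rather than Antrea's actual ruleCache / status_controller code.

```go
package networkpolicy

// rule is a placeholder for the agent's internal representation of a single
// NetworkPolicy rule; realized is set only after the reconciler has
// successfully installed the rule's flows.
type rule struct {
	realized bool
}

// podPoliciesRealized reports whether every rule applied to a given Pod has
// been reconciled successfully. This only sketches the definition discussed
// above, not the actual implementation.
func podPoliciesRealized(rulesForPod []*rule) bool {
	for _, r := range rulesForPod {
		if !r.realized {
			return false
		}
	}
	// An empty list is only trustworthy once the agent has completed its
	// initial sync with the controller.
	return true
}
```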
I feel an option to disable NetworkPolicy entirely makes more sense than an option to preserve the "best-effort" NetworkPolicy behavior, as users will need to configure something anyway, and the reason they should do it is that it's unnecessary to add a failure point when they don't really use any function provided by it.

I should mention that the implementation only blocks a Pod's IP packets when the agent doesn't know at all which NetworkPolicies apply to the Pod (i.e., before the agent has fully synced with the controller), or when known NetworkPolicies haven't been realized successfully (which hardly ever happens today). It doesn't require the desired NetworkPolicy data the agent has retrieved to be strictly up-to-date (which is perhaps impossible). The logic is that we can realize NetworkPolicies based on a snapshot from a point in time, but not based on empty data. It essentially prevents a security regression for Pods when the agent upgrades/restarts.
I'm thinking we could also keep a local copy of the desired NetworkPolicies in the host filesystem, which would be used as the source of NetworkPolicy data before connecting to the controller successfully. It may resolve the problem, and perhaps we wouldn't need to block IP packets in this case, so it wouldn't increase the blast radius of a bug in the controller.
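A minimal sketch of what such a local copy could look like, assuming a hypothetical tmpfs-backed directory and plain JSON encoding (the actual PoC mentioned later may store and encode the data differently):

```go
package policycache

import (
	"encoding/json"
	"os"
	"path/filepath"
)

// cacheDir is a hypothetical tmpfs-backed path for the persisted policy data.
const cacheDir = "/run/antrea/networkpolicy"

// saveToDisk persists the last-known desired NetworkPolicy data so it can be
// replayed after an agent restart, before the controller connection is
// re-established.
func saveToDisk(name string, obj any) error {
	data, err := json.Marshal(obj)
	if err != nil {
		return err
	}
	tmp := filepath.Join(cacheDir, name+".tmp")
	if err := os.WriteFile(tmp, data, 0600); err != nil {
		return err
	}
	// Rename atomically so a crash never leaves a half-written file behind.
	return os.Rename(tmp, filepath.Join(cacheDir, name))
}

// loadFromDisk reads a previously persisted object back into obj.
func loadFromDisk(name string, obj any) error {
	data, err := os.ReadFile(filepath.Join(cacheDir, name))
	if err != nil {
		return err
	}
	return json.Unmarshal(data, obj)
}
```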
IMO, the design depends on the selection of a security policy strategy for Pods. It seems this new strategy applies default-deny to Pods with respect to NetworkPolicy realization and the NetworkPolicy feature. However, it has its own limitations: if the agent fails to restart before realizing the policies for the first time, what will be the failover strategy for Pods on different Nodes? Pod traffic will remain blocked until the agent recovers and realizes the NetworkPolicies for the first time / at least once. It might be that on one Node Pods remain in a default-deny state (step 2 with agent start failure), while on another Node they remain in a default-allow state (steps 2 & 3 succeed).
I'm not sure I get the point. The PR isn't about a default-deny/default-allow strategy but about preventing the security breach that happens in the described situation (when an agent doesn't know the policies yet). On a Node that has gotten the policies, it enforces the requested policies; on a Node that hasn't gotten the policies, it doesn't enable forwarding, to be safe, because doing otherwise could be a security breach.
@antoninbas I had a PoC of storing policies in local files: tnqn@05b7d69 (it's a draft, please ignore the code redundancy). I think it's simpler than the current PR and much less invasive to the datapath. As seen from the patch, it mainly changes
This could guarantee that
I think the solution makes a lot of sense to me except for two concerns:
@tnqn I think I understand the Agent restart case pretty well now, but I may still be confused about the Pod creation case.
So to clarify: this PR is about preventing regressions in NP enforcement (if a NP was ever enforced and the user hasn't updated it, it will always be enforced, or forwarding will not be permitted), not about providing guarantees for newly created Pods?
Yes, if the connection is successful, the fresh data should be honored. We can think of it as the agent disconnecting from the controller (without restarting): it will use the last data it got from the controller until a successful reconnection. The PoC code doesn't do so yet, but it's something I will add when creating a PR. Then, when the controller is fine, the agent will behave like before, while it will use local files as the policy data when the controller is down.
This is a good point and I was going to test it too. I was not too worried about it because the I/O happens in the event handling path, which runs in parallel with the rule workers. It won't affect the efficiency of the workers, but it may slow down how fast events are delivered to them. But I assume writing to a file is faster than realizing a rule via OVS (the producer is faster than the consumer), especially since the file directory is in tmpfs (/run/). But I could run a benchmark to get some figures to validate it.
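A tiny sketch of the ordering described in the comment above, with entirely hypothetical types, just to show the intent: the cached data is enforced first, and a successful (re)connection to the controller replaces it.

```go
package main

import "fmt"

// policies stands in for the control-plane NetworkPolicy data the agent
// consumes; everything here is hypothetical and only illustrates the ordering.
type policies []string

type agent struct {
	current policies
}

func (a *agent) apply(p policies, source string) {
	a.current = p
	fmt.Printf("enforcing %d policies from %s\n", len(p), source)
}

func main() {
	a := &agent{}

	// 1. On restart, before any controller connection, replay the local copy.
	cached := policies{"allow-dns", "deny-all-ingress"} // loaded from the on-disk cache
	a.apply(cached, "local file cache")

	// 2. Once the watch to antrea-controller is (re-)established, the fresh
	//    snapshot is honored and overwrites the cache; on disconnect the agent
	//    simply keeps enforcing the last data it received.
	fresh := policies{"allow-dns", "deny-all-ingress", "allow-web"}
	a.apply(fresh, "antrea-controller")
}
```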
Yes, it's to prevent regressions in NP enforcement. Providing guarantees for newly created Pods is more complicated: it would require the controller to explicitly indicate even that a Pod has no policies applied, and a healthy controller would have to be an essential dependency of Pod connectivity, because otherwise the agent can't know whether the Pod has no policies applied or has some that the agent hasn't received yet.

However, I think it matches what we declare: without antrea-controller, antrea-agent will still enforce the policies it has received, but new changes to NetworkPolicies or Pods will not be reflected in the datapath. And there should be no real problem for newly created Pods in practice, as long as antrea-controller is running fine, because we include a Pod in an AppliedToGroup right after it's created, before it gets a Pod IP; the agent will see the Pod in the AppliedToGroup right after it gets scheduled to the Node, possibly even before its Pod interface gets created.

If we want to provide a strict guarantee for newly created Pods, I think we should have a separate PR, and perhaps make it controlled by a config option which is not enabled by default, due to the side effect that Pod connectivity will rely on antrea-controller to explicitly indicate there are no policies applied to it.
The following numbers should confirm the extra cost is not going to be a problem:
Although it adds an 18µs delay, that is negligible compared with the time spent realizing a rule, which is at the millisecond level. After the event handler adds 1k policies to the cache and writes them to the filesystem, which may take around 23ms, each worker may have realized only 1 or 2 rules.
The extra 283ms should be fine as the workers can't process them in time anyway.
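For reference, a benchmark along these lines could look like the sketch below. It measures JSON encoding plus one file write per policy, using the upstream NetworkPolicy type and a temporary directory as stand-ins for Antrea's internal types and the /run-based path, so it will not reproduce the exact numbers quoted above.

```go
package policycache

import (
	"encoding/json"
	"os"
	"path/filepath"
	"testing"

	networkingv1 "k8s.io/api/networking/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// BenchmarkWritePolicyFile measures the per-policy cost of encoding and
// writing a file, which is the extra work added to the event handler.
func BenchmarkWritePolicyFile(b *testing.B) {
	dir := b.TempDir() // point this at a tmpfs directory to match the /run setup
	np := &networkingv1.NetworkPolicy{
		ObjectMeta: metav1.ObjectMeta{Name: "bench", Namespace: "default"},
	}
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		data, err := json.Marshal(np)
		if err != nil {
			b.Fatal(err)
		}
		if err := os.WriteFile(filepath.Join(dir, "bench.json"), data, 0600); err != nil {
			b.Fatal(err)
		}
	}
}
```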
@tnqn I think all these objects also support Protobuf? That may make a difference for file size and encoding / decoding performance. I know there is also |
Good point. We can leverage the Scheme and Codec objects to encode and decode, which is handy for versioning as well. The results are as below:
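A sketch of the Scheme/Codec idea, using client-go's scheme and the upstream NetworkPolicy type purely for illustration (Antrea's controlplane types would register with their own scheme). The point is that the codec embeds apiVersion/kind in the stored bytes, which is what makes later version upgrades manageable.

```go
package policycache

import (
	networkingv1 "k8s.io/api/networking/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/client-go/kubernetes/scheme"
)

// encodeWithCodec serializes the object with a versioned codec, so the stored
// bytes carry apiVersion and kind alongside the payload.
func encodeWithCodec(np *networkingv1.NetworkPolicy) ([]byte, error) {
	codec := scheme.Codecs.LegacyCodec(networkingv1.SchemeGroupVersion)
	return runtime.Encode(codec, np)
}

// decodeWithCodec reads the stored group/version/kind from the serialized
// bytes and decodes into the corresponding registered type.
func decodeWithCodec(data []byte) (runtime.Object, error) {
	return runtime.Decode(scheme.Codecs.UniversalDeserializer(), data)
}
```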
@tnqn what are the next steps, will you be updating this PR with the new approach, or are you still working through some quirks? |
I'm polishing code and tests for the new approach, particularly how to deal with versioning in the future (e.g. when the API version upgrades to v1 while the stored data is in v1beta2). I plan to create another PR for the new approach and keep this one for reference. |
Superseded by #5739
In the previous implementation, traffic from/to a Pod may bypass the NetworkPolicies applied to the Pod in a short time window when the agent restarts, because realizing NetworkPolicies and enabling forwarding are asynchronous.
However, we can't wait for all NetworkPolicies to be realized before enabling OVS forwarding, because there are cases where the former depends on the latter: for example, when proxyAll is enabled, or on a Windows Node, control-plane communication relies on OVS forwarding.
This patch takes a more fine-grained approach: block a Pod's IP packets in NetworkPolicy's entry tables until its NetworkPolicies are realized. This granularity leaves the Node's and hostNetwork Pods' traffic untouched and makes a realization issue of a Pod's NetworkPolicies affect that Pod's IP packets only.
The following changes are made to implement the approach:
1. EgressSecurityClassifierTable is now always required. (Previously it was only required for ExternalNode, not K8sNode.)
2. One low-priority flow dropping traffic from local Pods is installed in EgressSecurityClassifierTable, and one low-priority flow dropping traffic to local Pods is installed in IngressSecurityClassifierTable.
3. When a Pod's NetworkPolicies are fully realized for the first time, one normal-priority flow allowing traffic from this Pod is installed in EgressSecurityClassifierTable to override the drop action above, and one flow in IngressSecurityClassifierTable does the same for traffic to this Pod.
Depends on #5697