Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allow unknown protocols #49

Closed
wants to merge 3 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
23 changes: 20 additions & 3 deletions pkg/networkpolicy/controller.go
Original file line number Diff line number Diff line change
Expand Up @@ -391,8 +391,8 @@ func (c *Controller) Run(ctx context.Context) error {

packet, err := parsePacket(*a.Payload)
if err != nil {
klog.Infof("Can not process packet %d applying default policy (failOpen: %v): %v", *a.PacketID, c.config.FailOpen, err)
c.nfq.SetVerdict(*a.PacketID, verdict) //nolint:errcheck
klog.Infof("Can not process packet %d, allowing ... : %v", *a.PacketID, err)
c.nfq.SetVerdict(*a.PacketID, nfqueue.NfAccept) //nolint:errcheck

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is really correct

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this means this packet is not TCP or UDP or SCTP

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it means "a parsing error occurred". I'm not sure we want to say all parse errors should be "accepts".

(I'm not sure we don't want that either...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, it can only happens if is not ip or ipv6 or one of those protocols ... but indeed is confusing the way it is now

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All packets with unknown protocol (not TCP or UDP or SCTP) must be accepted. Unfortunately there is no v1.ProtocolUnknown constant, but we can type-cast:

	default:
		//return t, fmt.Errorf("unknown protocol %d", protocol)
		t.proto = v1.Protocol("unknown")

but that feels a bit hackish...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be ok to just store the proto as an int in struct packet, and let the caller match on "syscall.IPPROTO_*". IMO that's the most flexible.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point was that it doesn't seem right to say that every possible type of error from parsePacket should result in the packet being accepted. IMO it would be better to have parsePacket return both an error and a verdict, even if that verdict happens to be "accept" for all of the errors it currently catches.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After some investigation, I agree: all unrecognized packets shall not be accepted (#51 (comment)). However, I think parsePacket() should not set a verdict. It should parse the packet and let the caller decide, but then is can't return TCP,UDP,SCTP or an error for everything else.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be ok to just store the proto as an int in struct packet, and let the caller match on "syscall.IPPROTO_*". IMO that's the most flexible.

that sounds like the best idea

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have experimented some with this, and I think the best is to keep packet.proto as v1.Protocol, and add a new "ipproto int" in the packet structure.

An IPPROTO_ROUTING extension header, or extension header out-of-order will return a

var ErrorExtensionHeader = fmt.Errorf("unsupported extension header")

return 0
}
packet.id = *a.PacketID
Expand Down Expand Up @@ -645,7 +645,7 @@ func (c *Controller) syncNFTablesRules(ctx context.Context) error {
}
}

for _, hook := range []knftables.BaseChainHook{knftables.ForwardHook} {
for _, hook := range []knftables.BaseChainHook{knftables.ForwardHook, knftables.OutputHook} {
chainName := string(hook)
tx.Add(&knftables.Chain{
Name: chainName,
Expand All @@ -656,6 +656,16 @@ func (c *Controller) syncNFTablesRules(ctx context.Context) error {
tx.Flush(&knftables.Chain{
Name: chainName,
})

if hook == knftables.OutputHook {
// accept all outbound traffic to the loopback interface
tx.Add(&knftables.Rule{
Chain: chainName,
Rule: knftables.Concat(
"oif", "lo", "accept"),
})
Comment on lines +662 to +666
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@uablrek can you check in your PR this solves the problem?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this update, but the address matches should never hit anything with "lo" since such packet would be a "martian"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It worked! My world is crumbling! My reasoning is that since the set's only contains pod-addresses, packets matching whose sets can't be routed to/from "lo", since that would make them martians.

Unless... a SatefulSet pod actually have a loopback address as pod-address.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@astoycos @tssurya do you have an explanation for this failurs on the conformance suite only with ipv6?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The modified chain for reference:

        chain output {
                type filter hook output priority filter - 5; policy accept;
                ct state established,related accept
                oif "lo" accept
                ip saddr @podips-v4 queue to 100 comment "process IPv4 traffic with network policy enforcement"
                ip daddr @podips-v4 queue to 100 comment "process IPv4 traffic with network policy enforcement"
                ip6 saddr @podips-v6 queue to 100 comment "process IPv6 traffic with network policy enforcement"
                ip6 daddr @podips-v6 queue to 100 comment "process IPv6 traffic with network policy enforcement"
        }

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This rule definitely seems wrong.

I think sometimes packets are considered by the kernel to be traversing lo if they have the same local source and destination IP, even if that IP isn't 127.0.0.1/::1?

Anyway, the failure here is probably caused by the fact that in ipvs, packets traverse the network stack twice (from source to ipvs0 and then from ipvs0 to destination), and the DNAT and SNAT don't happen in exactly the same place as with iptables/nftables kube-proxy. (The DNAT happens inside ipvs0... I forget if the SNAT happens in the first or second traversal.)

Anyway, point is, the iptables/nftables logic is going to end up doing the wrong thing for some packets in ipvs mode. The fix might be to ignore packets with oif = "ipvs0" so you only look at them on the second traversal, but I'm really not sure about that.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know Calico needs to explicitly know whether you're using iptables or ipvs, so it's possible we actually need different logic

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this rule fixes the IPv6 jobs, ipvs is unrelated here

Copy link
Contributor Author

@aojea aojea Jul 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think sometimes packets are considered by the kernel to be traversing lo if they have the same local source and destination IP, even if that IP isn't 127.0.0.1/::1?

So, what I want to express is "we only need to process in OUTPUT local generated packets that are sent outside (not localhost dest)"

image

I had this issue in the past with kubelet probes and we solved it using the lo rule, I think it is correct but I can not explain the science behind it, just the evidence it works as expected, any packet destined to localhost does not have to be processed by network policies anyway, right?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this rule fixes the IPv6 jobs, ipvs is unrelated here

oh... so... the issue is that if we add processing on the output hook, then IPv6 (but not IPv4) breaks unless we also add the ! lo rule?

So, what I want to express is "we only need to process in OUTPUT local generated packets that are sent outside (not localhost dest)"

Why should it matter though? We shouldn't be blocking any packets that are destined for localhost. With the podips changes, we shouldn't even intercept them...

I can not explain the science behind it

boo!

any packet destined to localhost does not have to be processed by network policies anyway, right?

A packet from a pod-network pod addressed to 127.0.0.0/8 or ::1 would not leave the pod netns, and it's not possible (by validation) to have an EndpointSlice that points to 127.0.0.0/8 or ::1, so you can't end up with that via Service DNAT either. But you can add non-localhost IPs to lo I think?

}

// instead of aggregating all the expresion in one rule, use two different
// rules to understand if is causing issues with UDP packets with the same
// tuple (https://github.com/kubernetes-sigs/kube-network-policies/issues/12)
Expand All @@ -665,6 +675,13 @@ func (c *Controller) syncNFTablesRules(ctx context.Context) error {
"ct", "state", "established,related", "accept"),
})

// only process protocols supported by Kubernetes network policies
tx.Add(&knftables.Rule{
Chain: chainName,
Rule: knftables.Concat(
"meta", "l4proto", "!=", "{ tcp, udp, sctp }", "accept"),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(There's no real reason to use knftables.Concat here. You can just write it as a single string.)

})

action := fmt.Sprintf("queue num %d", c.config.QueueID)
if c.config.FailOpen {
action += " bypass"
Expand Down
4 changes: 2 additions & 2 deletions pkg/networkpolicy/packet.go
Original file line number Diff line number Diff line change
Expand Up @@ -58,7 +58,7 @@ func parsePacket(b []byte) (packet, error) {
// NextHeader (not extension headers supported)
protocol = int(b[6])
default:
return t, fmt.Errorf("unknown versions %d", version)
return t, fmt.Errorf("unknown IP version %d", version)
}

var dataOffset int
Expand All @@ -73,7 +73,7 @@ func parsePacket(b []byte) (packet, error) {
t.proto = v1.ProtocolSCTP
dataOffset = hdrlen + 8
default:
return t, fmt.Errorf("unknown protocol %d", protocol)
return t, fmt.Errorf("IP family %s unknown transport protocol %d", t.family, protocol)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't want to log an error for every ICMPv6 message. You should add a case 58: for that.
Probably parsePacket needs another return value for "not an error, but just accept the packet".

(It seems that your callback only gets called for IP and IPv6 protocol packets, right? And, in particular, not ARP and ICMP packets, which are separate L3 protocols, unlike ICMPv6/NDP, which are subprotocols of IPv6.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see the next commit, only TCP , UDP and SCTP packets are sent to the userspace

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment I think all fragments will also have an unknown protocol. But I am about to fix that (#15)

Copy link
Contributor

@uablrek uablrek Jul 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see the next commit, only TCP , UDP and SCTP packets are sent to the userspace

That may fix the fragment problem. The kernel will reassemble packets before sending them to user-space (perhaps it already does, in which case #15 isn't a kind/bug after all).

}
// TCP, UDP and SCTP srcPort and dstPort are the first 4 bytes after the IP header
t.srcPort = int(binary.BigEndian.Uint16(b[hdrlen : hdrlen+2]))
Expand Down
Loading