Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mde: set additional remote port #2917

Closed
wants to merge 1 commit into from
Closed

Conversation

lambdai
Copy link
Contributor

@lambdai lambdai commented Jul 16, 2020

Signed-off-by: Yuchen Dai silentdai@gmail.com

Add a functionality to propagate downstream socket port to upstream.

This will be used by BTS for http traffic. BTS ingress listener is always listening on port 15008.
The egress listener need the original workload port(e.g 80) to select the route.

This is mainly to address the use case that
Client send request to svc:80 with :authority svc:80 but VS rewrite :authority to svc:8080 but not L4 address is remained at svc:80.

Before BTS, the ingress listener could recover the port 80. Now this information but be encoded through egress sidecar and ingresssidecar.

This PR add the prototype to deliver the dest port through metadata_exchange plugin, however, it is not done yet
since it's 15008 port is delivered instead of the 80

The follow up PR will be focusing on egress sidecar to figure out the port 80 despite the endpoint port is 15008 (though endpoint metadata)

part of istio/istio#24540

Signed-off-by: Yuchen Dai <silentdai@gmail.com>
@lambdai lambdai requested a review from a team as a code owner July 16, 2020 00:21
@googlebot googlebot added the cla: yes Set by the Google CLA bot to indicate the author of a PR has signed the Google CLA. label Jul 16, 2020
@istio-testing istio-testing added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jul 16, 2020
@lambdai
Copy link
Contributor Author

lambdai commented Jul 16, 2020

@PiotrSikora
The Idea to use per connection metadata instead of per http request is because I plan to set source of original port in endpoint from EDS.
Think about how we can retrieve the original endpoint port from k8s MyService:80 to MyPod:9376. With BTS now it's MyPod:15008.
It's only pilot knows the function original address (MyService:80, MyPod:15008) is MyPod:9376`. And the transformation will be encoded in

cluster {
  name = `MyService:80`
  endpoints = [
    { address = "MyPod1:15008`, original address is `Mypod1:9376`},
    { address = "MyPod2:15008`, original address is `Mypod2:9376`},
  ]
}

Unless we can generate a lookup table(I would do that later),
MyService:8080 cannot share with Myservice:80 because the original address is different.

For the cluster, it is already generated by pilot anyway, now I only need to add the original address column.

  name: my-service
spec:
  ports:
    - protocol: TCP
      port: 80
      targetPort: 9376

@lambdai lambdai requested a review from PiotrSikora July 16, 2020 00:46
Copy link
Contributor

@kyessenov kyessenov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this doable in wasm ABI? we're going to adapt it but was not prioritized yet.

@lambdai
Copy link
Contributor Author

lambdai commented Jul 16, 2020

@kyessenov I am not sure. I don't have the very clear view of wasm ABI.

It seems to me this PR doesn't use corner hook.

@kyessenov
Copy link
Contributor

kyessenov commented Jul 16, 2020

I don't think it's right to rely on metadata exchange for networking purposes. It introduces an unnecessary coupling between telemetry and basic routing. Telemetry continues to function without metadata exchange in a degraded state but networking will fail hard. Sounds like a layering violation.

@lambdai
Copy link
Contributor Author

lambdai commented Jul 17, 2020

Thank you for the comment @kyessenov ! I think about it and let me explain.

The idea is that networking need to deliver the information between envoy cluster in egress sidecar and envoy listener in ingress sidecar.
Indeed the plugin would be necessary. The role resembles the a RBAC filter. Such a filter may not be needed in each filter chain, but when such a filter is needed, it must be there in the filter chain.

It's pilot's responsibility to set the MDE filter with ALPN config as istio.

@lambdai
Copy link
Contributor Author

lambdai commented Jul 17, 2020

@kyessenov
I do have a further question because you mentioned the degradation, which means you consider this filter optional.

When I add a MDE filter in certain cluster and listener while telemetry add the filter through EnvoyFilter api again, would it cause issue?
I assume when we add the filter more than once, the config are compatible. By compatible I mean the node information could be

  1. No node for MDE filter 1, No node for MDE filter 2
  2. Node meta A for MDE filter 1, Node meta A for MDE filter 2.
    And below won't appear
    Node meta A for MDE filter 1, Node meta B for MDE filter 2.

@kyessenov
Copy link
Contributor

I imagine there would be an issue with the two instances of this filter as-is. It writes into filter state for a well known key, so they would be overriding each other.

@lambdai
Copy link
Contributor Author

lambdai commented Jul 17, 2020

Double write on the same value is not optimal, is it a hard failure?

The background is that the double MDE should be rare, or ideally none if we carefully provide the envoy filter.

Meanwhile I am look into the alternative method to resolve the networking route.

If this update to MDE is not hard failure we can land it first

@kyessenov
Copy link
Contributor

Sorry, I need more context to approve. I'm confused by this effort since "BTS" was supposed to remove TCP tunneling of metadata, yet this looks like it'll depend on TCP tunneling of metadata. Why are we doing BTS then if we still use TCP level data instead of pure HTTP?

@lambdai
Copy link
Contributor Author

lambdai commented Jul 20, 2020

I have two milestones for BTS:
BTS for http traffic and BTS for TCP traffic.

  1. This PR is the first half of BTS: BTS for http traffic only #24540
    So tcp metadata exchange cannot be replaced.
  2. BTS for http. It's a question if all L4 authorization policy before BTS is fully rollout.
    I decide to allow user allow to use the existing L4 authorization policy in this istio version. It authorization policy need to figure out the L4 attribute (most importantly the endpoint port in this PR).

In all TCP metadata exchange cannot be deprecated until next release cycle.

@kyessenov Do I answered your questions?

@kyessenov
Copy link
Contributor

I'm not sure I follow. Regarding point 1, again, why does HTTP tunneling depend on telemetry TCP protocol? TCP telemetry should not be mandatory for networking as it has its drawbacks. It's entirely possible we don't have any telemetry extensions enabled whatsoever if they cause perf issues.

@lambdai
Copy link
Contributor Author

lambdai commented Jul 21, 2020

@kyessenov

why does HTTP tunneling depend on telemetry TCP protocol?

No, http tunneling doesn't depend on telemetry TCP protocol. The tunneling only depends on the original endpoint port. The endpoint port is lost when BTS redirect traffic to dedicated port 15008.
BTS http traffic use the metadata to deliver the info that "the client connects to BTS port 15008, but the original dest port is 80". The 80 is delivered by metadata exchange plugin.

It should have very little perf issue because the connection in BTS is long live connection. The connection is reused in BTS http tunnel and future tcp tunnel

@kyessenov
Copy link
Contributor

@lambdai Has this been discussed with P&T group? I think this still means that TCP metadata exchange becomes mandatory for BTS, since HTTP authz policies would simply not work without it.

@lambdai
Copy link
Contributor Author

lambdai commented Jul 21, 2020

No... BTS http need the metadata anyway, achieve could be an another filter (proxy protocol filter), or a filter build from scratch, or duplicates half of the code of MDE(since I need 1 way information deliver).

I am also experimenting deliver the port through http headers. The fallback is that dest port L4 authz policy will be ignored in experimental BTS phase.

To me MDE is a no brain winner before your concern on perf

@lambdai
Copy link
Contributor Author

lambdai commented Jul 21, 2020

@kyessenov Again, why do you think there is performance issue when MDE is enforced on the connection on http traffic?

One of the goal of BTS is to maintain long live connection between sidecars so that either handshake and L4 MDE cost is
amortized in shared connection.

@kyessenov
Copy link
Contributor

@lambdai there's a cost to telemetry in terms of CPU and memory consumption. We're recommending users to turn off metadata exchange if the cost is not worth the value that telemetry brings (there're always basic envoy stats available). I think we need to discuss whether telemetry metadata exchange should become mandatory even without the telemetry enabled.

@lambdai
Copy link
Contributor Author

lambdai commented Jul 21, 2020

Chatted with @PiotrSikora, we can also introduce proxy protocol transport socket to achieve the goal.
See envoyproxy/envoy#11584

@kyessenov
Copy link
Contributor

Does PROXY require some ALPN indicator as well?

@lambdai
Copy link
Contributor Author

lambdai commented Jul 22, 2020

Does PROXY require some ALPN indicator as well?

No. Why are you asking? Let me guess, are you worry if the peer doesn't recognize proxy protocol?

@kyessenov
Copy link
Contributor

@lambdai No just want to avoid another explosion of ALPN protocols.

@lambdai
Copy link
Contributor Author

lambdai commented Jul 22, 2020

Good point!

@lambdai
Copy link
Contributor Author

lambdai commented Aug 4, 2020

Close for now. We can add the original port magic header before "host is selected", at the cost of "assuming all the endpoint in the service has the same target port". This is true in k8s service but not in other service registries. Will update BTS design to reflect it.

@lambdai lambdai closed this Aug 4, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes Set by the Google CLA bot to indicate the author of a PR has signed the Google CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants