
Should http and http2 infer common protocol to be tcp? #2445

Closed
lahabana opened this issue Jul 28, 2021 · 17 comments
Labels
kind/improvement Improvement on an existing feature triage/rotten closed due to lack of information for too long, rejected feature...

Comments

@lahabana
Contributor

In this

// getCommonProtocol returns a common protocol between given two.
//
// E.g.,
// a common protocol between HTTP and HTTP2 is HTTP2,
// a common protocol between HTTP and HTTP is HTTP,
// a common protocol between HTTP and TCP is TCP,
// a common protocol between GRPC and HTTP2 is HTTP2,
// a common protocol between HTTP and HTTP2 is HTTP.
func getCommonProtocol(one, another mesh_core.Protocol) mesh_core.Protocol {
    if one == another {
        return one
    }
    if one == mesh_core.ProtocolUnknown || another == mesh_core.ProtocolUnknown {
        return mesh_core.ProtocolUnknown
    }
    oneProtocolStack, exist := protocolStacks[one]
    if !exist {
        return mesh_core.ProtocolUnknown
    }
    anotherProtocolStack, exist := protocolStacks[another]
    if !exist {
        return mesh_core.ProtocolUnknown
    }
    for _, firstProtocol := range oneProtocolStack {
        for _, secondProtocol := range anotherProtocolStack {
            if firstProtocol == secondProtocol {
                return firstProtocol
            }
        }
    }
    return mesh_core.ProtocolUnknown
}
we can see that http2 and http will return tcp which makes sense.
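
For reference, here is a self-contained illustration of why the intersection ends up at TCP. The shape of protocolStacks below is an assumption (its definition isn't quoted in this issue), and the Protocol type is a simplified stand-in for mesh_core.Protocol:

package main

import "fmt"

// Protocol is a simplified stand-in for mesh_core.Protocol, only for this sketch.
type Protocol string

const (
    ProtocolTCP   Protocol = "tcp"
    ProtocolHTTP  Protocol = "http"
    ProtocolHTTP2 Protocol = "http2"
    ProtocolGRPC  Protocol = "grpc"
)

// Assumed shape of protocolStacks: each protocol maps to the ordered list of
// protocols it can be treated as, ending in plain TCP.
var protocolStacks = map[Protocol][]Protocol{
    ProtocolGRPC:  {ProtocolGRPC, ProtocolHTTP2, ProtocolTCP},
    ProtocolHTTP2: {ProtocolHTTP2, ProtocolTCP},
    ProtocolHTTP:  {ProtocolHTTP, ProtocolTCP},
    ProtocolTCP:   {ProtocolTCP},
}

func main() {
    // The first protocol shared by both stacks is the "common" one:
    // HTTP -> {http, tcp}, HTTP2 -> {http2, tcp}, so the intersection starts at tcp.
    for _, p := range protocolStacks[ProtocolHTTP] {
        for _, q := range protocolStacks[ProtocolHTTP2] {
            if p == q {
                fmt.Println("common protocol:", p) // prints: common protocol: tcp
                return
            }
        }
    }
}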

However, this is used in 2 places:

  1. generateCDS()

    func (o OutboundProxyGenerator) generateCDS(ctx xds_context.Context, proxy *model.Proxy, services envoy_common.Services) (*model.ResourceSet, error) {
        resources := model.NewResourceSet()
        for _, serviceName := range services.Names() {
            service := services[serviceName]
            healthCheck := proxy.Policies.HealthChecks[serviceName]
            circuitBreaker := proxy.Policies.CircuitBreakers[serviceName]
            protocol := o.inferProtocol(proxy, service.Clusters())
            for _, cluster := range service.Clusters() {
                edsClusterBuilder := envoy_clusters.NewClusterBuilder(proxy.APIVersion).
                    Configure(envoy_clusters.Timeout(protocol, cluster.Timeout())).
                    Configure(envoy_clusters.CircuitBreaker(circuitBreaker)).
                    Configure(envoy_clusters.OutlierDetection(circuitBreaker)).
                    Configure(envoy_clusters.HealthCheck(protocol, healthCheck))
                if service.HasExternalService() {
                    edsClusterBuilder.
                        Configure(envoy_clusters.StrictDNSCluster(cluster.Name(), proxy.Routing.OutboundTargets[serviceName],
                            proxy.Dataplane.IsIPv6())).
                        Configure(envoy_clusters.ClientSideTLS(proxy.Routing.OutboundTargets[serviceName]))
                    switch protocol {
                    case mesh_core.ProtocolHTTP:
                        edsClusterBuilder.Configure(envoy_clusters.Http())
                    case mesh_core.ProtocolHTTP2, mesh_core.ProtocolGRPC:
                        edsClusterBuilder.Configure(envoy_clusters.Http2())
                    default:
                    }
                } else {
                    edsClusterBuilder.
                        Configure(envoy_clusters.EdsCluster(cluster.Name())).
                        Configure(envoy_clusters.LB(cluster.LB())).
                        Configure(envoy_clusters.ClientSideMTLS(ctx, proxy.Metadata, serviceName, []envoy_common.Tags{cluster.Tags()})).
                        Configure(envoy_clusters.Http2())
                }
                edsCluster, err := edsClusterBuilder.Build()
                if err != nil {
                    return nil, errors.Wrapf(err, "build CDS for cluster %s failed", cluster.Name())
                }
                resources.Add(&model.Resource{
                    Name:     cluster.Name(),
                    Origin:   OriginOutbound,
                    Resource: edsCluster,
                })
            }
        }
        return resources, nil
    }

    In this case it looks like it should be using the protocol of the specific cluster rather than inferring the protocol from all the clusters.

  2. generateLDS()

    func (_ OutboundProxyGenerator) generateLDS(proxy *model.Proxy, routes envoy_common.Routes, outbound *kuma_mesh.Dataplane_Networking_Outbound, protocol mesh_core.Protocol) (envoy_common.NamedResource, error) {
        oface := proxy.Dataplane.Spec.Networking.ToOutboundInterface(outbound)
        meshName := proxy.Dataplane.Meta.GetMesh()
        sourceService := proxy.Dataplane.Spec.GetIdentifyingService()
        serviceName := outbound.GetTagsIncludingLegacy()[kuma_mesh.ServiceTag]
        outboundListenerName := envoy_names.GetOutboundListenerName(oface.DataplaneIP, oface.DataplanePort)
        retryPolicy := proxy.Policies.Retries[serviceName]
        var timeoutPolicyConf *kuma_mesh.Timeout_Conf
        if timeoutPolicy := proxy.Policies.Timeouts[oface]; timeoutPolicy != nil {
            timeoutPolicyConf = timeoutPolicy.Spec.GetConf()
        }
        filterChainBuilder := func() *envoy_listeners.FilterChainBuilder {
            filterChainBuilder := envoy_listeners.NewFilterChainBuilder(proxy.APIVersion)
            switch protocol {
            case mesh_core.ProtocolGRPC:
                filterChainBuilder.
                    Configure(envoy_listeners.HttpConnectionManager(serviceName, false)).
                    Configure(envoy_listeners.Tracing(proxy.Policies.TracingBackend, sourceService)).
                    Configure(envoy_listeners.HttpAccessLog(meshName, envoy_common.TrafficDirectionOutbound, sourceService, serviceName, proxy.Policies.Logs[serviceName], proxy)).
                    Configure(envoy_listeners.HttpOutboundRoute(serviceName, routes, proxy.Dataplane.Spec.TagSet())).
                    Configure(envoy_listeners.Retry(retryPolicy, protocol)).
                    Configure(envoy_listeners.GrpcStats())
            case mesh_core.ProtocolHTTP, mesh_core.ProtocolHTTP2:
                filterChainBuilder.
                    Configure(envoy_listeners.HttpConnectionManager(serviceName, false)).
                    Configure(envoy_listeners.Tracing(proxy.Policies.TracingBackend, sourceService)).
                    Configure(envoy_listeners.HttpAccessLog(
                        meshName,
                        envoy_common.TrafficDirectionOutbound,
                        sourceService,
                        serviceName,
                        proxy.Policies.Logs[serviceName],
                        proxy,
                    )).
                    Configure(envoy_listeners.HttpOutboundRoute(serviceName, routes, proxy.Dataplane.Spec.TagSet())).
                    Configure(envoy_listeners.Retry(retryPolicy, protocol))
            case mesh_core.ProtocolKafka:
                filterChainBuilder.
                    Configure(envoy_listeners.Kafka(serviceName)).
                    Configure(envoy_listeners.TcpProxy(serviceName, routes.Clusters()...)).
                    Configure(envoy_listeners.NetworkAccessLog(
                        meshName,
                        envoy_common.TrafficDirectionOutbound,
                        sourceService,
                        serviceName,
                        proxy.Policies.Logs[serviceName],
                        proxy,
                    )).
                    Configure(envoy_listeners.MaxConnectAttempts(retryPolicy))
            case mesh_core.ProtocolTCP:
                fallthrough
            default:
                // configuration for non-HTTP cases
                filterChainBuilder.
                    Configure(envoy_listeners.TcpProxy(serviceName, routes.Clusters()...)).
                    Configure(envoy_listeners.NetworkAccessLog(
                        meshName,
                        envoy_common.TrafficDirectionOutbound,
                        sourceService,
                        serviceName,
                        proxy.Policies.Logs[serviceName],
                        proxy,
                    )).
                    Configure(envoy_listeners.MaxConnectAttempts(retryPolicy))
            }
            filterChainBuilder.
                Configure(envoy_listeners.Timeout(timeoutPolicyConf, protocol))
            return filterChainBuilder
        }()
        listener, err := envoy_listeners.NewListenerBuilder(proxy.APIVersion).
            Configure(envoy_listeners.OutboundListener(outboundListenerName, oface.DataplaneIP, oface.DataplanePort, model.SocketAddressProtocolTCP)).
            Configure(envoy_listeners.FilterChain(filterChainBuilder)).
            Configure(envoy_listeners.TransparentProxying(proxy.Dataplane.Spec.Networking.GetTransparentProxying())).
            Build()
        if err != nil {
            return nil, errors.Wrapf(err, "could not generate listener %s for service %s", outboundListenerName, serviceName)
        }
        return listener, nil
    }

    Where it treats http and http2 the same way.

For (1) it seems that we could configure the cluster with AutoHttpConfig (ALPN) in the case where a cluster has a mix of http and http2 endpoints (which seems to be a rare use case at the cluster level).
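
As a sketch of that direction, something like the following could attach ALPN-based protocol selection to a cluster via go-control-plane. This is only an illustration: the helper is hypothetical (not Kuma's API), the generated type names such as HttpProtocolOptions_AutoConfig_ are assumptions based on protoc-gen-go conventions, and auto_config only takes effect when the upstream transport socket negotiates ALPN (i.e. TLS):

package main

import (
    envoy_cluster "github.com/envoyproxy/go-control-plane/envoy/config/cluster/v3"
    envoy_upstream_http "github.com/envoyproxy/go-control-plane/envoy/extensions/upstreams/http/v3"
    "google.golang.org/protobuf/types/known/anypb"
)

// withAutoHTTPConfig is a hypothetical helper that lets Envoy pick HTTP/1.1 or
// HTTP/2 per upstream connection via ALPN, instead of the control plane fixing
// one protocol for the whole cluster.
func withAutoHTTPConfig(cluster *envoy_cluster.Cluster) error {
    options := &envoy_upstream_http.HttpProtocolOptions{
        UpstreamProtocolOptions: &envoy_upstream_http.HttpProtocolOptions_AutoConfig_{
            AutoConfig: &envoy_upstream_http.HttpProtocolOptions_AutoConfig{},
        },
    }
    marshaled, err := anypb.New(options)
    if err != nil {
        return err
    }
    if cluster.TypedExtensionProtocolOptions == nil {
        cluster.TypedExtensionProtocolOptions = map[string]*anypb.Any{}
    }
    cluster.TypedExtensionProtocolOptions["envoy.extensions.upstreams.http.v3.HttpProtocolOptions"] = marshaled
    return nil
}

func main() {
    cluster := &envoy_cluster.Cluster{Name: "backend"}
    if err := withAutoHTTPConfig(cluster); err != nil {
        panic(err)
    }
}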

It might be good enough to introduce a ProtocolHttpX and then handle it on a case-by-case basis where it makes sense. I think this would be a better behaviour than risking downgrading to TCP when things would work the same for HTTP and HTTP2.
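
A minimal standalone sketch of the ProtocolHttpX idea (ProtocolHttpX and the simplified Protocol type below are hypothetical, not part of Kuma's mesh_core package): when both sides are HTTP-based, return the marker instead of degrading to TCP, and let each call site decide what to do with it.

package main

import "fmt"

type Protocol string

const (
    ProtocolUnknown Protocol = "<unknown>"
    ProtocolTCP     Protocol = "tcp"
    ProtocolHTTP    Protocol = "http"
    ProtocolHTTP2   Protocol = "http2"
    ProtocolGRPC    Protocol = "grpc"
    // ProtocolHttpX is the hypothetical "some flavour of HTTP" marker proposed above.
    ProtocolHttpX Protocol = "httpx"
)

// isHTTPBased reports whether a protocol is carried over HTTP/1.1 or HTTP/2.
func isHTTPBased(p Protocol) bool {
    switch p {
    case ProtocolHTTP, ProtocolHTTP2, ProtocolGRPC:
        return true
    default:
        return false
    }
}

// getCommonProtocol sketches the proposed behaviour: a mix of HTTP-based
// protocols collapses to ProtocolHttpX rather than falling back to TCP.
func getCommonProtocol(one, another Protocol) Protocol {
    if one == another {
        return one
    }
    if one == ProtocolUnknown || another == ProtocolUnknown {
        return ProtocolUnknown
    }
    if isHTTPBased(one) && isHTTPBased(another) {
        return ProtocolHttpX
    }
    return ProtocolTCP
}

func main() {
    fmt.Println(getCommonProtocol(ProtocolHTTP, ProtocolHTTP2)) // httpx, not tcp
    fmt.Println(getCommonProtocol(ProtocolHTTP, ProtocolTCP))   // tcp
}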

@jpeach
Contributor

jpeach commented Jul 28, 2021

In this

// getCommonProtocol returns a common protocol between given two.
//
// E.g.,
// a common protocol between HTTP and HTTP2 is HTTP2,
// a common protocol between HTTP and HTTP is HTTP,
// a common protocol between HTTP and TCP is TCP,
// a common protocol between GRPC and HTTP2 is HTTP2,
// a common protocol between HTTP and HTTP2 is HTTP.
func getCommonProtocol(one, another mesh_core.Protocol) mesh_core.Protocol {
    if one == another {
        return one
    }
    if one == mesh_core.ProtocolUnknown || another == mesh_core.ProtocolUnknown {
        return mesh_core.ProtocolUnknown
    }
    oneProtocolStack, exist := protocolStacks[one]
    if !exist {
        return mesh_core.ProtocolUnknown
    }
    anotherProtocolStack, exist := protocolStacks[another]
    if !exist {
        return mesh_core.ProtocolUnknown
    }
    for _, firstProtocol := range oneProtocolStack {
        for _, secondProtocol := range anotherProtocolStack {
            if firstProtocol == secondProtocol {
                return firstProtocol
            }
        }
    }
    return mesh_core.ProtocolUnknown
}

 // a common protocol between HTTP and HTTP2 is HTTP2, 
...
 // a common protocol between HTTP and HTTP2 is HTTP. 

Only one of these can be true.

we can see that http2 and http will return tcp which makes sense.

Maybe it makes sense in terms of protocol layering, but it makes no sense semantically. Any server that is offering an http/2 service can offer the same service on http, as only the wire protocol is different.

It might be good enough to introduce a ProtocolHttpX and then handle it on a case-by-case basis where it makes sense. I think this would be a better behaviour than risking downgrading to TCP when things would work the same for HTTP and HTTP2.

I agree that we don't want to downgrade to TCP. I'm not sure what the protocol tag is supposed to mean in these cases. If the application can't handle HTTP/2 that doesn't mean that the mesh can't still speak HTTP/2 over the wire.

@lahabana
Contributor Author

I agree that we don't want to downgrade to TCP. I'm not sure what the protocol tag is supposed to mean in these cases. If the application can't handle HTTP/2 that doesn't mean that the mesh can't still speak HTTP/2 over the wire.

It makes sense indeed that the mesh could talk http/2 and the only needed downgrade is between the instance and the proxy.

It seems like the only place where we need to differentiate between http and http/2 is in the inbound_proxy_generator when creating the CDS resource (which already seems to be happening correctly).

For outbound things already happen correctly by simply forcing http2 (at least for http).

@github-actions
Contributor

This issue was inactive for 30 days. It will be reviewed in the next triage meeting and might be closed.
If you think this issue is still relevant, please comment on it promptly or attend the next triage meeting.

@github-actions github-actions bot added the triage/stale Inactive for some time. It will be triaged again label Nov 23, 2021
@lahabana lahabana added triage/pending This issue will be looked at on the next triage meeting and removed triage/stale Inactive for some time. It will be triaged again labels May 23, 2022
@github-actions
Contributor

This issue was inactive for 30 days. It will be reviewed in the next triage meeting and might be closed.
If you think this issue is still relevant, please comment on it promptly or attend the next triage meeting.

@github-actions github-actions bot added the triage/stale Inactive for some time. It will be triaged again label Jun 23, 2022
@lahabana lahabana removed the triage/stale Inactive for some time. It will be triaged again label Jun 27, 2022
@github-actions github-actions bot added the triage/stale Inactive for some time. It will be triaged again label Jul 29, 2022
@github-actions
Contributor

This issue was inactive for 30 days. It will be reviewed in the next triage meeting and might be closed.
If you think this issue is still relevant, please comment on it promptly or attend the next triage meeting.

@lahabana lahabana removed the triage/stale Inactive for some time. It will be triaged again label Jul 29, 2022
@lahabana
Contributor Author

Here's the use case where this causes a problem:

  1. set 2 dps with kuma.io/protocol: http and tracing. This works well!
  2. change one of these dps to use kuma.io/protocol: http2. This fails because tracing is only set for http and http2 and we infer the common protocol as tcp.

This code is used both for configuring the Listener and the Cluster.
In the case of the Listener I believe it's fine to infer http and http2 to http2, because Envoy http2 listeners can downgrade to http1.1 when needed.
In the case of the Cluster it's more problematic, as we have no guarantee that the downstream app we're proxying for will be able to support http2.

So it seems we need 2 things:

  • in Cluster context: http and http2 get inferred to http, so if the downstream app doesn't support http2 there's no problem
  • in Listener context: http and http2 get inferred to http2, because Envoy h2 listeners will work with h1.1 clients (a sketch of both rules follows below).
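
A standalone sketch of those two rules (the names and the context switch are illustrative only, not Kuma's API):

package main

import "fmt"

type Protocol string

const (
    ProtocolTCP   Protocol = "tcp"
    ProtocolHTTP  Protocol = "http"
    ProtocolHTTP2 Protocol = "http2"
)

// inferenceContext says where the inferred protocol will be used.
type inferenceContext int

const (
    clusterContext  inferenceContext = iota // upstream Cluster (CDS)
    listenerContext                         // outbound Listener (LDS)
)

func isHTTPBased(p Protocol) bool {
    return p == ProtocolHTTP || p == ProtocolHTTP2
}

// commonProtocolFor resolves a mixed http/http2 pair per context instead of
// degrading to tcp: clusters stay on http (the upstream app may not speak h2),
// listeners go to http2 (Envoy h2 listeners still accept h1.1 clients).
func commonProtocolFor(ctx inferenceContext, one, another Protocol) Protocol {
    if one == another {
        return one
    }
    if isHTTPBased(one) && isHTTPBased(another) {
        if ctx == clusterContext {
            return ProtocolHTTP
        }
        return ProtocolHTTP2
    }
    return ProtocolTCP
}

func main() {
    fmt.Println(commonProtocolFor(clusterContext, ProtocolHTTP, ProtocolHTTP2))  // http
    fmt.Println(commonProtocolFor(listenerContext, ProtocolHTTP, ProtocolHTTP2)) // http2
}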

@jakubdyszkiewicz WDYT?

@jakubdyszkiewicz
Contributor

Triage: this protocol inference should not be used on the inbound side at all. With this in mind let's try to specify what to infer for inbound/outbound listener and inbound/outbound cluster.

@jakubdyszkiewicz jakubdyszkiewicz added triage/needs-information Reviewed and some extra information was asked to the reporter and removed triage/pending This issue will be looked at on the next triage meeting labels Aug 29, 2022
@michaelbeaumont
Contributor

@lahabana any more info on this? Should we plan to look at this / hand it off as an on-duty task?

@lahabana
Contributor Author

lahabana commented Oct 5, 2022

Yes I think so. I'm also unclear why this protocol inference should not be used on the inbound side at all. @jakubdyszkiewicz could you explain?

@jakubdyszkiewicz
Contributor

For a specific DP proxy, you don't need to do any inference on the inbound side; you have an explicit protocol in kuma.io/protocol.

@lahabana lahabana added triage/pending This issue will be looked at on the next triage meeting and removed triage/needs-information Reviewed and some extra information was asked to the reporter labels Oct 17, 2022
@lahabana
Contributor Author

Question for triage: Feels like we have a path forward now, right?

@kleinfreund
Contributor

Triage: We don’t need any inference on the inbound side (it’s already annotated on the DPP). See #2445 (comment) for the rest.

@kleinfreund kleinfreund added triage/accepted The issue was reviewed and is complete enough to start working on it kind/improvement Improvement on an existing feature and removed triage/pending This issue will be looked at on the next triage meeting labels Nov 2, 2022
@github-actions github-actions bot added the triage/stale Inactive for some time. It will be triaged again label Feb 1, 2023
@github-actions
Contributor

github-actions bot commented Feb 1, 2023

This issue was inactive for 90 days. It will be reviewed in the next triage meeting and might be closed.
If you think this issue is still relevant, please comment on it or attend the next triage meeting.

@jakubdyszkiewicz jakubdyszkiewicz removed the triage/stale Inactive for some time. It will be triaged again label Feb 1, 2023
@github-actions github-actions bot added the triage/stale Inactive for some time. It will be triaged again label May 3, 2023
@github-actions
Contributor

github-actions bot commented May 3, 2023

This issue was inactive for 90 days. It will be reviewed in the next triage meeting and might be closed.
If you think this issue is still relevant, please comment on it or attend the next triage meeting.

@lukidzi lukidzi removed the triage/stale Inactive for some time. It will be triaged again label May 8, 2023
@github-actions github-actions bot added the triage/stale Inactive for some time. It will be triaged again label Aug 7, 2023
@github-actions
Contributor

github-actions bot commented Aug 7, 2023

This issue was inactive for 90 days. It will be reviewed in the next triage meeting and might be closed.
If you think this issue is still relevant, please comment on it or attend the next triage meeting.

@bartsmykla bartsmykla removed the triage/stale Inactive for some time. It will be triaged again label Aug 8, 2023
@github-actions github-actions bot added the triage/stale Inactive for some time. It will be triaged again label Nov 9, 2023
@github-actions
Contributor

github-actions bot commented Nov 9, 2023

This issue was inactive for 90 days. It will be reviewed in the next triage meeting and might be closed.
If you think this issue is still relevant, please comment on it or attend the next triage meeting.

@jakubdyszkiewicz jakubdyszkiewicz added triage/pending This issue will be looked at on the next triage meeting and removed triage/accepted The issue was reviewed and is complete enough to start working on it triage/stale Inactive for some time. It will be triaged again labels Nov 9, 2023
@jakubdyszkiewicz
Contributor

Closing. @lukidzi is working on this as a part of #6589

@jakubdyszkiewicz jakubdyszkiewicz closed this as not planned Won't fix, can't repro, duplicate, stale Nov 27, 2023
@jakubdyszkiewicz jakubdyszkiewicz added triage/rotten closed due to lack of information for too long, rejected feature... and removed triage/pending This issue will be looked at on the next triage meeting labels Nov 27, 2023