Add support for an Application-specified cookie for load balancing policy. #2856

Open
prateekjainaa opened this issue Aug 31, 2020 · 16 comments
Labels
area/httpproxy Issues or PRs related to the HTTPProxy API. blocked/needs-design Categorizes the issue or PR as blocked because it needs a design document. help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API

Comments

@prateekjainaa
prateekjainaa commented Aug 31, 2020

Hi All,

Is there an equivalent in Contour to the persistent stickiness feature in HAProxy? Or is there some way of achieving it in Contour?

Q. What do we mean by persistent stickiness?
Ans. Clients/Browser requests routed to same server/pod, even after browser restart. For more on this, you can refer to section:
The difference between persistence and affinity here

Regards,
Prateek

@stevesloka
Member

I think what you're looking for @prateekjainaa is what envoy calls session-affinity.

Have a look at the following docs: https://projectcontour.io/docs/v1.8.0/httpproxy/#session-affinity

@stevesloka stevesloka added the kind/question Categorizes an issue as a user question. label Aug 31, 2020
@prateekjainaa
Author

@stevesloka, apologies from my side. I should have stressed the "persistence" aspect of session affinity more. As far as I understand, session affinity only lasts as long as the browser's session with the site. If the browser is restarted, session affinity is gone (requests can land on a different server). HAProxy supports session affinity that survives browser restarts (persistent stickiness).

Let me know if I am missing something here.

@jpeach
Contributor

jpeach commented Sep 1, 2020

The contour Cookie load balancer strategy uses a session cookie. IIUC, the HAProxy "persistent" session affinity does cookie-balancing but uses an application-specified cookie to do so (which means that the persistence of the balancing is proportional to the persistence of the application cookie value).

That seems like a reasonable feature request.

@jpeach jpeach added area/httpproxy Issues or PRs related to the HTTPProxy API. blocked/needs-design Categorizes the issue or PR as blocked because it needs a design document. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API labels Sep 1, 2020
@jpeach jpeach changed the title persistent stickiness equivalent in contour? Add support for an Application-specified cookie for load balancing policy. Sep 1, 2020
@jpeach jpeach removed the kind/question Categorizes an issue as a user question. label Sep 1, 2020
@prateekjainaa
Author

@jpeach IIUC, stickiness survives contour or envoy restarts; because it is based upon cookies. Let me know if I am missing something here.

@jpeach
Contributor

jpeach commented Sep 2, 2020

> @jpeach IIUC, stickiness survives contour or envoy restarts; because it is based upon cookies. Let me know if I am missing something here.

Contour is not in the data path, so it has no effect. Cookie balancing survives an Envoy restart but not a browser restart (it's a session cookie).

@prateekjainaa
Author

prateekjainaa commented Sep 2, 2020 via email

@tsaarni
Member

tsaarni commented Sep 3, 2020

I chatted with @prateekjainaa and heard about this issue.

I checked design/session-affinity.md, and it seems that cookies set by the application do not work well with Envoy (though the RING_HASH LB policy does not seem to be used anymore):

> Using the session id or key as the ring hash suffered from the bootstrapping problem of the first request being routed to backend A, which sets a session cookie, however that session cookie will not hash to backend A and cause subsequent requests to arrive at the wrong server, possibly repeating this behavior.

It would be possible to set a time-to-live for the cookie that Envoy sets. If the value is non-zero, the browser would persist the cookie accordingly.

Setting a TTL was also considered in the design doc (chapter "Cookie design"), but the conclusion was that any time-to-live is equally wrong because session affinity is fragile in general. Therefore this option is not exposed to users. Could this be reconsidered?

@youngnick
Member

Currently, Contour's session persistence works like this:
If you enable it, Contour will tell Envoy to generate a browser session cookie and hash backends based on that cookie. We achieve this by explicitly setting a TTL of 0 on the cookie.

This was chosen because there is no way to choose a default value for the TTL that makes sense - it needs to be relatively close to the lifetime of a backend pod, or you will either have unnecessary churn, or sessions will accumulate at the longest-living pod.

This was done to minimise required configuration, and to not push complexity back to our users. The Kubernetes environment is different in the amount of expected change to standalone hosts, and so may have different behavior when you use the same mechanisms you used to.

That said, if we want to add this functionality, we have to allow specification of some fields, and explain how they interact to produce various outcomes.

(The following details were taken from the envoy docs.)

The two key fields are the name and the ttl.
The name field is required, and the TTL determines the type of stickiness.

| TTL | Type of stickiness |
|---|---|
| Absent | Passive. Envoy will only do persistence if there is a value in the specified name field. |
| 0 | Browser session cookie will be generated. |
| Specified | Cookie with given TTL will be generated. |

We can use the 'Absent' case for application cookies: Envoy will do no cookie generation, and requests will not be sticky until the application sets the correct cookie.

Both ttl of 0 and some value will work, but will need documentation to explain what they will do.
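
As a rough illustration of the three TTL cases in the table, here is a minimal Python sketch of which cookie, if any, the proxy would hand back to the browser. The function name, the cookie value, and the exact attribute layout are assumptions made for illustration; only the three cases themselves come from the table.

```python
from typing import Optional


def generated_set_cookie(
    name: str, value: str, ttl_seconds: Optional[int], path: str = "/"
) -> Optional[str]:
    """Model of the three TTL cases (hypothetical helper, not an Envoy API)."""
    if ttl_seconds is None:
        # Absent: passive mode, the proxy never generates a cookie itself.
        return None
    header = f"{name}={value}; Path={path}"
    if ttl_seconds > 0:
        # Specified: the browser keeps the cookie across restarts until expiry.
        header += f"; Max-Age={ttl_seconds}"
    # ttl == 0: no expiry attribute, so the browser treats it as a session cookie.
    return header
```

For the 8-hour workday case discussed in this thread, `generated_set_cookie("X-Contour-Session-Affinity", "abc", 8 * 3600)` would carry `Max-Age=28800`, while a TTL of 0 produces the session cookie Contour sets today.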

In order to implement this, we'll need some answers to these questions:

  • What we want to achieve with this feature. What types of stickiness are in scope?
  • How do the available Envoy options meet this scope?
  • How do we document the caveats around choosing a TTL if you are going to choose one?
  • Where will we put this configuration? There needs to be, at a minimum, name and ttl fields available to be tweaked, and, if we allow the path to be set to a value other than / (as it is currently), there will be interactions with inclusion that will need to be carefully considered. I'd guess that LoadBalancerPolicy may be the right container, but I'm not sure what this would look like.

@tsaarni
Member

tsaarni commented Sep 18, 2020

I'll respond with my understanding, and @prateekjainaa can fill in.

I need to leave the last bullet for later (i.e. the hardest question: what the API change would look like), but I wanted to add an example use case and ask you a few questions.

> We can use the 'Absent' case for application cookies, Envoy will do no cookie generation, and requests will not be sticky until the application sets the correct cookie.

While reading design/session-affinity.md (chapter "Bootstrapping issues") I got an impression that application provided cookies do not work too well: Upstream application would likely assume, that the application instance that creates the "session" cookie (with application-specified TTL), will be the same instance that the session sticks to. Since the upstream service does not know the hashing algorithm that Envoy uses, it cannot possibly create a cookie that would lead to Envoy selecting that particular instance for the next request. Therefore the next request may be likely to be forwarded to another instance, which might then generate yet another cookie causing the problem to be repeated.
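
To make the bootstrapping problem concrete, here is a small Python simulation. It is only a sketch: `pick_backend` is a stand-in for hash-based selection and is not Envoy's real algorithm, and the replica names and session ID format are invented for the example.

```python
import hashlib

BACKENDS = ["pod-a", "pod-b", "pod-c"]  # hypothetical replica names


def pick_backend(cookie_value: str, backends=BACKENDS) -> str:
    """Stand-in for hash-based selection; NOT Envoy's actual algorithm."""
    digest = hashlib.sha256(cookie_value.encode()).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]


def misrouted_fraction(issuer: str, n_sessions: int = 300) -> float:
    """Fraction of sessions created on `issuer` whose cookie hashes elsewhere."""
    # The application chooses session IDs without knowing the proxy's hash.
    session_ids = [f"{issuer}-session-{i}" for i in range(n_sessions)]
    misses = sum(pick_backend(sid) != issuer for sid in session_ids)
    return misses / n_sessions
```

Because the issuing instance has no influence over where its cookie value hashes, `misrouted_fraction("pod-a")` comes out well above zero: a sizeable share of follow-up requests land on an instance that has never seen the session.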

I did not find more information in the Envoy documentation or elsewhere, but the explanation in the design doc sounded right to me, which raises a question: in which scenario would one use Absent/Passive mode in Envoy? Maybe there is something I missed or misunderstood?

> What we want to achieve with this feature. What types of stickiness are in scope?

Example use case scenario:

Legacy stateful application is migrated to Kubernetes and then scaled up - each replica of the service is independent stateful instance of the application.

Problem description:

The application has a concept of login, which starts a user session of a known period, let's say 8 hours for a typical session lasting a workday. The application instance keeps the user session data in memory for that period. The user closes the browser when leaving for lunch (or the browser closes for some other reason) and the session cookies are lost. When the user is back from lunch, they expect the session to still be active, but since the cookie was lost they get forwarded to another application instance. The user needs to start from an empty state, and the previously active session is left hanging in the previous application instance for the rest of the day.

Wanted behavior:

Configure session stickiness to match with the session length defined by the application.

> How do the available Envoy options meet this scope?

Having an option of specifying TTL for Envoy-generated cookie would likely allow the use case but I guess it is yet to be tested and proven.

If it works, do you think this could be added and should the API change proposal then cover also the application generated cookie (Absent/Passive case)?

@youngnick
Member

> Since the upstream service does not know the hashing algorithm that Envoy uses, it cannot possibly create a cookie that would lead to Envoy selecting that particular instance for the next request. Therefore the next request may be likely to be forwarded to another instance, which might then generate yet another cookie causing the problem to be repeated.

That's my understanding as well. As I think I said above, other proxies mutate the cookie a little bit to indicate which backend the request should be routed to once the application has generated it, but Envoy does not support this. The only two options are "Envoy doesn't modify the cookie at all" or "Envoy generates the cookie itself", from what I can see.

I'll be honest, I don't see how this would achieve the outcome of having requests with no cookie go to any backend, and requests with a cookie be sent back to the same server that issued that cookie.

What Envoy does seem to be able to do is this: an un-cookied request comes in, the backend issues a cookie, and on the next request Envoy hashes that cookie to pick a backend and keeps sending that cookie's traffic there. This would require the application to propagate which session belongs to which backend itself, before responding to the request; otherwise requests could race around the backend set without ever settling, as you said @tsaarni. This does not seem like a good idea.

Allowing the setting of a TTL makes sense though. I can see that potentially being added, assuming the API change can be done in a way that makes sense.

@sunjayBhatia
Member

sunjayBhatia commented Aug 24, 2021

Now that we support request hash based load balancing, it sounds like one could hash on the Cookie header sent by a client (with the value generated by a single application instance) to implement this? There would be some quirks if you have multiple cookies in a single header or multiple Cookie headers entirely.

apiVersion: projectcontour.io/v1
kind: HTTPProxy
metadata:
  name: example
spec:
  virtualhost:
    fqdn: example.projectcontour.io
  routes:
  - services:
    - name: example-app
      port: 80
    loadBalancerPolicy:
      strategy: RequestHash
      requestHashPolicies:
      - headerHashOptions:
          headerName: Cookie

we could also move to supporting passive or Envoy generated cookie hashing in the RequestHash load balancer strategy and deprecate the existing Cookie load balancer strategy

the same bootstrapping problem still exists it seems even with this, so not sure how helpful this is
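
The multiple-cookie quirk mentioned above can be shown with a short Python sketch. It compares a hash key taken from the entire Cookie header with one taken from a single named cookie. The SHA-256 digest is a stand-in for whatever hash the proxy uses, and the cookie names `session` and `theme` are hypothetical.

```python
import hashlib
from http.cookies import SimpleCookie
from typing import Optional


def _digest(text: str) -> str:
    return hashlib.sha256(text.encode()).hexdigest()


def key_from_header(cookie_header: str) -> str:
    """Hash key when the whole Cookie header is used, as in the YAML above."""
    return _digest(cookie_header)


def key_from_cookie(cookie_header: str, name: str) -> Optional[str]:
    """Hash key when only one named cookie is used; None if it is missing."""
    jar = SimpleCookie()
    jar.load(cookie_header)
    return _digest(jar[name].value) if name in jar else None
```

With the whole header as the key, a client that gains an unrelated cookie (or sends the same cookies in a different order) produces a different key and may be routed elsewhere; keying on one named cookie avoids that, though it does nothing for the bootstrapping problem.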

@sunjayBhatia
Member

sunjayBhatia commented Aug 24, 2021

could do something hacky here to fix the bootstrap issue, which would look like:

  • change cookie load balancing strategy to passive cookie hashing, Envoy will now not generate a cookie value to hash but look for a cookie of name X-Contour-Session-Affinity
  • add some custom lua logic when cookie based session affinity is desired:
    • in lua envoy_on_request check if X-Contour-Session-Affinity cookie is present
    • if so, do nothing
    • if not, insert cookie with random value, also save in lua filter metadata
    • in envoy_on_response, if lua generated cookie is present in metadata, add Set-Cookie header with the cookie value

this could interoperate with the upcoming work in cookie rewriting as well to add more attributes to the relevant cookie, or we could make these configurable and rewrite in lua ourselves

it's not really much different from Envoy generating a cookie tbh; it just forces the first request in the passive cookie load balancing flow to be routed somewhere that can be used again, but unless the app knows about it, it's basically what we already have

just thought of this w/o trying it out, might be missing something, this is also probably not valid http cookie semantics/usage, since the cookie is coming from thin air, rather than the server telling a client to save a cookie and send it
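
For readers who want to see the request/response flow from the bullets above spelled out, here is a Python model of it. This is not Lua and not an Envoy filter: the two functions merely mirror the roles of `envoy_on_request` and `envoy_on_response`, headers are modelled as plain Python containers, and the `generated_affinity` metadata key is an invented name.

```python
import secrets
from http.cookies import SimpleCookie

AFFINITY_COOKIE = "X-Contour-Session-Affinity"


def on_request(headers: dict, metadata: dict) -> None:
    """Inject an affinity cookie when the client did not send one."""
    jar = SimpleCookie()
    jar.load(headers.get("cookie", ""))
    if AFFINITY_COOKIE in jar:
        return  # cookie already present: do nothing
    value = secrets.token_hex(16)  # random value for passive hashing
    pair = f"{AFFINITY_COOKIE}={value}"
    existing = headers.get("cookie")
    headers["cookie"] = f"{existing}; {pair}" if existing else pair
    metadata["generated_affinity"] = value  # remembered for the response


def on_response(headers: list, metadata: dict) -> None:
    """Tell the client to store the cookie generated during the request."""
    value = metadata.get("generated_affinity")
    if value is not None:
        # Appended as a new header so application Set-Cookie headers survive.
        headers.append(("set-cookie", f"{AFFINITY_COOKIE}={value}; Path=/"))
```

Here the request headers are a dict and the response headers a list of `(name, value)` pairs, so that an added `Set-Cookie` never overwrites one set by the application.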

@sunjayBhatia
Member

again, this doesn't really solve the issue since the app isn't generating the cookie, so again not sure how useful any of this is

@youngnick
Member

youngnick commented Aug 26, 2021

I seem to recall that other load balancers allow specifying the backend by prepending it to the cookie text or something? Without something that allows the generating service to tell Envoy which backend to send it to (which will require the backing server to know the generated cluster name), I don't see how this can ever work.

This seems like something we need to ask upstream about, and see if anyone else has solved this with Envoy.

The Contour project currently lacks enough contributors to adequately respond to all Issues.

This bot triages Issues according to the following rules:

  • After 60d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, the Issue is closed

You can:

  • Mark this Issue as fresh by commenting
  • Close this Issue
  • Offer to help out with triage

Please send feedback to the #contour channel in the Kubernetes Slack

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 10, 2024
@erikschul

erikschul commented Jan 11, 2024

Not stale.
I'm experiencing this issue as well.
My use case is a web app. During rollout, multiple versions may be running in parallel, that are using different assets, e.g. logo1.png or logo2.png. If requests are randomly distributed, some will appear to randomly fail.
Therefore, a sticky session is required. This is supported by Contour.
But during rollout, pods are added/removed, and horizontal scaling could also cause issues.
It's therefore imperative that the session is pinned to the specific server (or application version, but that's more complicated to implement).
I realize that the problem is that Envoy doesn't support this, but that may mean that I have to use a different ingress provider.
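
The effect of pods being added during a rollout can be sketched with a toy consistent-hash ring in Python. This is an illustration of the general technique only, not Envoy's ring hash or Maglev implementation; the pod names, the virtual node count, and the use of SHA-256 are all assumptions.

```python
import bisect
import hashlib


def _h(text: str) -> int:
    return int.from_bytes(hashlib.sha256(text.encode()).digest()[:8], "big")


class HashRing:
    """Toy consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, backends, vnodes: int = 100):
        self._points = sorted(
            (_h(f"{b}#{i}"), b) for b in backends for i in range(vnodes)
        )
        self._keys = [point for point, _ in self._points]

    def pick(self, cookie_value: str) -> str:
        # First ring point at or after the key's hash, wrapping around.
        idx = bisect.bisect_left(self._keys, _h(cookie_value)) % len(self._points)
        return self._points[idx][1]


def moved_sessions(old_backends, new_backends, sessions) -> dict:
    """Map each session whose backend changed to its (old, new) backends."""
    old, new = HashRing(old_backends), HashRing(new_backends)
    return {
        s: (old.pick(s), new.pick(s)) for s in sessions if old.pick(s) != new.pick(s)
    }
```

Scaling from three pods to four in this model leaves most sessions where they were, but a share of them moves to the new pod even though their cookie is unchanged, which is the kind of mid-session switch described above.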

AFAICT it also seems that other Envoy-based solutions haven't solved this problem either: Emissary, Gloo.

It seems that ingress-nginx supports it: https://kubernetes.github.io/ingress-nginx/examples/affinity/cookie/

... the response contains a Set-Cookie header with the settings we have defined [...] it contains a randomly generated key corresponding to the upstream used for that request [...]. If a client sends a cookie that doesn't correspond to an upstream, NGINX selects an upstream and creates a corresponding cookie. If the backend pool grows NGINX will keep sending the requests through the same server of the first request, even if it's overloaded. When the backend server is removed, the requests are re-routed to another upstream server. This does not require the cookie to be updated [...]
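
The behaviour quoted above can be approximated by a router that remembers which upstream each cookie key was assigned to. The Python sketch below models only that described behaviour and is not ingress-nginx's implementation; the class, its in-memory table, and the random upstream choice are hypothetical.

```python
import random
import secrets


class StickyRouter:
    """Cookie key -> upstream table (illustrative, not ingress-nginx code)."""

    def __init__(self, upstreams):
        self.upstreams = list(upstreams)
        self.assignments = {}  # cookie key -> upstream name

    def route(self, cookie_key=None):
        """Return (upstream, cookie_key, set_cookie_needed)."""
        upstream = self.assignments.get(cookie_key)
        if upstream in self.upstreams:
            # Known key whose upstream is still alive: stay pinned,
            # even if the pool has grown since the first request.
            return upstream, cookie_key, False
        if cookie_key in self.assignments:
            # Known key, upstream removed: re-route without a new cookie.
            key, set_cookie = cookie_key, False
        else:
            # No cookie or unknown cookie: create a corresponding one.
            key, set_cookie = secrets.token_hex(8), True
        upstream = random.choice(self.upstreams)
        self.assignments[key] = upstream
        return upstream, key, set_cookie
```

Unlike pure hashing, the pinning here survives pool growth because the assignment is stored rather than recomputed, at the cost of keeping state in the proxy.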

@skriss skriss added help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 11, 2024

8 participants