OCSP response cache is not updated in a timely manner #10632

Open
VWDude opened this issue Nov 8, 2023 · 12 comments
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. needs-priority triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@VWDude

VWDude commented Nov 8, 2023

What happened:

We are using ingress-nginx with the config value "enable-ocsp": true.
At first this works as expected, but the OCSP cache is not updated when the response expires after 2 days:
Taken from openssl response on 08.Nov.2023 13:53 GMT:
image

What you expected to happen:
OCSP cache is updated before the expiry and the response is still valid.

NGINX Ingress controller version:

NGINX Ingress controller
  Release:       v1.7.0
  Build:         72ff21ed9e26cb969052c753633049ba8a87ecf9
  Repository:    https://github.com/kubernetes/ingress-nginx
  nginx version: nginx/1.21.6

Kubernetes version:

Client Version: v1.27.2
Kustomize Version: v5.0.1
Server Version: v1.26.6

Environment:

  • Cloud provider or hardware configuration: Azure, AKS

  • OS : Alpine Linux v3.17

  • Kernel : 5.15.0-1042-azure

  • Install tools: -

  • Basic cluster related info: See above

  • How was the ingress-nginx-controller installed: -

  • Current State of the controller: -

  • Current state of ingress object, if applicable: -

  • Others: -

How to reproduce this issue:
(Re-)Start Ingress-Nginx pods and wait until the OCSP response is expired.
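
To check for the expired state without reading screenshots, here is a minimal sketch that parses the `This Update` / `Next Update` lines from openssl's OCSP text output and tells whether the response has expired at a given time. The helper names (`parse_ocsp_times`, `is_expired`) are made up for illustration, and the sample text is built from the timestamps in this issue, with `This Update` assumed to be two days before `Next Update`.

```python
import re
from datetime import datetime, timezone

# Timestamp layout printed by openssl, e.g. "Nov  8 05:14:33 2023 GMT".
OPENSSL_TIME_FORMAT = "%b %d %H:%M:%S %Y GMT"


def parse_ocsp_times(openssl_output):
    """Return a dict with the 'This Update' / 'Next Update' times found in the text."""
    times = {}
    for label in ("This Update", "Next Update"):
        match = re.search(rf"{label}:\s*(.+?GMT)", openssl_output)
        if match:
            # openssl pads single-digit days with an extra space; collapse it.
            stamp = " ".join(match.group(1).split())
            parsed = datetime.strptime(stamp, OPENSSL_TIME_FORMAT)
            times[label] = parsed.replace(tzinfo=timezone.utc)
    return times


def is_expired(openssl_output, now):
    """True if the response carries a Next Update that is not after `now`."""
    next_update = parse_ocsp_times(openssl_output).get("Next Update")
    return next_update is not None and now >= next_update


# Illustrative sample only; "This Update" is an assumption (2-day validity).
SAMPLE = (
    "    This Update: Nov  6 05:14:33 2023 GMT\n"
    "    Next Update: Nov  8 05:14:33 2023 GMT\n"
)
checked_at = datetime(2023, 11, 8, 13, 53, tzinfo=timezone.utc)
print(is_expired(SAMPLE, checked_at))
```

One common way to obtain such text is `openssl s_client -connect <host>:443 -status`, which prints the stapled OCSP response if the server sends one.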

Anything else we need to know:
Certificate provider: QuoVadis

It seems like the OCSP response is refreshed some time after the expiry (about a day later). As we just detected this issue, I don't have an exact time so far.

@VWDude VWDude added the kind/bug Categorizes issue or PR as related to a bug. label Nov 8, 2023
@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority labels Nov 8, 2023
@VWDude
Author

VWDude commented Nov 9, 2023

Update:
It seems like the update is fetched a day and some minutes later than expected:
image

So instead of the expected Nov 8. 5:14:33 GMT,
the answer is fetched on Nov 9. 5:17:00 GMT...


This is stale, but we won't close it automatically. Just bear in mind that the maintainers may be busy with other tasks and will reach your issue ASAP. If you have any questions or want to request prioritization, please reach out in #ingress-nginx-dev on Kubernetes Slack.

@github-actions github-actions bot added the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Dec 10, 2023
@VWDude
Author

VWDude commented Jan 5, 2024

Any updates here?

@MrWusa

MrWusa commented Aug 16, 2024

It would be interesting to know, since we have been facing the same problem ever since we started using this feature. We restart our ingress very regularly because of this, which is kind of annoying...

@dkvaltech

The issue still persists; we have to manually restart the ingress-nginx controller every couple of hours just to refresh the response. An update on the problem would be much appreciated!

@longwuyuan
Contributor

Do you have any Let's Encrypt certs? If yes, does the problem occur with the Let's Encrypt certs as well?

@VWDude
Author

VWDude commented Sep 24, 2024

At least in our case we're using QuoVadis Certificates

@VWDude
Author

VWDude commented Sep 24, 2024

> Update: It seems like the update is fetched a day and some minutes later than expected: image
>
> So instead of the expected Nov 8. 5:14:33 GMT, the answer is fetched on Nov 9. 5:17:00 GMT...

After some searching in the code, it looks to me like the refresh interval for OCSP responses is hardcoded to 3 days, which fits my observation above:

local expiry = 3600 * 24 * 3
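
To spell out the arithmetic, here is a small sketch in Python (not the controller's Lua code). It assumes the response was cached at the moment it was issued and that it is valid for 2 days, as observed in this issue; with a 3-day cache lifetime, the cached copy then outlives the response by exactly one day.

```python
from datetime import datetime, timedelta, timezone

CACHE_TTL = timedelta(seconds=3600 * 24 * 3)  # the hardcoded value quoted above
RESPONSE_VALIDITY = timedelta(days=2)         # validity observed in this issue

# nextUpdate of the response, taken from the timestamps in this issue.
next_update = datetime(2023, 11, 8, 5, 14, 33, tzinfo=timezone.utc)

# Assumption: the response was fetched and cached right when it was issued.
fetched_at = next_update - RESPONSE_VALIDITY

cache_expires = fetched_at + CACHE_TTL
stale_window = cache_expires - next_update

print(cache_expires.isoformat())  # 2023-11-09T05:14:33+00:00
print(stale_window)               # 1 day, 0:00:00
```

Under these assumptions the cache entry expires on Nov 9 at 5:14:33 GMT, which is consistent with the new response showing up at 5:17:00 GMT that day.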

@longwuyuan
Contributor

longwuyuan commented Sep 24, 2024

/triage accepted

@tao12345666333 @rikatz @Gacko @strongjz please comment as it seems that if this is limited to just changing the time period, then it will not be a complicated change.

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Sep 24, 2024
@VWDude
Author

VWDude commented Sep 24, 2024

> /triage accepted
>
> @tao12345666333 @rikatz @Gacko @strongjz please comment as it seems that if this is limited to just changing the time period, then it will not be a complicated change.

It should be the other way around (the expiry should be shorter):
For now the ingress caches the OCSP response for 3 days and only fetches a new one after that time, regardless of whether the response itself has already expired.
This may be suitable for Let's Encrypt certificates because they may have longer OCSP validity periods.
It would be best if the expiry were calculated dynamically by reading when the OCSP response actually expires and subtracting a grace time from that (like 5 minutes), so there is no window without a valid OCSP response while a new one is fetched.
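
A rough sketch of that calculation, written in Python for illustration rather than in the controller's Lua. The function name `cache_ttl`, the fallback to the current 3-day value when no nextUpdate is available, and the clamp to zero are all my assumptions, not existing ingress-nginx behaviour.

```python
from datetime import datetime, timedelta, timezone

GRACE = timedelta(minutes=5)      # grace time suggested above
FALLBACK_TTL = timedelta(days=3)  # assumption: keep today's value as fallback


def cache_ttl(next_update, now, grace=GRACE, fallback=FALLBACK_TTL):
    """Return how long an OCSP response should stay in the cache.

    next_update: the response's nextUpdate time, or None if it is unknown.
    now: the current time (timezone-aware, like next_update).
    """
    if next_update is None:
        return fallback
    ttl = next_update - now - grace
    # A zero TTL means the entry should be refetched right away.
    return max(ttl, timedelta(0))


now = datetime(2023, 11, 6, 5, 14, 33, tzinfo=timezone.utc)
next_update = datetime(2023, 11, 8, 5, 14, 33, tzinfo=timezone.utc)
print(cache_ttl(next_update, now))  # 1 day, 23:55:00
```

With the 2-day response from this issue, the entry would be dropped 5 minutes before nextUpdate instead of a day after it.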

@tao12345666333
Member

Sorry for the long delay. Let me take a look this week

/assign

@dkvaltech

> Sorry for the long delay. Let me take a look this week
>
> /assign

@tao12345666333 could you verify the issue? Do you have any insights for us?


6 participants