Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

certificate not renewing (finished successfully. Ongoing...) #323

Open
svde opened this issue Oct 20, 2023 · 12 comments
Open

certificate not renewing (finished successfully. Ongoing...) #323

svde opened this issue Oct 20, 2023 · 12 comments

Comments

@svde
Copy link

svde commented Oct 20, 2023

On a few of our servers (ubuntu 23.04, apache 2.4.55, mod_md 2.4.19) randon certificates aren't renewing.

example from the acme server status

{
  "name": "***redacted***",
  "domains": [
    "***redacted***"
  ],
  "contacts": [
    "mailto:***redacted***"
  ],
  "transitive": 1,
  "ca": {
    "account": "ACME-.letsencrypt.org-0000",
    "proto": "ACME",
    "url": "https://acme-v02.api.letsencrypt.org/directory",
    "urls": [
      "https://acme-v02.api.letsencrypt.org/directory"
    ],
    "agreement": "accepted"
  },
  "privkey": {
    "type": "RSA",
    "bits": 4096
  },
  "state": 2,
  "renew-mode": 1,
  "renew-window": "33%",
  "warn-window": "10%",
  "must-staple": false,
  "proto": {
    "acme-tls/1": []
  },
  "stapling": true,
  "cert": {
    "rsa": {
      "valid": {
        "from": "Thu, 17 Aug 2023 07:57:18 GMT",
        "until": "Wed, 15 Nov 2023 07:57:17 GMT"
      },
      "serial": "***redacted***",
      "sha256-fingerprint": "***redacted***",
      "ocsp": {
        "status": "good",
        "valid": {
          "from": "Tue, 17 Oct 2023 04:00:00 GMT",
          "until": "Tue, 24 Oct 2023 03:59:58 GMT"
        },
        "renewal": {
          "name": "***redacted***.net",
          "finished": true,
          "notified": false,
          "notified-renewed": false,
          "last-run": "Wed, 18 Oct 2023 23:53:27 GMT",
          "errors": 0,
          "last": {
            "status": 0,
            "detail": "certificate status is GOOD, status valid Tue, 17 Oct 2023 04:00:00 GMT - Tue, 24 Oct 2023 03:59:58 GMT",
            "activity": "status of certid ***redacted***, reading response"
          }
        }
      }
    },
    "valid": {
      "from": "Thu, 17 Aug 2023 07:57:18 GMT",
      "until": "Wed, 15 Nov 2023 07:57:17 GMT"
    }
  },
  "renew-at": "Mon, 16 Oct 2023 15:09:18 GMT",
  "watched": true,
  "renew": true,
  "renewal": {
    "name": "***redacted***.net",
    "finished": true,
    "notified": true,
    "notified-renewed": true,
    "last-run": "Thu, 17 Aug 2023 08:57:12 GMT",
    "valid-from": "Thu, 17 Aug 2023 08:57:19 GMT",
    "errors": 0,
    "last": {
      "status": 0,
      "valid-from": "Thu, 17 Aug 2023 08:57:19 GMT"
    },
    "cert": {}
  }
},

The files in the md directory for this (redacted) fqdn still have timestamps from when the certificate was last renewed. The server status shows "finished successfully. Ongoing...". The only way I've been able to renew them is to remove the directories for the fqdn in the md directory and start over for those fqdn's. Is there a better way to do it?

@icing
Copy link
Owner

icing commented Oct 20, 2023

Do you have files in md/staging/<fqdn> when this happens? Is there a job.json file in there logging what it has done?

@svde
Copy link
Author

svde commented Oct 20, 2023

Most recent entries from the staging job.json. My observation is that it didn't do anything.

{
"name": "redacted.net",
"finished": true,
"notified": true,
"notified-renewed": true,
"last-run": "Thu, 17 Aug 2023 08:57:12 GMT",
"valid-from": "Thu, 17 Aug 2023 08:57:19 GMT",
"errors": 0,
"last": {
"status": 0,
"valid-from": "Thu, 17 Aug 2023 08:57:19 GMT"
},
"log": {
"entries": [
{
"when": "Thu, 17 Aug 2023 08:58:41 GMT",
"type": "message-renewed"
},
{
"when": "Thu, 17 Aug 2023 08:57:19 GMT",
"type": "finished"
},
{
"when": "Thu, 17 Aug 2023 08:57:19 GMT",
"type": "progress",
"detail": "The certificate for the managed domain has been renewed successfully and can be used (valid since Thu, 17 Aug 2023 08:57:19 GMT). A graceful server restart now is recommended."
},

@svde
Copy link
Author

svde commented Oct 22, 2023

Removing the (stale?) fqdn directories in md/stagaing seems to solve the issue.

@mritzmann
Copy link

I had the same problem today. After deleting /etc/apache2/md/staging/<domain> the certificate was successfully renewed. Before that, the LogLevel was increased to md:debug, but I did not find anything suspicious.

[Mon Apr 29 08:44:32.650718 2024] [mpm_event:notice] [pid 957:tid 140277553302464] AH00493: SIGUSR1 received.  Doing graceful restart
[Mon Apr 29 08:44:32.726486 2024] [md:info] [pid 957:tid 140277553302464] AH10071: mod_md (v2.4.25), initializing...
[Mon Apr 29 08:44:32.726690 2024] [md:debug] [pid 957:tid 140277553302464] mod_md.c(451): AH10037: server seems reachable via http: and reachable via https:
(..)
[Mon Apr 29 08:44:32.726792 2024] [md:debug] [pid 957:tid 140277553302464] mod_md.c(719): AH10039: Completed MD[sub.example.com, CA=(null), Proto=ACME, Agreement=accepted, renew-mode=1 renew_window=33%, warn_window=10%
(..)
[Mon Apr 29 08:44:32.726857 2024] [md:debug] [pid 957:tid 140277553302464] mod_md.c(608): AH10041: Server sub.example.com:0 matches md sub.example.com (config sub.example.com[default, default], match-mode=0) for domain sub.example.com, has now 1 MDs
[Mon Apr 29 08:44:32.726861 2024] [md:debug] [pid 957:tid 140277553302464] mod_md.c(608): AH10041: Server sub.example.com:0 matches md sub.example.com (config sub.example.com[default, default], match-mode=0) for domain sub.example.com, has now 1 MDs
[Mon Apr 29 08:44:32.726876 2024] [md:debug] [pid 957:tid 140277553302464] md_reg.c(841): sync MDs, start
[Mon Apr 29 08:44:32.726917 2024] [md:debug] [pid 957:tid 140277553302464] md_reg.c(904): sync MDs, 7 existing, 0 moved, 0 new.
(..)
[Mon Apr 29 08:44:32.732397 2024] [md:debug] [pid 957:tid 140277553302464] md_ocsp.c(322): md[sub.example.com]: priming OCSP status
[Mon Apr 29 08:44:32.732466 2024] [md:debug] [pid 957:tid 140277553302464] md_ocsp.c(361): md[sub.example.com]: adding ocsp info (responder=http://r3.o.lencr.org)
(..)
[Mon Apr 29 08:44:32.749708 2024] [md:debug] [pid 957:tid 140277553302464] mod_md.c(555): AH10169: sub.example.com: https server_rec for sub.example.com does not have protocol acme-tls/1 enabled
[Mon Apr 29 08:44:32.750013 2024] [md:debug] [pid 957:tid 140277553302464] md_reg.c(256): md{sub.example.com}: certificate(0) is ok
[Mon Apr 29 08:44:32.750036 2024] [md:debug] [pid 957:tid 140277553302464] md_reg.c(941): loading md sub.example.com
[Mon Apr 29 08:44:32.750058 2024] [md:debug] [pid 957:tid 140277553302464] md_reg.c(943): loaded md sub.example.com
[Mon Apr 29 08:44:32.750067 2024] [md:debug] [pid 957:tid 140277553302464] md_reg.c(987): sync MDs, finish done
(..)
[Mon Apr 29 08:44:32.750466 2024] [md:debug] [pid 957:tid 140277553302464] md_reg.c(1112): sub.example.com: init done
[Mon Apr 29 08:44:32.752378 2024] [md:debug] [pid 957:tid 140277553302464] mod_md.c(1006): AH10074: 7 out of 7 mds need watching
[Mon Apr 29 08:44:32.752757 2024] [md:debug] [pid 957:tid 140277553302464] mod_md_drive.c(342): AH10067: register md renew watchdog(_md_renew_)
[Mon Apr 29 08:44:32.752962 2024] [md:debug] [pid 957:tid 140277553302464] mod_md_ocsp.c(266): AH10203: register md ocsp watchdog(_md_ocsp_)
[Mon Apr 29 08:44:32.753127 2024] [mpm_event:notice] [pid 957:tid 140277553302464] AH00489: Apache/2.4.59 (Debian) OpenSSL/3.0.11 mod_qos/11.74 configured -- resuming normal operations
[Mon Apr 29 08:44:32.753139 2024] [core:notice] [pid 957:tid 140277553302464] AH00094: Command line: '/usr/sbin/apache2'
[Mon Apr 29 08:44:32.755303 2024] [md:debug] [pid 777586:tid 140277419341504] mod_md_drive.c(202): AH10054: md watchdog start, auto drive 7 mds
[Mon Apr 29 08:44:32.755337 2024] [md:debug] [pid 777586:tid 140277410948800] mod_md_ocsp.c(158): AH10197: md ocsp watchdog start, ocsp stapling 7 certificates
[Mon Apr 29 08:44:32.855601 2024] [md:debug] [pid 777586:tid 140277419341504] mod_md_drive.c(207): AH10055: md watchdog run, auto drive 7 mds
[Mon Apr 29 08:44:32.855634 2024] [md:debug] [pid 777586:tid 140277410948800] mod_md_ocsp.c(164): AH10198: md ocsp watchdog run, ocsp stapling 7 certificates
(..)
[Mon Apr 29 08:44:32.855716 2024] [md:debug] [pid 777586:tid 140277410948800] md_ocsp.c(901): OCSP status updates due: 0
[Mon Apr 29 08:44:32.855726 2024] [md:debug] [pid 777586:tid 140277410948800] mod_md_ocsp.c(178): AH10199: md ocsp watchdog next run in 59 minutes 59 seconds
(..)
[Mon Apr 29 08:44:32.856613 2024] [md:debug] [pid 777586:tid 140277419341504] mod_md_drive.c(229): AH10107: next run in 11 hours 59 minutes 59 seconds

If it occurs again, what would I have to do to debug it further?

@opsec
Copy link

opsec commented Jun 7, 2024

Had the same problem on FreeBSD 13.2p11, apache 2.4.59, mod_md 2.4.15. The staging/ cleanup works, but it's ugly.

Where do I have to put md:debug in the apache config to get logs of the process ?

@icing
Copy link
Owner

icing commented Jun 7, 2024

Had the same problem on FreeBSD 13.2p11, apache 2.4.59, mod_md 2.4.15. The staging/ cleanup works, but it's ugly.

Where do I have to put md:debug in the apache config to get logs of the process ?

You can add LogLevel md:debug anywhere on the base server level.

@svde
Copy link
Author

svde commented Jul 6, 2024

Our setup involves two webservers behind a load balancer. We use a script called by MDMessageCmd to synchronize (challenges and when done: full apache config) to the other node. In that script the renewed case triggers a check to see if all certificates that are being renewed are done, and only then will it reload the webserver and synchronize the apache configuration (including new certs) to the other host.

My observation is that if there are leftover directories in md/staging, it's mostly from the last one being renewed.

@icing
Copy link
Owner

icing commented Sep 6, 2024

Closed as being stale.

@icing icing closed this as completed Sep 6, 2024
@svde
Copy link
Author

svde commented Sep 6, 2024

Please re-open, it still occurs.

@icing icing reopened this Sep 6, 2024
@icing
Copy link
Owner

icing commented Sep 12, 2024

@svde Do you have any hint that something in mod_md is doing the wrong thing? Or putting it another way: are you looking for professional support for your unique setup or do you want to file a bug report?

@svde
Copy link
Author

svde commented Sep 12, 2024

I have the impression that there is a race condition when updating multiple certificates. For the last one the staging directory isn't cleaned up. I'm not the only one experiencing this.

@icing
Copy link
Owner

icing commented Sep 12, 2024

Ok, just made https://github.com/icing/mod_md/releases/tag/v2.4.28 with a new test case and extra handling for failed staging sets that purges the faulty file set. Also raised the log level, so that managed domains that did not detect a new staging are listed.

If you run with LogLevel md:debug, you should see ': nothing staged" for all your domains or an error when loading the staged set.

What is not clear to me is how your multi-server setup works and how you prevent the instances to confuse each other.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants