CUPS 2.4 appears to freeze indefinitely under Production load #1259
tbigby-kristin asked this question in General
Sorry, forgot to say in relation to #1128: from time to time we do see "Unable to encrypt connection: Error in the pull function" and also "Unable to encrypt connection: Error in the push function" in the logs. However, these don't always coincide with a CUPS freeze; they can happen at any time without a bad effect. That said, it's not unusual for there to be an "Error in the pull function" within a minute or so of the freeze. We also got the "Error in the pull function" message in the old CUPS 1.4 server logs, where it never seemed to have any bad effect.
Hi team,
Unfortunately we have an issue similar to #1128 but as @vliaskov notes, the CUPS server becomes unresponsive indefinitely... in our case up to an hour before we manually restarted it.
Not sure whether to add this as a new issue or add to #1128.
In our case, we use CUPS as a Linux print server for the school. Windows and Mac students and staff send their print jobs to the CUPS server, we use PaperCut MF's CUPS print provider to track billing, and CUPS then sends the print job to the printer (Canon photocopiers in our environment).
We recently went live with CUPS 2.4.11 and began to notice this issue, so we upgraded to 2.4.12 (specifically this commit: ac85dcb plus my fix for #1249), and are still seeing the freeze. We run CUPS in a Docker container on Ubuntu 24.04, using our own custom build rather than the Ubuntu CUPS packages.
Before rolling out 2.4.11, we had a CUPS 1.4 server running on CentOS 6 (quite the upgrade!), but that did not have this freezing issue.
I added a Docker Healthcheck to detect when this occurs and auto-restart the container, to limit the impact on users. The healthcheck runs inside the container and simply runs a curl command against https://127.0.0.1:631/printers.
Essentially, when the freeze occurs, CUPS stops accepting or responding to HTTP connections: we can't open the web UI, print jobs fail to go through, and the Docker healthcheck curl command fails with a timeout. However, I can see that the CUPS process is still running at the time.
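For anyone wanting to reproduce the probe without curl, here is a minimal Python sketch of the same idea. It is not the healthcheck we actually run. The 10-second timeout, the function names, and the decision to skip certificate verification (on the assumption that cupsd presents a self-signed certificate on localhost) are all illustrative choices.

```python
import http.client
import ssl
import urllib.request

# Assumption: same URL as the curl healthcheck; the timeout is illustrative.
CUPS_URL = "https://127.0.0.1:631/printers"


def fetch_status(url, timeout):
    """Return the HTTP status code for url, skipping certificate checks."""
    # Assumption: cupsd uses a self-signed certificate on localhost, so
    # verification is disabled for this local-only probe.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(url, timeout=timeout, context=context) as resp:
        return resp.status


def cups_is_healthy(url=CUPS_URL, timeout=10.0, fetch=fetch_status):
    """True if the scheduler answers with HTTP 200 within the timeout."""
    try:
        return fetch(url, timeout) == 200
    except (OSError, http.client.HTTPException):
        # Timeouts, refused or reset connections, TLS failures and HTTP
        # error responses all end up here and count as unhealthy.
        return False

# A healthcheck entry point could end with:
#     sys.exit(0 if cups_is_healthy() else 1)
```

Docker treats exit code 0 as healthy and 1 as unhealthy, so a wrapper script only needs to map the boolean to those two codes. The `fetch` parameter exists purely so the logic can be exercised without a running scheduler.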
Our logs of these restarts show that the freeze occurs about once a day, but only during working hours when there is a reasonable printing load; it never occurs overnight when no one is printing. For that matter, it never occurs on our Staging container, which can stay running for days without any issue.
As it only occurs rarely throughout the day, it's been hard to track down any useful log information, but when I did turn CUPS to Debug logging and added an error_log capture, the last few lines before a freeze show a Client being accepted, but there is no "Connection now encrypted" message as would be normal for a new Client:
In the past 5 minutes of logs, that client IP had just run 'IPP Get-Jobs' requests for the two printer queues it has installed, which had all been successful.
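The pattern described here (a Client that is accepted but never logs "Connection now encrypted") could be searched for across a whole capture with something like the sketch below. It assumes each debug line carries a "[Client N]" tag followed by the message, and that acceptance is logged with a message starting "Accepted from"; both are assumptions about the log format that should be checked against a real error_log before relying on the result.

```python
import re

# Assumption: debug lines look like "... [Client 42] <message>".
CLIENT_LINE = re.compile(r"\[Client (\d+)\] (.*)")


def unencrypted_clients(lines):
    """Return client numbers that were accepted but never logged encryption."""
    accepted = {}  # client number -> index of the line where it was accepted
    for index, line in enumerate(lines):
        match = CLIENT_LINE.search(line)
        if not match:
            continue
        client, message = int(match.group(1)), match.group(2)
        if message.startswith("Accepted from"):
            accepted[client] = index
        elif message.startswith("Connection now encrypted"):
            accepted.pop(client, None)
    # Order by acceptance so the most recently accepted client comes last.
    return sorted(accepted, key=accepted.get)
```

Clients that connect without TLS would also show up in the result, so this only narrows the candidates. The interesting entries are the ones accepted in the last seconds before the log stops.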
There's an awful lot in the Debug error_log file that would need sanitising before uploading, given that it only occurs in Production, but let me know if there are specific debug lines to check for and I can collect those log details.
The Debug Report ran 4 seconds before the end of the log and showed:
Our MaxClients is set to 1000 and during the 5 mins of the error_log, it fluctuates between 200 and 350 clients.
Before we started the 'auto-restart' Healthcheck, we docker exec'd into the container while it was frozen.
The CUPS process was in Sleeping state:
Netstat showed most connections to CUPS in CLOSE_WAIT status, which I understand means that the end-user's device requested to close the connection, but CUPS hadn't processed that yet. There are a few ESTABLISHED connections in which the 'Receive Queue' has data that hasn't been processed by CUPS yet.
(Notes: I have sanitised the individual client IPs on the 10.0.0.0 network but they are different IP addresses. I have also removed connections from PaperCut on port 9191 which are not relevant to CUPS. And I have provided only a sample of the connections, there are 371 connections that refer to cupsd)
cupsctl showed this:
The scheduler appeared to be running but not responding to HTTP, via these two commands:
Due to running CUPS in the Docker container, we run it in the foreground with a cupsd -F & command, in case that's relevant.
My suspicion is that somehow the CUPS scheduler loop in main.c has gotten stuck such that no new requests get accepted, and no existing connections are read from, based on the netstat data showing lots of 'Receive Queue' data.
But I can't find any obvious place where that could occur.
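To make the netstat evidence easier to compare between freezes, the connection states and unread 'Receive Queue' bytes can be tallied with a short script. This is a rough sketch that assumes the usual Linux `netstat -tn` column order (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State); the function name and the default port argument are illustrative.

```python
from collections import Counter


def summarise_netstat(text, port=631):
    """Count TCP states and collect unread receive queues for a local port."""
    states = Counter()
    pending = []  # (foreign address, Recv-Q bytes) for sockets with unread data
    for line in text.splitlines():
        fields = line.split()
        # Expected columns: Proto Recv-Q Send-Q Local Foreign State [PID/Program]
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        recv_q, local, foreign, state = fields[1], fields[3], fields[4], fields[5]
        if not local.endswith(f":{port}") or not recv_q.isdigit():
            continue
        states[state] += 1
        if int(recv_q) > 0:
            pending.append((foreign, int(recv_q)))
    return states, pending
```

Feeding it the captured netstat output from inside the frozen container gives one count per state plus the list of peers whose data cupsd has not read yet. If the scheduler loop really is stuck, both the CLOSE_WAIT count and the pending list should keep growing between two samples taken a few seconds apart.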
I'm hesitant to try to run CUPS in production on 'Debug2' logging level, but if that's all we can do to find out more about the problem, I can try it.
Otherwise I'm thinking of adding some other Debug messages to the main.c scheduler and starting to narrow down what code-path is getting stuck.
Since we've got the auto-restart Healthcheck in place, the problem isn't urgent for us, but it can still take up to a minute for the Healthcheck to restart CUPS so it would be good to find a fix.
Do you have any advice on where to look next?