Image conversion often leads to threads stuck at 100% CPU usage #135
Description
I'm running webone 0.16 / 0.17 on Linux, for use with an old iOS 12 device (too old to support WebP, so it proxies through webone for its WEBP-PNG conversion).
Every now and then, I notice the webone process using 100% or 200% CPU as well as a hung magick
process that seems to be doing nothing.
After some investigation I found:
-
It's not the main process but a
.NET ThreadPool
thread (sometimes multiple such threads) using 100% CPU time.The debugger (lldb with dotnet-sos plugin) doesn't show anything useful in clrstack for that thread:
(lldb) sos threads -t 0x2578d (lldb) sos clrstack OS Thread Id: 0x2578d (12) Child SP IP Call Site 00007046C73FFBC0 00005c2c78b96101 [DebuggerU2MCatchHandlerFrame: 00007046c73ffbc0] (lldb) sos clrstack -i Child SP IP Call Site 00007046C73FF780 00005c2c78b96101 [NativeStackFrame] (lldb)
-
The
magick /tmp/convert-xxx.orig.tmp png:-
process is still running (not zombie) and its /proc/PID/stack indicates that it is waiting to write to stdout (blocked onpipe_write
).
After further investigation, it seems that this is what's happening:
-
If a client disconnects without reading the entire (converted) HTTP response – e.g. locking the phone while a page is still loading – then WebOne stops reading from the converter stdout pipe, but does not close it.
-
Pipes have a finite capacity (around 64 kB on Linux). When the converted PNG output is larger than 64k, and since WebOne isn't reading from the pipe anymore, the
magick
process will be forever stuck on a stdoutwrite()
call – using no CPU, just waiting until some buffer space is available. -
And because the converter process keeps running, this WebOne task in
Converter.cs
:while (InputStream.CanRead && !ConvProc.HasExited) { }
...keeps consuming 100% CPU time forever.
The aforementioned while{}
loop looks completely unnecessary to me (and likewise the CheckIdle
loop seems unnecessary), but if you want to keep it, then it shouldn't be a busy-loop (and IMO should have a log message, so that it would become very obvious from logs if the task keeps running for longer than necessary).
while (InputStream.CanRead && !ConvProc.HasExited) {
Log.WriteLine("[{0}] Converter still running...", SourceTmpFile);
Thread.Sleep(5000);
}
(Also: a busy loop steals available CPU from the converter – if someone is running WebOne on a cheap 1-core VPS, then having such a loop without any Sleep() would mean that only 50% of the CPU is available to the convert process; the other 50% is consumed by the loop task.)
Note: Windows has a much larger pipe buffer (multiple MB on Windows vs 64 kB on Linux), so this problem is much more difficult to trigger, whereas on Linux it occurs very often.
To reproduce in Python:
from socket import *
sk = socket(AF_INET, SOCK_STREAM)
sk.connect(("127.0.0.1", 8080))
sk.send(b"GET http://example.com/large.webp HTTP/1.0\r\n"
b"User-Agent: Opera/10\r\n\r\n")
# don't recv
sk.close()
Looks like this in logs:
21.07.2024 22:01:55.207+1538 Got a connection.
21.07.2024 22:01:55.207+70686 Got a request.
21.07.2024 22:01:55.207+411624 >GET http://example.com/large.webp (127.0.0.1)
21.07.2024 22:01:55.207+575491 Begin process.
21.07.2024 22:01:55.207+716487 >Downloading content (connecting)...
21.07.2024 22:01:55.207+5601477 >Downloading content (receiving)...
21.07.2024 22:01:55.207+5677399 Add response header: Content-Type: image/png
21.07.2024 22:01:55.207+5681741 200 OK. Body 619K of image/webp [Wants magick].
21.07.2024 22:01:55.207+5799530 [/tmp/convert-1150286203.orig.tmp] Converting: magick /tmp/convert-1150286203.orig.tmp png:- ...
21.07.2024 22:01:55.207+5800876 [/tmp/convert-1150286203.orig.tmp] Reading stdout...
21.07.2024 22:01:55.207+5808786 [/tmp/convert-1150286203.orig.tmp reader] Task started
21.07.2024 22:01:55.207+5814408 [/tmp/convert-1150286203.orig.tmp CanRead] Task started
21.07.2024 22:01:55.207+5812850 [/tmp/convert-1150286203.orig.tmp CheckIdle] Task started
21.07.2024 22:01:55.207+5809875 <Send image/png stream.
21.07.2024 22:01:55.207+5809072 [/tmp/convert-1150286203.orig.tmp reader] Waiting for finish of converting...
21.07.2024 22:01:55.207+5814763 [/tmp/convert-1150286203.orig.tmp CanRead] InputStream is still readable
21.07.2024 22:01:55.207+13565059 <Stream not sent: Unable to write data to the transport connection: Broken pipe.
21.07.2024 22:01:55.207+13575461 End process.
21.07.2024 22:01:55.207+13575627 <Done (connection close).
21.07.2024 22:01:55.207+30816748 [/tmp/convert-1150286203.orig.tmp CanRead] InputStream is still readable
21.07.2024 22:01:55.207+55818992 [/tmp/convert-1150286203.orig.tmp CanRead] InputStream is still readable
21.07.2024 22:01:55.207+80821118 [/tmp/convert-1150286203.orig.tmp CanRead] InputStream is still readable
21.07.2024 22:01:55.207+105823217 [/tmp/convert-1150286203.orig.tmp CanRead] InputStream is still readable
21.07.2024 22:01:55.207+130825206 [/tmp/convert-1150286203.orig.tmp CanRead] InputStream is still readable
21.07.2024 22:01:55.207+155829576 [/tmp/convert-1150286203.orig.tmp CanRead] InputStream is still readable