Skip to content

Image conversion often leads to threads stuck at 100% CPU usage #135

Closed
@grawity

Description

I'm running webone 0.16 / 0.17 on Linux, for use with an old iOS 12 device (too old to support WebP, so it proxies through webone for its WEBP-PNG conversion).

Every now and then, I notice the webone process using 100% or 200% CPU as well as a hung magick process that seems to be doing nothing.

After some investigation I found:

  • It's not the main process but a .NET ThreadPool thread (sometimes multiple such threads) using 100% CPU time.

    The debugger (lldb with dotnet-sos plugin) doesn't show anything useful in clrstack for that thread:

    (lldb) sos threads -t 0x2578d
    (lldb) sos clrstack
    OS Thread Id: 0x2578d (12)
            Child SP               IP Call Site
    00007046C73FFBC0 00005c2c78b96101 [DebuggerU2MCatchHandlerFrame: 00007046c73ffbc0]
    (lldb) sos clrstack -i
    Child SP         IP               Call Site
    00007046C73FF780 00005c2c78b96101 [NativeStackFrame]
    (lldb)
    
  • The magick /tmp/convert-xxx.orig.tmp png:- process is still running (not zombie) and its /proc/PID/stack indicates that it is waiting to write to stdout (blocked on pipe_write).

After further investigation, it seems that this is what's happening:

  • If a client disconnects without reading the entire (converted) HTTP response – e.g. locking the phone while a page is still loading – then WebOne stops reading from the converter stdout pipe, but does not close it.

  • Pipes have a finite capacity (around 64 kB on Linux). When the converted PNG output is larger than 64k, and since WebOne isn't reading from the pipe anymore, the magick process will be forever stuck on a stdout write() call – using no CPU, just waiting until some buffer space is available.

  • And because the converter process keeps running, this WebOne task in Converter.cs:

    while (InputStream.CanRead && !ConvProc.HasExited) { }

    ...keeps consuming 100% CPU time forever.

The aforementioned while{} loop looks completely unnecessary to me (and likewise the CheckIdle loop seems unnecessary), but if you want to keep it, then it shouldn't be a busy-loop (and IMO should have a log message, so that it would become very obvious from logs if the task keeps running for longer than necessary).

while (InputStream.CanRead && !ConvProc.HasExited) {
    Log.WriteLine("[{0}] Converter still running...", SourceTmpFile);
    Thread.Sleep(5000);
}

(Also: a busy loop steals available CPU from the converter – if someone is running WebOne on a cheap 1-core VPS, then having such a loop without any Sleep() would mean that only 50% of the CPU is available to the convert process; the other 50% is consumed by the loop task.)


Note: Windows has a much larger pipe buffer (multiple MB on Windows vs 64 kB on Linux), so this problem is much more difficult to trigger, whereas on Linux it occurs very often.

To reproduce in Python:

from socket import *
sk = socket(AF_INET, SOCK_STREAM)
sk.connect(("127.0.0.1", 8080))
sk.send(b"GET http://example.com/large.webp HTTP/1.0\r\n"
        b"User-Agent: Opera/10\r\n\r\n")
# don't recv
sk.close()

Looks like this in logs:

21.07.2024 22:01:55.207+1538    Got a connection.
21.07.2024 22:01:55.207+70686   Got a request.
21.07.2024 22:01:55.207+411624  >GET http://example.com/large.webp (127.0.0.1)
21.07.2024 22:01:55.207+575491   Begin process.
21.07.2024 22:01:55.207+716487  >Downloading content (connecting)...
21.07.2024 22:01:55.207+5601477 >Downloading content (receiving)...
21.07.2024 22:01:55.207+5677399  Add response header: Content-Type: image/png
21.07.2024 22:01:55.207+5681741  200 OK. Body 619K of image/webp [Wants magick].
21.07.2024 22:01:55.207+5799530 [/tmp/convert-1150286203.orig.tmp] Converting: magick /tmp/convert-1150286203.orig.tmp  png:- ...
21.07.2024 22:01:55.207+5800876 [/tmp/convert-1150286203.orig.tmp] Reading stdout...
21.07.2024 22:01:55.207+5808786 [/tmp/convert-1150286203.orig.tmp reader] Task started
21.07.2024 22:01:55.207+5814408 [/tmp/convert-1150286203.orig.tmp CanRead] Task started
21.07.2024 22:01:55.207+5812850 [/tmp/convert-1150286203.orig.tmp CheckIdle] Task started
21.07.2024 22:01:55.207+5809875 <Send image/png stream.
21.07.2024 22:01:55.207+5809072 [/tmp/convert-1150286203.orig.tmp reader] Waiting for finish of converting...
21.07.2024 22:01:55.207+5814763 [/tmp/convert-1150286203.orig.tmp CanRead] InputStream is still readable
21.07.2024 22:01:55.207+13565059        <Stream not sent: Unable to write data to the transport connection: Broken pipe.
21.07.2024 22:01:55.207+13575461         End process.
21.07.2024 22:01:55.207+13575627        <Done (connection close).
21.07.2024 22:01:55.207+30816748        [/tmp/convert-1150286203.orig.tmp CanRead] InputStream is still readable
21.07.2024 22:01:55.207+55818992        [/tmp/convert-1150286203.orig.tmp CanRead] InputStream is still readable
21.07.2024 22:01:55.207+80821118        [/tmp/convert-1150286203.orig.tmp CanRead] InputStream is still readable
21.07.2024 22:01:55.207+105823217       [/tmp/convert-1150286203.orig.tmp CanRead] InputStream is still readable
21.07.2024 22:01:55.207+130825206       [/tmp/convert-1150286203.orig.tmp CanRead] InputStream is still readable
21.07.2024 22:01:55.207+155829576       [/tmp/convert-1150286203.orig.tmp CanRead] InputStream is still readable

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't workingformat convertingIssues related to file format convertinggood first issueGood for newcomers

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions