Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Qualys security scanner kills all cheroot worker threads, hangs server, eats all available file descriptors #365

Closed
wants to merge 4 commits into from

Conversation

DangerPoodle
Copy link

@DangerPoodle DangerPoodle commented Mar 11, 2021

❓ What kind of change does this PR introduce?

  • 🐞 bug fix
  • 🐣 feature
  • πŸ“‹ docs update
  • πŸ“‹ tests/coverage improvement
  • πŸ“‹ refactoring
  • πŸ’₯ other

πŸ“‹ What is the related issue number (starting with #)

N/A

❓ What is the current behavior? (You can also link to an open issue here)

This was originally discovered by scanning our cherrypy-based server with the Qualys vulnerability scanner. We later discovered we could reproduce the issue by:

  • logging into our server on firefox
  • shutting down the server
  • regenerating the server cert (self-signed)
  • restarting the server
  • clicking Refresh on firefox without accepting the new cert.

The result is this exception logged:

03/Mar/2021 18:55:49.533 ENGINE socket.error 1
Traceback (most recent call last):
  File "/root/.pex/installed_wheels/e41a25c4dd3432ad8f36767b07aff8afbe8b9c6a/cheroot-8.5.3.dev2+g473c546a.d20210303-py2.py3-none-any.whl/cheroot/server.py", line 1288, in communicate
    req.parse_request()
  File "/root/.pex/installed_wheels/e41a25c4dd3432ad8f36767b07aff8afbe8b9c6a/cheroot-8.5.3.dev2+g473c546a.d20210303-py2.py3-none-any.whl/cheroot/server.py", line 706, in parse_request
    success = self.read_request_line()
  File "/root/.pex/installed_wheels/e41a25c4dd3432ad8f36767b07aff8afbe8b9c6a/cheroot-8.5.3.dev2+g473c546a.d20210303-py2.py3-none-any.whl/cheroot/server.py", line 747, in read_request_line
    request_line = self.rfile.readline()
  File "/root/.pex/installed_wheels/e41a25c4dd3432ad8f36767b07aff8afbe8b9c6a/cheroot-8.5.3.dev2+g473c546a.d20210303-py2.py3-none-any.whl/cheroot/server.py", line 304, in readline
    data = self.rfile.readline(256)
  File "/usr/lib64/python3.6/_pyio.py", line 511, in readline
    b = self.read(nreadahead())
  File "/usr/lib64/python3.6/_pyio.py", line 495, in nreadahead
    readahead = self.peek(1)
  File "/usr/lib64/python3.6/_pyio.py", line 1063, in peek
    return self._peek_unlocked(size)
  File "/usr/lib64/python3.6/_pyio.py", line 1070, in _peek_unlocked
    current = self.raw.read(to_read)
  File "/usr/lib64/python3.6/socket.py", line 586, in readinto
    return self._sock.recv_into(b)
  File "/usr/lib64/python3.6/ssl.py", line 971, in recv_into
    return self.read(nbytes, buffer)
  File "/usr/lib64/python3.6/ssl.py", line 833, in read
    return self._sslobj.read(len, buffer)
  File "/usr/lib64/python3.6/ssl.py", line 590, in read
    v = self._sslobj.read(len, buffer)
ssl.SSLError: [SSL: DECRYPTION_FAILED_OR_BAD_RECORD_MAC] decryption failed or bad record mac (_ssl.c:2334)

(Note that I am using a locally-built version of cheroot with added logging to diagnose the error, but the same issue occurs with stock cheroot 8.5.2. Unfortunately, this does mean the line numbers might not match exactly.)

This exception is followed by the death of the worker thread executing this request. Hitting refresh as many times as the number of worker threads causes the server to become unresponsive. However, it will continue to receive and queue requests, until the queued requests consume all available file descriptors. Here is what dynpool will log once all worker threads are dead:

03/Mar/2021 19:17:23.011  Thread pool: [current=10/idle=10/queue=221]
03/Mar/2021 19:17:30.019  Thread pool: [current=10/idle=10/queue=222]
03/Mar/2021 19:17:36.027  Thread pool: [current=10/idle=10/queue=223]
03/Mar/2021 19:17:37.029  Thread pool: [current=10/idle=10/queue=224]
03/Mar/2021 19:17:41.034  Thread pool: [current=10/idle=10/queue=225]
...

Note that, although the worker threads are dead, cheroot and dynpool think they are still alive. Calling ThreadPool._clear_dead_threads() would clear the dead threads and allow dynpool to recover, but clear_dead_threads is only called from shrink(), and dynpool won't call shrink() because the number of worker threads is within limits.

The above exception is being caught and handled properly in conn.communicate(), but the worker thread is being killed by a second SSLError exception:

Traceback (most recent call last):
  File "/usr/lib64/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/root/.pex/installed_wheels/e41a25c4dd3432ad8f36767b07aff8afbe8b9c6a/cheroot-8.5.3.dev2+g473c546a.d20210303-py2.py3-none-any.whl/cheroot/workers/threadpool.py", line 133, in run
    keep_conn_open = conn.communicate()
  File "/root/.pex/installed_wheels/e41a25c4dd3432ad8f36767b07aff8afbe8b9c6a/cheroot-8.5.3.dev2+g473c546a.d20210303-py2.py3-none-any.whl/cheroot/server.py", line 1324, in communicate
    self._conditional_error(req, '500 Internal Server Error')
  File "/root/.pex/installed_wheels/e41a25c4dd3432ad8f36767b07aff8afbe8b9c6a/cheroot-8.5.3.dev2+g473c546a.d20210303-py2.py3-none-any.whl/cheroot/server.py", line 1371, in _conditional_error
    req.simple_response(response)
  File "/root/.pex/installed_wheels/e41a25c4dd3432ad8f36767b07aff8afbe8b9c6a/cheroot-8.5.3.dev2+g473c546a.d20210303-py2.py3-none-any.whl/cheroot/server.py", line 1124, in simple_response
    self.conn.wfile.write(EMPTY.join(buf))
  File "/root/.pex/installed_wheels/e41a25c4dd3432ad8f36767b07aff8afbe8b9c6a/cheroot-8.5.3.dev2+g473c546a.d20210303-py2.py3-none-any.whl/cheroot/makefile.py", line 438, in write
    res = super().write(val, *args, **kwargs)
  File "/root/.pex/installed_wheels/e41a25c4dd3432ad8f36767b07aff8afbe8b9c6a/cheroot-8.5.3.dev2+g473c546a.d20210303-py2.py3-none-any.whl/cheroot/makefile.py", line 36, in write
    self._flush_unlocked()
  File "/root/.pex/installed_wheels/e41a25c4dd3432ad8f36767b07aff8afbe8b9c6a/cheroot-8.5.3.dev2+g473c546a.d20210303-py2.py3-none-any.whl/cheroot/makefile.py", line 45, in _flush_unlocked
    n = self.raw.write(bytes(self._write_buf))
  File "/usr/lib64/python3.6/socket.py", line 604, in write
    return self._sock.send(b)
  File "/usr/lib64/python3.6/ssl.py", line 903, in send
    return self._sslobj.write(data)
  File "/usr/lib64/python3.6/ssl.py", line 601, in write
    return self._sslobj.write(data)
ssl.SSLError: [SSL: UNKNOWN_STATE] unknown state (_ssl.c:2187)

❓ What is the new behavior (if this is a feature change)?

With the proposed change, the SSL exceptions will still be logged, but the worker thread will survive.

πŸ“‹ Other information:

This is my first foray into cheroot, so it's possible this isn't the right way to fix this. We can discuss, and I can retest with alternate fixes.

Both the server and the firefox browser used to trigger the problem are running on CentOS 7. The server is using python 3.6.

πŸ“‹ Checklist:

  • I think the code is well written
  • I wrote good commit messages
  • I have squashed related commits together after the changes have been approved
  • Unit tests for the changes exist
  • Integration tests for the changes exist (if applicable)
  • I used the same coding conventions as the rest of the project
  • The new code doesn't generate linter offenses
  • Documentation reflects the changes
  • The PR relates to only one subject with a clear title
    and description in grammatically correct, complete sentences

This change is Reviewable

@siris
Copy link

siris commented Mar 7, 2022

I can confirm this exact same issue with cheroot version 8.6.0. I am experiencing the same exact scenario as described by @DangerPoodle.

A Qualsys scanner probing a SSL CherryPy server that uses the cherrypy.server.ssl_module = 'builtin' SSL module configuration causes Cherry workers threads (by default Cherry starts with 10 work threads in its thread pool) to exit and eventually become exhausted, destabilizing the entire CherryPy SSL application server.

Here is the traceback I am seeing caused by a Qualsys scan when probing a SSL CherryPy server:

Exception in thread ('CP Server Thread-22',):
Traceback (most recent call last):
  File "cheroot/server.py", line 1290, in communicate
    req.parse_request()
  File "cheroot/server.py", line 719, in parse_request
    success = self.read_request_line()
  File "cheroot/server.py", line 760, in read_request_line
    request_line = self.rfile.readline()
  File "cheroot/server.py", line 310, in readline
    data = self.rfile.readline(256)
  File "/usr/lib/python3.7/_pyio.py", line 512, in readline
    b = self.read(nreadahead())
  File "/usr/lib/python3.7/_pyio.py", line 491, in nreadahead
    readahead = self.peek(1)
  File "/usr/lib/python3.7/_pyio.py", line 1085, in peek
    return self._peek_unlocked(size)
  File "/usr/lib/python3.7/_pyio.py", line 1092, in _peek_unlocked
    current = self.raw.read(to_read)
  File "/usr/lib/python3.7/socket.py", line 589, in readinto
    return self._sock.recv_into(b)
  File "/usr/lib/python3.7/ssl.py", line 1071, in recv_into
    return self.read(nbytes, buffer)
  File "/usr/lib/python3.7/ssl.py", line 929, in read
    return self._sslobj.read(len, buffer)
ssl.SSLError: [SSL: DECRYPTION_FAILED_OR_BAD_RECORD_MAC] decryption failed or bad record mac (_ssl.c:2570)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "cheroot/workers/threadpool.py", line 125, in run
    keep_conn_open = conn.communicate()
  File "cheroot/server.py", line 1319, in communicate
    self._conditional_error(req, '500 Internal Server Error')
  File "cheroot/server.py", line 1362, in _conditional_error
    req.simple_response(response)
  File "cheroot/server.py", line 1128, in simple_response
    self.conn.wfile.write(EMPTY.join(buf))
  File "cheroot/makefile.py", line 438, in write
    res = super().write(val, *args, **kwargs)
  File "cheroot/makefile.py", line 36, in write
    self._flush_unlocked()
  File "cheroot/makefile.py", line 45, in _flush_unlocked
    n = self.raw.write(bytes(self._write_buf))
  File "/usr/lib/python3.7/socket.py", line 607, in write
    return self._sock.send(b)
  File "/usr/lib/python3.7/ssl.py", line 1003, in send
    return self._sslobj.write(data)
ssl.SSLError: [SSL: UNKNOWN_STATE] unknown state (_ssl.c:2423)

Can someone from the cheroot maintainer team please take a look at this? This is a fairly critical bug and a simple security scanner should not be able to crash a cheroot powered SSL CherryPy server.

@siris
Copy link

siris commented Apr 6, 2022

@webknjaz can you please take a look at this issue? t has been open for over a year now and it is fairly critical as it cripples a cheroot running as part of a SSL CherryPy server.

I have also confirmed in a production deployment of CherryPy that @DangerPoodle's patch has from this PR works and mitigates the worker thread exhaustion caused by malformed SSL connections.

Additional related cheroot issues:

@TokenRing
Copy link

Hi @DangerPoodle - can I confirm with you that in order for the fix in your PR to work, we have to also enable the error_log with

cherrypy.log.access_log.propagate = True
cherrypy.log.error_log.propagate = True

by default on my side they are set to False. I tried implementing your PR while just leaving those settings above at default, False, and the problem continued to occur. So I am thinking I will set them to True, since your code

self.server.error_log(f'ignored extra SSLError {repr(ex)}')

seems to require the error_log (at least!) to be turned on.

@Wissperwind
Copy link

Wissperwind commented Feb 7, 2023

Hi,
What is the current state?
I might have a similar problem:

[22/Jan/2023:08:43:24] ENGINE Error in HTTPServer.serve
Traceback (most recent call last):
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\server.py", line 1823, in serve
    self._connections.run(self.expiration_interval)
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\connections.py", line 203, in run
    self._run(expiration_interval)
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\connections.py", line 246, in _run
    new_conn = self._from_server_socket(self.server.socket)
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\connections.py", line 300, in _from_server_socket
    s, ssl_env = self.server.ssl_adapter.wrap(s)
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\ssl\builtin.py", line 277, in wrap
    s = self.context.wrap_socket(
  File "ssl.py", line 512, in wrap_socket
  File "ssl.py", line 1070, in _create
  File "ssl.py", line 1341, in do_handshake
ssl.SSLError: [SSL: UNEXPECTED_MESSAGE] unexpected message (_ssl.c:997)

[22/Jan/2023:08:43:46] ENGINE socket.error 8
Traceback (most recent call last):
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\server.py", line 1290, in communicate
    req.parse_request()
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\server.py", line 719, in parse_request
    success = self.read_request_line()
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\server.py", line 778, in read_request_line
    self.simple_response(
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\server.py", line 1128, in simple_response
    self.conn.wfile.write(EMPTY.join(buf))
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\makefile.py", line 438, in write
    res = super().write(val, *args, **kwargs)
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\makefile.py", line 36, in write
    self._flush_unlocked()
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\makefile.py", line 45, in _flush_unlocked
    n = self.raw.write(bytes(self._write_buf))
  File "socket.py", line 723, in write
  File "ssl.py", line 1205, in send
ssl.SSLEOFError: EOF occurred in violation of protocol (_ssl.c:2396)

Exception in thread ('CP Server Thread-16',):
Traceback (most recent call last):
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\server.py", line 1290, in communicate
    req.parse_request()
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\server.py", line 719, in parse_request
    success = self.read_request_line()
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\server.py", line 778, in read_request_line
    self.simple_response(
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\server.py", line 1128, in simple_response
    self.conn.wfile.write(EMPTY.join(buf))
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\makefile.py", line 438, in write
    res = super().write(val, *args, **kwargs)
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\makefile.py", line 36, in write
    self._flush_unlocked()
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\makefile.py", line 45, in _flush_unlocked
    n = self.raw.write(bytes(self._write_buf))
  File "socket.py", line 723, in write
  File "ssl.py", line 1205, in send
ssl.SSLEOFError: EOF occurred in violation of protocol (_ssl.c:2396)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "threading.py", line 1009, in _bootstrap_inner
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\workers\threadpool.py", line 125, in run
    keep_conn_open = conn.communicate()
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\server.py", line 1319, in communicate
    self._conditional_error(req, '500 Internal Server Error')
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\server.py", line 1362, in _conditional_error
    req.simple_response(response)
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\server.py", line 1128, in simple_response
    self.conn.wfile.write(EMPTY.join(buf))
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\makefile.py", line 438, in write
    res = super().write(val, *args, **kwargs)
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\makefile.py", line 36, in write
    self._flush_unlocked()
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\makefile.py", line 45, in _flush_unlocked
    n = self.raw.write(bytes(self._write_buf))
  File "socket.py", line 723, in write
  File "ssl.py", line 1205, in send
ssl.SSLEOFError: EOF occurred in violation of protocol (_ssl.c:2396)
[22/Jan/2023:08:43:47] ENGINE socket.error 8
Traceback (most recent call last):
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\server.py", line 1290, in communicate
    req.parse_request()
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\server.py", line 719, in parse_request
    success = self.read_request_line()
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\server.py", line 778, in read_request_line
    self.simple_response(
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\server.py", line 1128, in simple_response
    self.conn.wfile.write(EMPTY.join(buf))
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\makefile.py", line 438, in write
    res = super().write(val, *args, **kwargs)
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\makefile.py", line 36, in write
    self._flush_unlocked()
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\makefile.py", line 45, in _flush_unlocked
    n = self.raw.write(bytes(self._write_buf))
  File "socket.py", line 723, in write
  File "ssl.py", line 1205, in send
ssl.SSLEOFError: EOF occurred in violation of protocol (_ssl.c:2396)

Exception in thread ('CP Server Thread-14',):
Traceback (most recent call last):
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\server.py", line 1290, in communicate
    req.parse_request()
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\server.py", line 719, in parse_request
    success = self.read_request_line()
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\server.py", line 778, in read_request_line
    self.simple_response(
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\server.py", line 1128, in simple_response
    self.conn.wfile.write(EMPTY.join(buf))
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\makefile.py", line 438, in write
    res = super().write(val, *args, **kwargs)
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\makefile.py", line 36, in write
    self._flush_unlocked()
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\makefile.py", line 45, in _flush_unlocked
    n = self.raw.write(bytes(self._write_buf))
  File "socket.py", line 723, in write
  File "ssl.py", line 1205, in send
ssl.SSLEOFError: EOF occurred in violation of protocol (_ssl.c:2396)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "threading.py", line 1009, in _bootstrap_inner
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\workers\threadpool.py", line 125, in run
    keep_conn_open = conn.communicate()
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\server.py", line 1319, in communicate
    self._conditional_error(req, '500 Internal Server Error')
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\server.py", line 1362, in _conditional_error
    req.simple_response(response)
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\server.py", line 1128, in simple_response
    self.conn.wfile.write(EMPTY.join(buf))
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\makefile.py", line 438, in write
    res = super().write(val, *args, **kwargs)
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\makefile.py", line 36, in write
    self._flush_unlocked()
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\makefile.py", line 45, in _flush_unlocked
    n = self.raw.write(bytes(self._write_buf))
  File "socket.py", line 723, in write
  File "ssl.py", line 1205, in send
ssl.SSLEOFError: EOF occurred in violation of protocol (_ssl.c:2396)
[22/Jan/2023:08:46:11] ENGINE Error in HTTPServer.serve
Traceback (most recent call last):
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\server.py", line 1823, in serve
    self._connections.run(self.expiration_interval)
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\connections.py", line 203, in run
    self._run(expiration_interval)
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\connections.py", line 246, in _run
    new_conn = self._from_server_socket(self.server.socket)
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\connections.py", line 300, in _from_server_socket
    s, ssl_env = self.server.ssl_adapter.wrap(s)
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\ssl\builtin.py", line 277, in wrap
    s = self.context.wrap_socket(
  File "ssl.py", line 512, in wrap_socket
  File "ssl.py", line 1070, in _create
  File "ssl.py", line 1341, in do_handshake
ssl.SSLError: [SSL] internal error (_ssl.c:997)

[22/Jan/2023:09:59:48] ENGINE Error in HTTPServer.serve
Traceback (most recent call last):
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\server.py", line 1823, in serve
    self._connections.run(self.expiration_interval)
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\connections.py", line 203, in run
    self._run(expiration_interval)
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\connections.py", line 246, in _run
    new_conn = self._from_server_socket(self.server.socket)
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\connections.py", line 300, in _from_server_socket
    s, ssl_env = self.server.ssl_adapter.wrap(s)
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\ssl\builtin.py", line 277, in wrap
    s = self.context.wrap_socket(
  File "ssl.py", line 512, in wrap_socket
  File "ssl.py", line 1070, in _create
  File "ssl.py", line 1341, in do_handshake
ssl.SSLError: [SSL] internal error (_ssl.c:997)

[22/Jan/2023:09:59:59] ENGINE socket.error 8
Traceback (most recent call last):
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\server.py", line 987, in read_request_headers
    self.header_reader(self.rfile, self.inheaders)
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\server.py", line 213, in __call__
    raise ValueError('Illegal end of headers.')
ValueError: Illegal end of headers.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\server.py", line 1290, in communicate
    req.parse_request()
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\server.py", line 732, in parse_request
    success = self.read_request_headers()
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\server.py", line 989, in read_request_headers
    self.simple_response('400 Bad Request', ex.args[0])
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\server.py", line 1128, in simple_response
    self.conn.wfile.write(EMPTY.join(buf))
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\makefile.py", line 438, in write
    res = super().write(val, *args, **kwargs)
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\makefile.py", line 36, in write
    self._flush_unlocked()
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\makefile.py", line 45, in _flush_unlocked
    n = self.raw.write(bytes(self._write_buf))
  File "socket.py", line 723, in write
  File "ssl.py", line 1205, in send
ssl.SSLEOFError: EOF occurred in violation of protocol (_ssl.c:2396)

Exception in thread ('CP Server Thread-8',):
Traceback (most recent call last):
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\server.py", line 987, in read_request_headers
    self.header_reader(self.rfile, self.inheaders)
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\server.py", line 213, in __call__
    raise ValueError('Illegal end of headers.')
ValueError: Illegal end of headers.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\server.py", line 1290, in communicate
    req.parse_request()
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\server.py", line 732, in parse_request
    success = self.read_request_headers()
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\server.py", line 989, in read_request_headers
    self.simple_response('400 Bad Request', ex.args[0])
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\server.py", line 1128, in simple_response
    self.conn.wfile.write(EMPTY.join(buf))
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\makefile.py", line 438, in write
    res = super().write(val, *args, **kwargs)
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\makefile.py", line 36, in write
    self._flush_unlocked()
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\makefile.py", line 45, in _flush_unlocked
    n = self.raw.write(bytes(self._write_buf))
  File "socket.py", line 723, in write
  File "ssl.py", line 1205, in send
ssl.SSLEOFError: EOF occurred in violation of protocol (_ssl.c:2396)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "threading.py", line 1009, in _bootstrap_inner
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\workers\threadpool.py", line 125, in run
    keep_conn_open = conn.communicate()
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\server.py", line 1319, in communicate
    self._conditional_error(req, '500 Internal Server Error')
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\server.py", line 1362, in _conditional_error
    req.simple_response(response)
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\server.py", line 1128, in simple_response
    self.conn.wfile.write(EMPTY.join(buf))
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\makefile.py", line 438, in write
    res = super().write(val, *args, **kwargs)
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\makefile.py", line 36, in write
    self._flush_unlocked()
  File "D:\some\folder\python-windows\Lib\site-packages\cheroot\makefile.py", line 45, in _flush_unlocked
    n = self.raw.write(bytes(self._write_buf))
  File "socket.py", line 723, in write
  File "ssl.py", line 1205, in send
ssl.SSLEOFError: EOF occurred in violation of protocol (_ssl.c:2396)

@mzhukovs
Copy link

mzhukovs commented Apr 23, 2023

Ran into the same issue recently - until this PR hopefully gets merged and prevents this from happening has anyone confirmed that just updating the same file to catch the SSLError actually works?

nvm I see @siris has stated that it does

@rkachach

This comment was marked as duplicate.

@liquidaty

This comment was marked as duplicate.

@Kodiologist
Copy link

Kodiologist commented Feb 1, 2024

Here's a monkey patch based on this PR that's worked for me. You can put it in your own code to modify cheroot without changing cheroot's files.

# Work around HTTPS issues.
# https://github.com/cherrypy/cheroot/pull/365

import cheroot.workers.threadpool, time, ssl
from cheroot.workers.threadpool import _SHUTDOWNREQUEST

def monkeypatched_run(self):
    """Process incoming HTTP connections.

    Retrieves incoming connections from thread pool.
    """
    self.server.stats['Worker Threads'][self.name] = self.stats
    try:
        self.ready = True
        while True:
            conn = self.server.requests.get()
            if conn is _SHUTDOWNREQUEST:
                return

            self.conn = conn
            is_stats_enabled = self.server.stats['Enabled']
            if is_stats_enabled:
                self.start_time = time.time()
            keep_conn_open = False
            try:
                keep_conn_open = conn.communicate()
            except ssl.SSLError as ex:
                self.server.error_log(f'ignored extra SSLError {repr(ex)}')
            finally:
                if keep_conn_open:
                    self.server.put_conn(conn)
                else:
                    conn.close()
                if is_stats_enabled:
                    self.requests_seen += self.conn.requests_seen
                    self.bytes_read += self.conn.rfile.bytes_read
                    self.bytes_written += self.conn.wfile.bytes_written
                    self.work_time += time.time() - self.start_time
                    self.start_time = None
                self.conn = None
    except (KeyboardInterrupt, SystemExit) as ex:
        self.server.interrupt = ex

cheroot.workers.threadpool.WorkerThread.run = monkeypatched_run

@Kodiologist
Copy link

@webknjaz Does anything need to happen here to get this merged? It's a three-year-old, three-line change that fixes a showstopping bug.

@webknjaz
Copy link
Member

webknjaz commented Feb 1, 2024

@Kodiologist I had something locally that helped me reproduce this with tlsfuzzer, but other priorities took over. I'll need to look deeper myself. #518 fixed other related problems that I thought might have influenced this one, but now I think that's orthogonal. Ideally, it'd be useful to have some tlsfuzzer-based testing in CI, but I won't declare that a blocker. Still a regression test for this specific bug would be important.

I was holding off on this because I thought that this might not be the correct place to change. I'll try to find more time to look into it again.

@Kodiologist
Copy link

Cool, thank you.

@liquidaty
Copy link

FWIW we would much rather have a change asap that is in the wrong place, but patches the issue (even if only partially), than no change at all.

It can always be put in the "right" place later.

There are several issues related to SSL, of which this is only one, that have been known bugs for many years, that have continues to stagnate, and which are making us increasingly uncomfortable over time because of, among other concerns, just the doubt it leaves about the level of commitment the maintainers have to production-grade quality. We would very much appreciate if these longstanding and critical bugs could be given more attention, in particular where the community has already put in the work to provide a solution.

@rkachach
Copy link

rkachach commented Feb 2, 2024

Just to add that we are in the same situation here: several users end up disabling ssl because of all the issues cherrypy suffers which obviously affects their systems security when they are forced to switch to plain http. I hope the maintainers of this project can provide a patch/solution to this BUG because of its severity.

@webknjaz
Copy link
Member

FTR I decided to go with suggestions made in #358.

webknjaz added a commit to webknjaz/cheroot that referenced this pull request Mar 12, 2024
This patch aims to prevent a situation when the working threads are
killed by arbitrary exceptions that bubble up to their entry point
layers that aren't handled properly or at all. This was causing DoS
in many situations, including TLS errors and attempts to close the
underlying sockets erroring out.

Fixes cherrypy#358
Ref cherrypy#375
Resolves cherrypy#365
webknjaz added a commit to webknjaz/cheroot that referenced this pull request Mar 13, 2024
This patch aims to prevent a situation when the working threads are
killed by arbitrary exceptions that bubble up to their entry point
layers that aren't handled properly or at all. This was causing DoS
in many situations, including TLS errors and attempts to close the
underlying sockets erroring out.

Fixes cherrypy#358
Ref cherrypy#375
Resolves cherrypy#365
webknjaz added a commit to webknjaz/cheroot that referenced this pull request Mar 13, 2024
This patch aims to prevent a situation when the working threads are
killed by arbitrary exceptions that bubble up to their entry point
layers that aren't handled properly or at all. This was causing DoS
in many situations, including TLS errors and attempts to close the
underlying sockets erroring out.

Fixes cherrypy#358
Ref cherrypy#375
Resolves cherrypy#365
webknjaz added a commit to webknjaz/cheroot that referenced this pull request Mar 21, 2024
This patch aims to prevent a situation when the working threads are
killed by arbitrary exceptions that bubble up to their entry point
layers that aren't handled properly or at all. This was causing DoS
in many situations, including TLS errors and attempts to close the
underlying sockets erroring out.

Fixes cherrypy#358
Ref cherrypy#375
Resolves cherrypy#365
webknjaz added a commit to webknjaz/cheroot that referenced this pull request Mar 22, 2024
This patch aims to prevent a situation when the working threads are
killed by arbitrary exceptions that bubble up to their entry point
layers that aren't handled properly or at all. This was causing DoS
in many situations, including TLS errors and attempts to close the
underlying sockets erroring out.

Fixes cherrypy#358
Ref cherrypy#375
Resolves cherrypy#365
webknjaz added a commit to webknjaz/cheroot that referenced this pull request Mar 23, 2024
This patch aims to prevent a situation when the working threads are
killed by arbitrary exceptions that bubble up to their entry point
layers that aren't handled properly or at all. This was causing DoS
in many situations, including TLS errors and attempts to close the
underlying sockets erroring out.

Fixes cherrypy#358
Ref cherrypy#375
Resolves cherrypy#365
webknjaz added a commit to webknjaz/cheroot that referenced this pull request Mar 23, 2024
This patch aims to prevent a situation when the working threads are
killed by arbitrary exceptions that bubble up to their entry point
layers that aren't handled properly or at all. This was causing DoS
in many situations, including TLS errors and attempts to close the
underlying sockets erroring out.

Fixes cherrypy#358
Ref cherrypy#375
Resolves cherrypy#365
@webknjaz webknjaz closed this in 7b34907 Mar 31, 2024
webknjaz added a commit that referenced this pull request Mar 31, 2024
A DoS would happen in many situations, including TLS errors and
attempts to close the underlying sockets erroring out.

This patch aims to prevent a situation when the worker threads are
killed by arbitrary exceptions that bubble up to their entry point
layers that aren't handled properly or at all.

PR #649

Fixes #358
Fixes #354

Ref #310
Ref #346
Ref #375
Ref #599
Ref #641

Resolves #365
@liquidaty
Copy link

It's great that this fix is now in the master/main branch-- thank you. But is it possible to use this fix on v10.0.0, without having to move to an unstable branch?

Given how critical this fix is I wanted to apply just this change as a patch to the latest release version (10.0.0 / https://github.com/cherrypy/cheroot/tree/v10.0.0). However, it seems that threadpool.py file has changed quite a bit from v10.0.0 so applying this fix as a patch (e.g. https://github.com/cherrypy/cheroot/pull/649/files#diff-7e1452f6721288523eb718a38ef7ddf3cc98854f53606f0c02057b72dbd6c5d9) is not straightforward (for example, the line if conn is _SHUTDOWNREQUEST: no longer exists in threadpool.py, so it's not clear if that handling would need to be re-inserted, and/or has moved to some other file that also needs to be updated).

Is there any safe way to apply this critical change, by itself, to v10.0.0? Alternatively, would it be possible to create a v10.0.0-patched branch that only includes critical releases for v10.0.0, so that users of cheroot don't have to wait for the next full release in order to make use of critical fixes such as this one?

@webknjaz @rkachach @Kodiologist @Wissperwind @siris @DangerPoodle

@webknjaz
Copy link
Member

webknjaz commented Apr 2, 2024

without having to move to an unstable branch?

I'd argue that it's more stable than that tag, given the improved evergreen CI and extended test coverage compared to that version. You can always pin to a Git commit SHA if you want it right now.

the line if conn is _SHUTDOWNREQUEST: no longer exists in threadpool.py

It's still there: a72385c#diff-7e1452f6721288523eb718a38ef7ddf3cc98854f53606f0c02057b72dbd6c5d9R129. I moved the nested logic from one method into another, dedicated one. This helps look at the abstraction layers more sanely. I think that this amount of nesting was one of the factors that allowed such a bug to sneak in as reviewing highly nested changes involves tremendous mental overhead so it's more prone to human error.

Is there any safe way to apply this critical change, by itself

There are actually a few places with related bugs fixed. It doesn't make sense to retrieve only this big and keep other bugs in.

would it be possible to create a v10.0.0-patched branch that only includes critical releases for v10.0.0, so that users of cheroot don't have to wait for the next full release in order to make use of critical fixes such as this one?

We don't have a process for this, though the automation might be able to handle it, theoretically, if someone were to figure out backporting all the necessary bits (which to me is rather pointless waste of time).

I'm doing some maintenance in the repo related to the changelog management, it's nearly complete. I also want to add a few more CI jobs and declare official support for Python 3.12. After that, I should be able to trigger the release automation.

This is going to be the fastest way to release, anyway, since the automation is there (I'll need to double check a few things, though, since it's been a while; something probably needs to be adjusted) and manual non-transparent releases haven't been a thing for the past decade.

@Wissperwind
Copy link

I have no problem updating cherrypy. In which version is / will be this update included?

@webknjaz
Copy link
Member

webknjaz commented Apr 3, 2024

Should be 11.

@rkachach
Copy link

rkachach commented Apr 3, 2024

We are also open to update our cherrypy to a version that contains the fix

@liquidaty
Copy link

I also want to add a few more CI jobs and declare official support for Python 3.12. After that, I should be able to trigger the release automation

I'd prefer this approach as well. Would you have any rough idea of what kind of timing range that might take? Not asking in order to hold you to it, but rather so that we can assess whether we think we should bother considering doing an out-of-release patch in the meantime (I am guessing we won't want to go that way, but if you were to say e.g. >= 6 mo then that may change things).

Also-- can we help to get the next release out faster? We have limited resources to spare (manpower and/or $), but are open to contributing, if there is a way to do so without getting in the way.

@webknjaz
Copy link
Member

webknjaz commented Apr 3, 2024

@liquidaty thanks for the offer! I think that working with contributors at this point would slow this down because I'd need to coordinate and communicate a lot more than just doing what I already have in my mind.

I wanted to write a few words on the state of things and the ideas suggested.

Release planning πŸ“…

I'll need to go through the changes and open pull requests to see if anything can be closed with the recent changes and recorded properly in the change log. #654 is mandatory for this, by the way β€” it won't change anything for the end-users functionally but is going to be vital maintenance-wise.

It is already one of my top priorities (next to my normal work commitments; actually other projects in the company want this released too). So I'm planning to cut a new release as soon as the pipeline is functional, if no other blockers emerge. It mostly needs synchronizing bits of automation I normally use in other projects with some adjustments.

Helping out πŸ’”

I'm in a position where I have access to way more projects than time/energy which I sometimes try to communicate when context switching. Here's the most recent explanation example: aio-libs/multidict#887 (comment). This is the primary reason for the delays β€” it's hard to keep up and my time+energy available aren't limitless, sadly.

Long-term, though, having extra people capable of looking out for the project / cutting releases would be useful. Still, I'm not sure how the practices would have to change in the post xz-fiasco world...

If there's people interested in getting more involved β€” I'm happy show the ropes, once the release is out.

Funding πŸ’°

As for $$$ β€” both of the active maintainers are enrolled on Tidelift and have things like GitHub Sponsors/Liberapay/etc set up. People are free to show their appreciation through those. However, personally, I'd be grateful if instead those funds were donated to https://savelife.in.ua/en/donate-en/#donate-army-card-monthly, https://u24.gov.ua or one of many more charities listed here β€” doing so supports people like me who do open source in more ways than one can imagine.

A workaround 🩹

This should get you more or less what I'll release later. It's a reproducible solution since it's pinned to a specific Git commit SHA which is immutable. And it won't cause build problems since it's pure-python.

And more importantly β€” this is something people can use right now. Oh, and this way it's possible to get some early feedback if it starts causing issues (it shouldn't!).

$ pip install git+https://github.com/cherrypy/cheroot.git@1d28a0703dbf6e6f7fa137e6d2f72b99d0bb0377

@webknjaz
Copy link
Member

@rkachach so I just published a release candidate (11.0.0rc0). It should be the same as the previous betas functionally, except for the changelog bit. Let me know if you want to test it out first β€” if not, I'll probably click the release button again and type in the stable version number tomorrow'ish (seeing that it's already past the midnight, I'd say I mean today...). The process is rather quick and completes under 10 minutes (https://github.com/cherrypy/cheroot/actions/runs/8713203309) β€” so I'll only hold it off for longer if somebody explicitly asks for more time to test.

@webknjaz
Copy link
Member

webknjaz commented Apr 17, 2024

UPDATE FOR EVERYONE FOLLOWING THIS DISCUSSION: @rkachach and I have been talking through the work channels and we identified that the Ceph project is still running Python 3.6 and can't migrate to a newer one fast enough. As a result I made an effort to see if it'd be difficult to make a release supporting that runtime and managed to revert the incompatible changes in a separate branch.

I've just released v10.0.1rc0 about 10 minutes ago with everything that was included in v11.0.0rc0 but the Python 3.6 drop reverted. The plan is that Ceph folks make some testing tomorrow and I'll cut a stable release as v10.0.1.

This is a one-time exception and Python 3.6/3.7 remain dropped from the main branch and untested. There is no plan to maintain a branch for backports but if there's a reasonable need, I can consider this (there's some bits of related automation in place already), provided that external contributors would be the ones sending the backport PRs as it's not feasible for the Cheroot maintainers to sustain more maintenance burden.

This was referenced Apr 17, 2024
@rkachach
Copy link

@webknjaz thank you very much for all the big effort and the time you spent to get this done. Really appreciated. Just to followup on the discussion I wanted to confirm that the downgrade of typing_extensions package version to 3.7.4.3 fixes the issue reported on python/typing_extensions#10

@webknjaz
Copy link
Member

UPD: Due to the need of more testing, the stable release is being postponed until Monday.

@webknjaz
Copy link
Member

I released v10.0.1 finally. Here's a few places on the internet that you may be interested in with regard to what's included:

I think I'll release v11 series with mostly maintenance changes after some more work is done later. That'll finally contain changes that mark Python 3.6/3.7 as unsupported in the metadata. And perhaps better logging around broken TLS connections.

P.S. I'm going to attend PyCon US 2024 in like 3 weeks, so if any of you is around β€” come say hi and I'm happy to hack on the CherryPy things if anybody's up for it during the Sprints (otherwise, I'll probably be mostly involved with things relating to packaging and various mass-maintenance automation).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants