kobo worker logging thread may hang indefinitely on TLS handshake #269

@kdudka

I am forwarding an issue from Red Hat internal Jira that I debugged in May 2024 but have not yet resolved.

Current Behavior:
An OSH task hung indefinitely on an OSH worker while the child process was blocked on a write to stdout/stderr. The kobo worker logging thread was blocked indefinitely on a TLS handshake:

(gdb) py-bt
Traceback (most recent call first):
  File "/usr/lib64/python3.9/ssl.py", line 1343, in do_handshake
    self._sslobj.do_handshake()
  File "/usr/lib64/python3.9/ssl.py", line 1074, in _create
    self.do_handshake()
  File "/usr/lib64/python3.9/ssl.py", line 501, in wrap_socket
    return self.sslsocket_class._create(
  File "/usr/lib64/python3.9/http/client.py", line 1454, in connect
    self.sock = self._context.wrap_socket(self.sock,
  File "/usr/lib64/python3.9/http/client.py", line 980, in send
    self.connect()
  File "/usr/lib64/python3.9/http/client.py", line 1040, in _send_output
    self.send(msg)
  File "/usr/lib64/python3.9/http/client.py", line 1280, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib64/python3.9/xmlrpc/client.py", line 1321, in send_content
    connection.endheaders(request_body)
  File "/usr/lib64/python3.9/xmlrpc/client.py", line 1291, in send_request
    self.send_content(connection, request_body)
  File "/usr/lib/python3.9/site-packages/kobo/xmlrpc.py", line 369, in _single_request3
    h = self.send_request(host, handler, request_body, verbose)
  File "/usr/lib64/python3.9/xmlrpc/client.py", line 1166, in request
    return self.single_request(host, handler, request_body, verbose)
  File "/usr/lib/python3.9/site-packages/kobo/xmlrpc.py", line 477, in request
    result = transport_class.request(self, *args, **kwargs)
  File "/usr/lib64/python3.9/xmlrpc/client.py", line 1464, in __request
    response = self.__transport.request(
  File "/usr/lib64/python3.9/xmlrpc/client.py", line 1122, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib/python3.9/site-packages/kobo/client/__init__.py", line 510, in upload_task_log
    self._hub.worker.upload_task_log(task_id, remote_file_name, mode, chunk_start, chunk_len, chunk_checksum, encoded_chunk)
  File "/usr/lib/python3.9/site-packages/kobo/worker/logger.py", line 65, in run
    self._hub.upload_task_log(BytesIO(self._send_data), self._task_id, "stdout.log", append=True)
  File "/usr/lib64/python3.9/threading.py", line 980, in _bootstrap_inner
    self.run()
  File "/usr/lib64/python3.9/threading.py", line 937, in _bootstrap
    self._bootstrap_inner()

The Python code where the thread was blocked appears to support a timeout for the TLS handshake, but the kobo/xmlrpc stack does not set one:

(gdb) py-list
1338            self._check_connected()
1339            timeout = self.gettimeout()
1340            try:
1341                if timeout == 0.0 and block:
1342                    self.settimeout(None)
>1343                self._sslobj.do_handshake()
1344            finally:
1345                self.settimeout(timeout)
1346    
1347        def _real_connect(self, addr, connect_ex):
1348            if self.server_side:
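
For illustration, here is a minimal sketch of how a timeout could reach `do_handshake()` through the standard `xmlrpc.client` stack. It is not kobo's actual code: the class name `TimeoutSafeTransport` and its `timeout` parameter are hypothetical. The sketch relies on `http.client.HTTPSConnection` passing its `timeout` attribute to the socket it creates in `connect()`, and on the wrapped TLS socket inheriting that timeout.

```python
import http.client
import xmlrpc.client


class TimeoutSafeTransport(xmlrpc.client.SafeTransport):
    """Hypothetical HTTPS transport that bounds blocking socket operations."""

    def __init__(self, timeout=60.0, **kwargs):
        super().__init__(**kwargs)
        # Assumed parameter: seconds to wait on connect, handshake, and I/O.
        self._timeout = timeout

    def make_connection(self, host):
        # Let the standard library build (or reuse) the HTTPSConnection.
        conn = super().make_connection(host)
        # connect() hands this value to socket.create_connection(), and the
        # TLS-wrapped socket inherits it, so the handshake can time out.
        # It has no effect on a socket that is already connected.
        conn.timeout = self._timeout
        return conn
```

Such a transport could then be passed as `xmlrpc.client.ServerProxy(url, transport=TimeoutSafeTransport(timeout=30.0))`, where `url` stands for the hub's XML-RPC endpoint. Whether this fits into kobo's own transport classes in `kobo/xmlrpc.py` would need to be checked.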

Expected Behavior:
The task should either fail or stop transferring the captured output to the hub, but it should not hang indefinitely.

Steps to reproduce:
I am not sure how it happened but I suspect it was caused by an intermittent network issue.
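
One way to approximate the suspected condition locally is sketched below, on the assumption that the hang was caused by a peer that accepted the TCP connection and then went silent. A listener on localhost accepts a connection and never sends anything, so the client's TLS handshake cannot complete. All function names are hypothetical. With a socket timeout set, the handshake fails after the timeout; without one it would block in the same way as in the backtrace above.

```python
import socket
import ssl
import threading


def start_silent_server():
    """Listen on localhost, accept one connection, and never reply."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    srv.listen(1)
    accepted = []

    def accept_one():
        conn, _ = srv.accept()
        accepted.append(conn)  # keep the connection open but silent

    threading.Thread(target=accept_one, daemon=True).start()
    return srv, accepted


def handshake_outcome(port, timeout):
    """Attempt a TLS handshake against the silent server."""
    ctx = ssl.create_default_context()
    # Certificate checks are irrelevant here: no certificate is ever sent.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    raw = socket.create_connection(("127.0.0.1", port), timeout=timeout)
    try:
        # wrap_socket() performs the handshake and inherits raw's timeout.
        with ctx.wrap_socket(raw):
            return "handshake completed"
    except socket.timeout:
        return "handshake timed out"
    finally:
        raw.close()


if __name__ == "__main__":
    server, _held = start_silent_server()
    print(handshake_outcome(server.getsockname()[1], timeout=1.0))
```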

Impact Statement:
Such OSH tasks unnecessarily block the OSH scanning queue.
