
SSL error with uvloop and already-closed transport #3115

Closed
AetherUnbound opened this issue Oct 2, 2023 · 9 comments
Labels
💻 aspect: code (Concerns the software code in the repository)
🛠 goal: fix (Bug fix)
🟩 priority: low (Low priority and doesn't need to be rushed)
🧱 stack: api (Related to the Django API)
🐛 tooling: sentry (Sentry issue)

Comments

@AetherUnbound (Contributor) commented Oct 2, 2023

Sentry link

https://openverse.sentry.io/share/issue/c59d64d754414cb2bcf2887c1da580cd/

Description

We received an exception in Sentry for the following:

RuntimeError: unable to perform operation on <TCPTransport closed=True reading=False 0x7efe36b04000>; the handler is closed
  File "uvloop/sslproto.pyx", line 714, in uvloop.loop.SSLProtocol._do_read
  File "uvloop/sslproto.pyx", line 699, in uvloop.loop.SSLProtocol._process_outgoing
  File "uvloop/handles/stream.pyx", line 674, in uvloop.loop.UVStream.write
  File "uvloop/handles/handle.pyx", line 159, in uvloop.loop.UVHandle._ensure_alive

With the message:

Fatal error on SSL protocol
protocol: <uvloop.loop.SSLProtocol object at 0x7efe38213cc0>
transport: <TCPTransport closed=True reading=False 0x7efe36b04000>

9 of these occurred in quick succession on Oct 2, 2023 at 6:02am UTC. They have not come up again.

This seems to me like it might be a worker dying or requests hanging (a large number of images failed to validate their dead link checks immediately before this). It seems like asyncio is involved in some part of the stack here; @sarayourfriend, was this similar to any issues you encountered while setting all that up? I know we don't have the ASGI version fully deployed, so I'm a little surprised to see async involved here. I've confirmed that the environment is production though, so unless the Sentry environment is incorrectly configured this is coming from production.

Additional context

Setting this to low priority because it hasn't occurred again and only affected a small number of requests.

@AetherUnbound added the 🐛 tooling: sentry, 💻 aspect: code, 🛠 goal: fix, 🟩 priority: low, and 🧱 stack: api labels on Oct 2, 2023
@sarayourfriend (Contributor)

I know we don't have the ASGI version fully deployed, so I'm a little surprised to see async involved here.

To clarify the context here: we use aiohttp, an asyncio-based HTTP client library, for dead link filtering. We don't run an ASGI app, that is correct, but we most certainly have been running async code in production for many months now.

This issue doesn't look like anything I ran into while working on the ASGI conversion, but I wouldn't be surprised if this somehow has something to do with aiohttp client session sharing (or not sharing), and not cleanly closing the sessions down. If a worker crashed in the middle of dead link filtering and requests were still open, it most certainly could cause this behaviour. If we can look at the Django logs and see whether there was a worker abort around the same time, that would be helpful.
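For illustration, here is a minimal sketch of shared-session handling with an explicit shutdown step. The names (get_session, close_session, _session) are hypothetical and not taken from the Openverse codebase; the point is only that the shared aiohttp session needs to be closed before the worker or event loop goes away.

from typing import Optional

import aiohttp

_session: Optional[aiohttp.ClientSession] = None

async def get_session() -> aiohttp.ClientSession:
    # Reuse a single session (and its connection pool) across dead link
    # checks instead of creating one per request.
    global _session
    if _session is None or _session.closed:
        _session = aiohttp.ClientSession()
    return _session

async def close_session() -> None:
    # Call this from the worker's shutdown path so open SSL transports are
    # torn down cleanly before the event loop is destroyed.
    global _session
    if _session is not None and not _session.closed:
        await _session.close()
        _session = None

If the worker is killed before close_session runs, uvloop can be left trying to flush data over a transport that is already closed, which would match the error above.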

@AetherUnbound (Contributor, Author)

If a worker crashed in the middle of dead link filtering and requests were still open, it most certainly could cause this behaviour.

Perfect, thanks for helping to confirm (and better clarify) the hunch I had! I'll dive into the logs and see if there's anything that corroborates this.

@AetherUnbound (Contributor, Author)

Update: There was no other useful logging information around those messages; however, they all occurred while the API was doing some heavy iteration on dead link filtering, so I do believe that's the case. I'm not sure what the appropriate action here would be besides potentially handling this specific exception.
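For reference, handling it could look roughly like the sketch below: catch the specific RuntimeError uvloop raises when the transport is already gone and treat the URL as dead rather than letting the error escape. The function name and signature are illustrative only, not the actual dead link filtering code.

import aiohttp

async def check_link(session: aiohttp.ClientSession, url: str) -> bool:
    try:
        async with session.head(url, allow_redirects=False) as response:
            return response.status < 400
    except RuntimeError as err:
        # uvloop raises RuntimeError("unable to perform operation on
        # <TCPTransport closed=True ...>; the handler is closed") when it
        # tries to write to a transport that has already been torn down.
        if "the handler is closed" in str(err):
            return False  # treat as a dead link instead of crashing the batch
        raise
    except aiohttp.ClientError:
        return False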

@sarayourfriend (Contributor)

If it happens again then more logging in the function so that we know precisely when things failed would be helpful. Otherwise, I have no idea how we would begin to try to handle this special case. Is it in our code? Is it in the environment of our application? Is it in the worker?

@sarayourfriend changed the title from "SSL error with uvicorn and already-closed transport" to "SSL error with uvloop and already-closed transport" on Oct 3, 2023
@sarayourfriend (Contributor) commented Oct 3, 2023

One more clarification: this issue previously mentioned uvicorn in the title. We do not yet use uvicorn. This is uvloop, an alternative implementation of the asyncio event loop (and one that uvicorn depends on). Just clarifying to help focus debugging 🙂
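To make the distinction concrete, uvloop is enabled at the event-loop level, independently of whatever server runs the app. A generic example follows (not necessarily how the Openverse API wires it up):

import asyncio

import uvloop

# Swap the default asyncio event loop for uvloop's libuv-based one. Anything
# run on the loop afterwards (e.g. aiohttp requests made during dead link
# filtering) goes through uvloop's SSLProtocol/TCPTransport, which is where
# the traceback above originates.
asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())

async def main() -> None:
    await asyncio.sleep(0)

asyncio.run(main())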

@sarayourfriend (Contributor)

Assigning this to myself to see if this is still happening.

@sarayourfriend self-assigned this on Dec 1, 2023
@sarayourfriend (Contributor)

It is still happening. Unclear how to investigate this further though. I'll move it back to the backlog and unassign. @AetherUnbound what do you think the best way to prioritise this kind of thing is? I don't want to leave this issue open forever. If we're "okay" with the error, should we close it as "won't fix"? The issue's been open for a long time now.

@sarayourfriend removed their assignment on Dec 4, 2023
@AetherUnbound (Contributor, Author)

What's interesting is that these seem to happen in bursts, with dozens of events coming in at a single timestamp (across a few different tasks). This makes me think it might be deployment related, but none of the timestamps match up.

That said, I don't think there's much that we can do if there's not a clear source. I've archived this for another thousand occurrences, and I'll close this issue as won't fix.

@AetherUnbound closed this as not planned (won't fix) on Dec 6, 2023
@sarayourfriend (Contributor)

That is interesting. I wonder if it's something dead-link related specifically too 🤔 Or potentially related to not having that route be truly async. One of the SO answers I read when trying to investigate this suggested that it could be caused by the worker timing out and the loop terminating. Maybe this is just a symptom of that?

Could we plan to look into this again after #3449 and see if making our other potentially long-running request (search) asynchronous would prevent uvicorn from thinking it timed out?
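To make the timeout hypothesis concrete, assuming a gunicorn-style worker with a request timeout (this thread does not confirm what supervises the workers), the relevant settings would be something like the following sketch:

# Hypothetical gunicorn.conf.py fragment -- an assumption, not the actual
# Openverse configuration. If a synchronous search request blocks the worker
# for longer than `timeout`, the master kills the worker, tearing down the
# event loop while dead link requests may still be in flight.
timeout = 30           # seconds of silence before a worker is killed (gunicorn default)
graceful_timeout = 30  # time allowed to finish in-flight work after SIGTERM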
