
SSL error with uvloop and already-closed transport #3115

Closed
AetherUnbound opened this issue Oct 2, 2023 · 9 comments
Labels
💻 aspect: code (Concerns the software code in the repository)
🛠 goal: fix (Bug fix)
🟩 priority: low (Low priority and doesn't need to be rushed)
🧱 stack: api (Related to the Django API)
🐛 tooling: sentry (Sentry issue)

Comments

@AetherUnbound (Contributor) commented Oct 2, 2023

Sentry link

https://openverse.sentry.io/share/issue/c59d64d754414cb2bcf2887c1da580cd/

Description

We received an exception in Sentry for the following:

RuntimeError: unable to perform operation on <TCPTransport closed=True reading=False 0x7efe36b04000>; the handler is closed
  File "uvloop/sslproto.pyx", line 714, in uvloop.loop.SSLProtocol._do_read
  File "uvloop/sslproto.pyx", line 699, in uvloop.loop.SSLProtocol._process_outgoing
  File "uvloop/handles/stream.pyx", line 674, in uvloop.loop.UVStream.write
  File "uvloop/handles/handle.pyx", line 159, in uvloop.loop.UVHandle._ensure_alive

With the message:

Fatal error on SSL protocol
protocol: <uvloop.loop.SSLProtocol object at 0x7efe38213cc0>
transport: <TCPTransport closed=True reading=False 0x7efe36b04000>

9 of these occurred in quick succession on Oct 2, 2023 at 6:02am UTC. They have not come up again.

This seems to me like it might be a worker dying or requests hanging (a large number of images failed to validate their dead link checks immediately before this). It seems like asyncio is involved in some part of the stack here; @sarayourfriend, was this similar to any issues you encountered while setting all that up? I know we don't have the ASGI version fully deployed, so I'm a little surprised to see async involved here. I've confirmed that the environment is production though, so unless the Sentry environment is incorrectly configured this is coming from production.

Additional context

Setting this to low priority because it hasn't occurred again and only affected a small number of requests.

@AetherUnbound added the 🐛 tooling: sentry, 💻 aspect: code, 🛠 goal: fix, 🟩 priority: low, and 🧱 stack: api labels on Oct 2, 2023
@sarayourfriend (Contributor)

I know we don't have the ASGI version fully deployed, so I'm a little surprised to see async involved here.

To clarify the context here: we use aiohttp, an asyncio-based HTTP client library, for dead link filtering. We don't run an ASGI app, that is correct, but we most certainly have been running async code in production for many months now.

This issue doesn't look like anything I ran into while working on the ASGI conversion, but I wouldn't be surprised if this somehow has something to do with aiohttp client session sharing (or not sharing), and not cleanly closing the sessions down. If a worker crashed in the middle of dead link filtering and requests were still open, it most certainly could cause this behaviour. If we can look at the Django logs and see whether there was a worker abort around the same time, that would be helpful.
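For illustration, here is a minimal sketch of shared-session handling with an explicit shutdown step. The names (get_session, close_session, _session) are hypothetical and not taken from the Openverse codebase; the point is only that the shared aiohttp session needs to be closed before the worker or event loop goes away.

from typing import Optional

import aiohttp

_session: Optional[aiohttp.ClientSession] = None

async def get_session() -> aiohttp.ClientSession:
    # Reuse a single session (and its connection pool) across dead link
    # checks instead of creating one per request.
    global _session
    if _session is None or _session.closed:
        _session = aiohttp.ClientSession()
    return _session

async def close_session() -> None:
    # Call this from the worker's shutdown path so open SSL transports are
    # torn down cleanly before the event loop is destroyed.
    global _session
    if _session is not None and not _session.closed:
        await _session.close()
        _session = None

If the worker is killed before close_session runs, uvloop can be left trying to flush data over a transport that is already closed, which would match the error above.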

@AetherUnbound (Contributor, Author)

If a worker crashed in the middle of dead link filtering and requests were still open, it most certainly could cause this behaviour.

Perfect, thanks for helping to confirm (and better clarify) the hunch I had! I'll dive into the logs and see if there's anything that corroborates this.

@AetherUnbound (Contributor, Author)

Update: There was no other useful logging information around those messages; however, they all occurred while the API was doing some heavy iteration on dead link filtering, so I do believe that's the case. I'm not sure what the appropriate action here would be besides potentially handling this specific exception.
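For reference, handling it could look roughly like the sketch below: catch the specific RuntimeError uvloop raises when the transport is already gone and treat the URL as dead rather than letting the error escape. The function name and signature are illustrative only, not the actual dead link filtering code.

import aiohttp

async def check_link(session: aiohttp.ClientSession, url: str) -> bool:
    try:
        async with session.head(url, allow_redirects=False) as response:
            return response.status < 400
    except RuntimeError as err:
        # uvloop raises RuntimeError("unable to perform operation on
        # <TCPTransport closed=True ...>; the handler is closed") when it
        # tries to write to a transport that has already been torn down.
        if "the handler is closed" in str(err):
            return False  # treat as a dead link instead of crashing the batch
        raise
    except aiohttp.ClientError:
        return False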

@sarayourfriend (Contributor)

If it happens again then more logging in the function so that we know precisely when things failed would be helpful. Otherwise, I have no idea how we would begin to try to handle this special case. Is it in our code? Is it in the environment of our application? Is it in the worker?

@sarayourfriend changed the title from "SSL error with uvicorn and already-closed transport" to "SSL error with uvloop and already-closed transport" on Oct 3, 2023
@sarayourfriend (Contributor) commented Oct 3, 2023

One more clarification: this issue previously mentioned uvicorn in the title. We do not yet use uvicorn. This is uvloop, an alternative implementation of the asyncio event loop (and one that uvicorn depends on). Just clarifying to help focus debugging 🙂
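To make the distinction concrete, uvloop is enabled at the event-loop level, independently of whatever server runs the app. A generic example follows (not necessarily how the Openverse API wires it up):

import asyncio

import uvloop

# Swap the default asyncio event loop for uvloop's libuv-based one. Anything
# run on the loop afterwards (e.g. aiohttp requests made during dead link
# filtering) goes through uvloop's SSLProtocol/TCPTransport, which is where
# the traceback above originates.
asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())

async def main() -> None:
    await asyncio.sleep(0)

asyncio.run(main())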

@sarayourfriend (Contributor)

Assigning this to myself to see if this is still happening.

@sarayourfriend self-assigned this on Dec 1, 2023
@sarayourfriend (Contributor)

It is still happening. Unclear how to investigate this further though. I'll move it back to the backlog and unassign. @AetherUnbound what do you think the best way to prioritise this kind of thing is? I don't want to leave this issue open forever. If we're "okay" with the error, should we close it as "won't fix"? The issue's been open for a long time now.

@sarayourfriend removed their assignment on Dec 4, 2023
@AetherUnbound (Contributor, Author)

What's interesting is that these seem to happen in bursts, with dozens of events coming in at a single timestamp (across a few different tasks). This makes me think it might be deployment related, but none of the timestamps match up.

That said, I don't think there's much that we can do if there's not a clear source. I've archived this for another thousand occurrences, and I'll close this issue as won't fix.

@AetherUnbound closed this as not planned (won't fix) on Dec 6, 2023
@sarayourfriend (Contributor)

That is interesting. I wonder if it's something dead-link related specifically too 🤔 Or potentially related to not having that route be truly async. One of the SO answers I read when trying to investigate this suggested that it could be caused by the worker timing out and the loop terminating. Maybe this is just a symptom of that?

Could we plan to look into this again after #3449 and see if making our other potentially long-running request (search) asynchronous would prevent uvicorn from thinking it timed out?
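To make the timeout hypothesis concrete, assuming a gunicorn-style worker with a request timeout (this thread does not confirm what supervises the workers), the relevant settings would be something like the following sketch:

# Hypothetical gunicorn.conf.py fragment -- an assumption, not the actual
# Openverse configuration. If a synchronous search request blocks the worker
# for longer than `timeout`, the master kills the worker, tearing down the
# event loop while dead link requests may still be in flight.
timeout = 30           # seconds of silence before a worker is killed (gunicorn default)
graceful_timeout = 30  # time allowed to finish in-flight work after SIGTERM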
