atproto-hub ndb context failures in handle #1315
Comments
Still hasn't started back up again yet. Surprising!
Hasn't recurred in >3d. Not sure why, but I won't argue.
Dammit, it started up again. Wish I knew why this was so intermittent! For now I'll restart atproto-hub and see what happens.
Hasn't recurred since we cut a lot of load with #1329 (comment). I don't think it's actually fixed. Triggering it may be load related, so we may not see it again until we get back to that level of load organically.
Goddammit, this started up again last night.
Trying this tweak ^, will see.
Also dropped handle threads in atproto-hub from 100 down to 10. 🤞🤞🤞
Such a weird one, I don't understand this at all yet.
When we run more than one handle thread in atproto-hub, we're fine for a while, but after 9-12h we eventually start seeing this:
This is on Object.get_or_create:
bridgy-fed/atproto_firehose.py
Lines 334 to 342 in 60c92e1
The ndb context is long-lived, outside the handle loop:
bridgy-fed/atproto_firehose.py
Lines 290 to 300 in 60c92e1
So, sure, I buy that ndb contexts may not be thread safe enough, even though we're making a different one in each thread and so should be fine. From https://googleapis.dev/python/python-ndb/latest/client.html#google.cloud.ndb.client.Client.context :
(We're not async.)
It's weird that this doesn't happen until 9-12h after the server starts though, right?!
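One experiment that might help narrow this down is to enter a fresh, short-lived context per handled item instead of keeping one context open for the life of each thread. Below is a minimal sketch of that loop shape, not the actual atproto_firehose.py code. Every name in it (handle_loop, make_context, handle_one, stand_in_context, run) is hypothetical, and stand_in_context is a dummy context manager standing in for a real ndb client's context() call so the sketch runs without Datastore credentials. It also drains a finite queue, unlike the real handle loop, which runs forever.

```python
import contextlib
import queue
import threading

# Demo-only bookkeeping: which thread opened each stand-in context.
opened = []
opened_lock = threading.Lock()


@contextlib.contextmanager
def stand_in_context():
    """Hypothetical stand-in for a real ndb client's context(); not the ndb API."""
    with opened_lock:
        opened.append(threading.get_ident())
    yield


def handle_loop(items, make_context, handle_one):
    """Drain the queue, entering a new context for every item.

    The context lives only as long as one item's handling, so no context
    state is carried across items within a thread.
    """
    while True:
        try:
            item = items.get_nowait()
        except queue.Empty:
            return
        with make_context():
            handle_one(item)


def run(num_threads, payloads):
    """Handle all payloads on num_threads worker threads; return handled items."""
    items = queue.Queue()
    for payload in payloads:
        items.put(payload)

    handled = []
    handled_lock = threading.Lock()

    def handle_one(item):
        with handled_lock:
            handled.append(item)

    threads = [
        threading.Thread(target=handle_loop,
                         args=(items, stand_in_context, handle_one))
        for _ in range(num_threads)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return handled
```

If the failures stopped with per-item contexts, that would point at state accumulating in the long-lived context. If they continued, the context lifetime is probably not the cause. The tradeoff is the cost of opening a context per item, which I haven't measured.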