Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Synapse potentially cached a failed sync request #9507

Closed
Half-Shot opened this issue Feb 27, 2021 · 6 comments
Closed

Synapse potentially cached a failed sync request #9507

Half-Shot opened this issue Feb 27, 2021 · 6 comments
Labels
S-Major Major functionality / product severely impaired, no satisfactory workaround. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. X-Regression Something broke which worked on a previous release X-Release-Blocker Must be resolved before making a release

Comments

@Half-Shot
Copy link
Collaborator

Running 4a32600

Last night my database suffered an issue which meant that it was briefly unavailable, and it dropped all connections for a period. Synapse seemed to largely recover as sync requests to some accounts are working, and federation/event creation seems to be working fine.

2021-02-27 13:10:17,830 - synapse.util.caches.response_cache - 191 - INFO - GET-84929- [sync]: using incomplete cached result for [(@Half-Shot:half-shot.uk, 0, 's44209228_293425298_29572_27619593_543941_1697_969558_19742008_52', '183', False, 'HEWQINNMGI')]
2021-02-27 13:16:43,781 - synapse.access.http.8018 - 299 - INFO - GET-84929- 86.16.94.189 - 8018 - {@Half-Shot:half-shot.uk} Processed request: 384.703sec/1.250sec (0.252sec, 0.000sec) (0.000sec/0.000sec/0) 1877428B 200 "GET /_matrix/client/r0/sync?filter=183&timeout=0&since=s44209228_293425298_29572_27619593_543941_1697_969558_19742008_52 HTTP/1.0" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Element/1.7.21 Chrome/85.0.4183.121 Electron/10.2.0 Safari/537.36" [0 dbevts]

(ignore the absolutely awful sync duration, it's sadly a new normal atm)

However, sync requests for my Element android device fail with a 500 each time:

2021-02-27 13:18:55,369 - synapse.util.caches.response_cache - 189 - INFO - GET-85063- [sync]: using completed cached result for [(@Half-Shot:half-shot.uk, 0, None, '181', False, 'DHNXFYGPRT')]
2021-02-27 13:18:55,369 - twisted - 257 - CRITICAL - GET-85063- Unhandled error in Deferred:
2021-02-27 13:18:55,370 - twisted - 257 - CRITICAL - GET-85063- 
Traceback (most recent call last):
  File "/opt/synapse/lib/python3.9/site-packages/synapse/storage/database.py", line 532, in new_transaction
    r = func(cursor, *args, **kwargs)
  File "/opt/synapse/lib/python3.9/site-packages/synapse/storage/databases/state/bg_updates.py", line 121, in _get_state_groups_from_groups_txn
    txn.execute(sql % (where_clause,), args)
  File "/opt/synapse/lib/python3.9/site-packages/synapse/storage/database.py", line 294, in execute
    self._do_execute(self.txn.execute, sql, *args)
  File "/opt/synapse/lib/python3.9/site-packages/synapse/storage/database.py", line 320, in _do_execute
    return func(sql, *args)
psycopg2.OperationalError: SSL SYSCALL error: EOF detected


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/synapse/lib/python3.9/site-packages/twisted/internet/defer.py", line 1416, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/opt/synapse/lib/python3.9/site-packages/twisted/python/failure.py", line 512, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/opt/synapse/lib/python3.9/site-packages/synapse/handlers/sync.py", line 313, in _wait_for_sync_for_user
    result = await self.current_sync_for_user(
  File "/opt/synapse/lib/python3.9/site-packages/synapse/handlers/sync.py", line 344, in current_sync_for_user
    return await self.generate_sync_result(sync_config, since_token, full_state)
  File "/opt/synapse/lib/python3.9/site-packages/synapse/handlers/sync.py", line 1000, in generate_sync_result
    res = await self._generate_sync_entry_for_rooms(
  File "/opt/synapse/lib/python3.9/site-packages/synapse/handlers/sync.py", line 1444, in _generate_sync_entry_for_rooms
    await concurrently_execute(handle_room_entries, room_entries, 10)
  File "/opt/synapse/lib/python3.9/site-packages/twisted/internet/defer.py", line 1416, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/opt/synapse/lib/python3.9/site-packages/twisted/python/failure.py", line 512, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/opt/synapse/lib/python3.9/site-packages/synapse/util/async_helpers.py", line 174, in _concurrently_execute_inner
    await maybe_awaitable(func(next(it)))
  File "/opt/synapse/lib/python3.9/site-packages/synapse/handlers/sync.py", line 1432, in handle_room_entries
    res = await self._generate_room_entry(
  File "/opt/synapse/lib/python3.9/site-packages/synapse/handlers/sync.py", line 1829, in _generate_room_entry
    batch = await self._load_filtered_recents(
  File "/opt/synapse/lib/python3.9/site-packages/synapse/handlers/sync.py", line 516, in _load_filtered_recents
    loaded_recents = await filter_events_for_client(
  File "/opt/synapse/lib/python3.9/site-packages/synapse/visibility.py", line 84, in filter_events_for_client
    event_id_to_state = await storage.state.get_state_for_events(
  File "/opt/synapse/lib/python3.9/site-packages/synapse/storage/state.py", line 467, in get_state_for_events
    group_to_state = await self.stores.state._get_state_for_groups(
  File "/opt/synapse/lib/python3.9/site-packages/synapse/storage/databases/state/store.py", line 257, in _get_state_for_groups
    group_to_state_dict = await self._get_state_groups_from_groups(
  File "/opt/synapse/lib/python3.9/site-packages/synapse/storage/databases/state/store.py", line 162, in _get_state_groups_from_groups
    res = await self.db_pool.runInteraction(
  File "/opt/synapse/lib/python3.9/site-packages/synapse/storage/database.py", line 660, in runInteraction
    result = await self.runWithConnection(
  File "/opt/synapse/lib/python3.9/site-packages/synapse/storage/database.py", line 743, in runWithConnection
    return await make_deferred_yieldable(
  File "/opt/synapse/lib/python3.9/site-packages/twisted/python/threadpool.py", line 250, in inContext
    result = inContext.theWork()
  File "/opt/synapse/lib/python3.9/site-packages/twisted/python/threadpool.py", line 266, in <lambda>
    inContext.theWork = lambda: context.call(ctx, func, *args, **kw)
  File "/opt/synapse/lib/python3.9/site-packages/twisted/python/context.py", line 122, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "/opt/synapse/lib/python3.9/site-packages/twisted/python/context.py", line 85, in callWithContext
    return func(*args,**kw)
  File "/opt/synapse/lib/python3.9/site-packages/twisted/enterprise/adbapi.py", line 306, in _runWithConnection
    compat.reraise(excValue, excTraceback)
  File "/opt/synapse/lib/python3.9/site-packages/twisted/python/compat.py", line 464, in reraise
    raise exception.with_traceback(traceback)
  File "/opt/synapse/lib/python3.9/site-packages/twisted/enterprise/adbapi.py", line 297, in _runWithConnection
    result = func(conn, *args, **kw)
  File "/opt/synapse/lib/python3.9/site-packages/synapse/storage/database.py", line 738, in inner_func
    return func(db_conn, *args, **kwargs)
  File "/opt/synapse/lib/python3.9/site-packages/synapse/storage/database.py", line 548, in new_transaction
    conn.rollback()
  File "/opt/synapse/lib/python3.9/site-packages/synapse/storage/database.py", line 160, in rollback
    self.conn.rollback()
  File "/opt/synapse/lib/python3.9/site-packages/twisted/enterprise/adbapi.py", line 70, in rollback
    raise ConnectionLost()
twisted.enterprise.adbapi.ConnectionLost

2021-02-27 13:18:55,679 - synapse.http.server - 90 - ERROR - GET-85063- Failed handle request via 'SyncRestServlet': <XForwardedForRequest at 0x7ffb2e2e55e0 method='GET' uri='/_matrix/client/r0/sync?filter=181&set_presence=offline&timeout=0' clientproto='HTTP/1.0' site=8018>
2021-02-27 13:18:55,703 - synapse.access.http.8018 - 299 - INFO - GET-85063- 86.16.94.189 - 8018 - {@Half-Shot:half-shot.uk} Processed request: 0.336sec/-0.000sec (0.009sec, 0.004sec) (0.000sec/0.000sec/0) 55B 500 "GET /_matrix/client/r0/sync?filter=181&set_presence=offline&timeout=0 HTTP/1.0" "Element/1.1.0 (Linux; U; Android 11; Pixel 3a Build/RQ1A.210105.00; Flavour GooglePlay; MatrixAndroidSDK_X 0.0.1)" [0 dbevts]

Given that I am 100% certain that my other devices are actively getting new events and hitting the DB successfully on the same process, I am confused as to why the Element android device might be seeing this over and over. It feels like the failure was cached, especially due to the quick response time.

@clokep
Copy link
Member

clokep commented Mar 1, 2021

Maybe a regression from #9358?

@Half-Shot
Copy link
Collaborator Author

That's my thinking yea

@clokep
Copy link
Member

clokep commented Mar 1, 2021

@ShadowJonathan / @anoadragon453 any idea if #9358 could have caused this?

@clokep clokep added S-Major Major functionality / product severely impaired, no satisfactory workaround. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. X-Regression Something broke which worked on a previous release labels Mar 1, 2021
@ShadowJonathan
Copy link
Contributor

It shouldn't, or else there's something about deferreds which i don't know what it caches a Failure like that.

@ShadowJonathan
Copy link
Contributor

ShadowJonathan commented Mar 1, 2021

I think i know how this issue could be caused, @Half-Shot, can you grep your log for an AttributeError? (Nevermind, according to this snippet, no errors that occur here are ever logged)

In any case, yes, #9358 to some degree caused this problem, but i'm making a fix right now

@clokep clokep added the X-Release-Blocker Must be resolved before making a release label Mar 2, 2021
clokep added a commit that referenced this issue Mar 2, 2021
)"

This reverts commit f5c93fc.

This is being backed out due to a regression (#9507) and additional
review feedback being provided.
@clokep
Copy link
Member

clokep commented Mar 2, 2021

I've backed out the commit causing the regression (see #9358 (comment)), so I'm going to close this in lieu of the original issue #8518.

@clokep clokep closed this as completed Mar 2, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
S-Major Major functionality / product severely impaired, no satisfactory workaround. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. X-Regression Something broke which worked on a previous release X-Release-Blocker Must be resolved before making a release
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants