Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Conversation

MadLittleMods
Copy link
Contributor

@MadLittleMods MadLittleMods commented Sep 21, 2022

⏩ Moved to #13863


Scratch changes to fix have_seen_event not being invalidated

See #13856

The problem

_invalidate_caches_for_event doesn't run in monolith mode which means we never even tried to clear the have_seen_event and other caches. And even in worker mode, it only runs on the workers, not the master (AFAICT).

Additionally there is bug with the key being wrong so _invalidate_caches_for_event never invalidates the have_seen_event cache even when it does run.

Dev notes

SYNAPSE_POSTGRES=1 SYNAPSE_TEST_LOG_LEVEL=INFO python -m twisted.trial tests.storage.databases.main.test_events_worker.HaveSeenEventsTestCase
SYNAPSE_POSTGRES=1 SYNAPSE_TEST_LOG_LEVEL=INFO python -m twisted.trial tests.storage.databases.main.test_events_worker.HaveSeenEventsTestCase.test_persisting_event_invalidates_cache

Persist events

persist_events or persist_event or update_current_state (not really)
self._event_persist_queue
_process_event_persist_queue_task
_persist_event_batch
_persist_events_and_state_updates
_persist_events_txn

Replication stuff

https://github.com/matrix-org/synapse/blob/1a1abdda42551dad3aadc04a169c25f4cc651a2c/docs/replication.md

EventsStreamEventRow(BaseEventsStreamRow)
EventsStreamRow

class ReplicationFederationSendEventsRestServlet(ReplicationEndpoint):

class ReplicationSendEventRestServlet(ReplicationEndpoint):
_process_position

on_position
on_rdata

process_replication_rows


get_replication_command_handler

send_federation_ack
send_command

stream_update
RdataCommand

ReplicationStreamer
add_replication_callback

on_new_room_event_args (on_new_room_event)
on_new_event
on_new_replication_data

Pull Request Checklist

  • Pull request is based on the develop branch
  • Pull request includes a changelog file. The entry should:
    • Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from EventStore to EventWorkerStore.".
    • Use markdown where necessary, mostly for code blocks.
    • End with either a period (.) or an exclamation mark (!).
    • Start with a capital letter.
    • Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry.
  • Pull request includes a sign off
  • Code style is correct
    (run the linters)

…vent

Fix for
#13856

Fixed by calling `_invalidate_caches_for_event`
when we persist an event.

And an additional fix in `_invalidate_caches_for_event`
to make sure it uses the correct cache key. This seems
like it would be an easy foot-gun for any `tree=True`
cache.

Wrong:
```py
self.have_seen_event.invalidate((room_id, event_id))
```

Correct:
```py
self.have_seen_event.invalidate(((room_id, event_id),))
```
self._invalidate_local_get_event_cache(event_id)
self.have_seen_event.invalidate((room_id, event_id))
self.have_seen_event.invalidate(((room_id, event_id),))
Copy link
Contributor Author

@MadLittleMods MadLittleMods Sep 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This key lookup was wrong and we were never invalidating the have_seen_event cache even over replication.


Yes, the cache key literally a set wrapped in set. Something weird with the TreeCache I assume

ex. (('!TnCIJPKzdQdUlIyXdQ:test', '$Iu0eqEBN7qcyF1S9B3oNB3I91v2o5YOgRNPwi_78s-k'),)

LruCache cache values before pop {(('!TnCIJPKzdQdUlIyXdQ:test', '$Iu0eqEBN7qcyF1S9B3oNB3I91v2o5YOgRNPwi_78s-k'),): False}

We should probably check all other instances of this for the same problem. And ideally fix the cache so it uses the expected (room_id, event_id) key tuple instead.

assert event.internal_metadata.stream_ordering

relation = relation_from_event(event)
self.store._invalidate_caches_for_event(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to move because we can't touch the cache objects from this background queue thread processing and persisting the events.

Otherwise we run into ValueError: Cache objects can only be accessed from the main thread

synapse_main | 2022-09-21 20:53:42,362 - synapse.http.server - 123 - ERROR - POST-3 - Failed handle request via 'RoomCreateRestServlet': <SynapseRequest at 0x7f0fde7ec160 method='POST' uri='/_matrix/client/v3/createRoom' clientproto='HTTP/1.0' site='8080'>
synapse_main | Traceback (most recent call last):
synapse_main |   File "/usr/local/lib/python3.9/site-packages/synapse/http/server.py", line 306, in _async_render_wrapper
synapse_main |     callback_return = await self._async_render(request)
synapse_main |   File "/usr/local/lib/python3.9/site-packages/synapse/http/server.py", line 512, in _async_render
synapse_main |     callback_return = await raw_callback_return
synapse_main |   File "/usr/local/lib/python3.9/site-packages/synapse/rest/client/room.py", line 161, in on_POST
synapse_main |     info, _ = await self._room_creation_handler.create_room(
synapse_main |   File "/usr/local/lib/python3.9/site-packages/synapse/handlers/room.py", line 909, in create_room
synapse_main |     ) = await self._send_events_for_new_room(
synapse_main |   File "/usr/local/lib/python3.9/site-packages/synapse/handlers/room.py", line 1105, in _send_events_for_new_room
synapse_main |     await send(etype=EventTypes.Create, content=creation_content)
synapse_main |   File "/usr/local/lib/python3.9/site-packages/synapse/handlers/room.py", line 1081, in send
synapse_main |     ) = await self.event_creation_handler.create_and_send_nonmember_event(
synapse_main |   File "/usr/local/lib/python3.9/site-packages/synapse/handlers/message.py", line 990, in create_and_send_nonmember_event
synapse_main |     ev = await self.handle_new_client_event(
synapse_main |   File "/usr/local/lib/python3.9/site-packages/synapse/util/metrics.py", line 113, in measured_func
synapse_main |     r = await func(self, *args, **kwargs)
synapse_main |   File "/usr/local/lib/python3.9/site-packages/synapse/handlers/message.py", line 1324, in handle_new_client_event
synapse_main |     result, _ = await make_deferred_yieldable(
synapse_main |   File "/usr/local/lib/python3.9/site-packages/synapse/storage/controllers/persist_events.py", line 287, in handle_queue_loop
synapse_main |     ret = await self._per_item_callback(room_id, item.task)
synapse_main |   File "/usr/local/lib/python3.9/site-packages/synapse/storage/controllers/persist_events.py", line 359, in _process_event_persist_queue_task
synapse_main |     return await self._persist_event_batch(room_id, task)
synapse_main |   File "/usr/local/lib/python3.9/site-packages/synapse/storage/controllers/persist_events.py", line 748, in _persist_event_batch
synapse_main |     await self.persist_events_store._persist_events_and_state_updates(
synapse_main |   File "/usr/local/lib/python3.9/site-packages/synapse/logging/tracing.py", line 851, in _wrapper
synapse_main |     return await func(*args, **kwargs)  # type: ignore[misc]
synapse_main |   File "/usr/local/lib/python3.9/site-packages/synapse/storage/databases/main/events.py", line 214, in _persist_events_and_state_updates
synapse_main |     await self.db_pool.runInteraction(
synapse_main |   File "/usr/local/lib/python3.9/site-packages/synapse/storage/database.py", line 881, in runInteraction
synapse_main |     return await delay_cancellation(_runInteraction())
synapse_main |   File "/usr/local/lib/python3.9/site-packages/twisted/internet/defer.py", line 1692, in _inlineCallbacks
synapse_main |     result = context.run(
synapse_main |   File "/usr/local/lib/python3.9/site-packages/twisted/python/failure.py", line 518, in throwExceptionIntoGenerator
synapse_main |     return g.throw(self.type, self.value, self.tb)
synapse_main |   File "/usr/local/lib/python3.9/site-packages/synapse/storage/database.py", line 848, in _runInteraction
synapse_main |     result = await self.runWithConnection(
synapse_main |   File "/usr/local/lib/python3.9/site-packages/synapse/storage/database.py", line 972, in runWithConnection
synapse_main |     return await make_deferred_yieldable(
synapse_main |   File "/usr/local/lib/python3.9/site-packages/twisted/python/threadpool.py", line 244, in inContext
synapse_main |     result = inContext.theWork()  # type: ignore[attr-defined]
synapse_main |   File "/usr/local/lib/python3.9/site-packages/twisted/python/threadpool.py", line 260, in <lambda>
synapse_main |     inContext.theWork = lambda: context.call(  # type: ignore[attr-defined]
synapse_main |   File "/usr/local/lib/python3.9/site-packages/twisted/python/context.py", line 117, in callWithContext
synapse_main |     return self.currentContext().callWithContext(ctx, func, *args, **kw)
synapse_main |   File "/usr/local/lib/python3.9/site-packages/twisted/python/context.py", line 82, in callWithContext
synapse_main |     return func(*args, **kw)
synapse_main |   File "/usr/local/lib/python3.9/site-packages/twisted/enterprise/adbapi.py", line 282, in _runWithConnection
synapse_main |     result = func(conn, *args, **kw)
synapse_main |   File "/usr/local/lib/python3.9/site-packages/synapse/storage/database.py", line 965, in inner_func
synapse_main |     return func(db_conn, *args, **kwargs)
synapse_main |   File "/usr/local/lib/python3.9/site-packages/synapse/storage/database.py", line 710, in new_transaction
synapse_main |     r = func(cursor, *args, **kwargs)
synapse_main |   File "/usr/local/lib/python3.9/site-packages/synapse/storage/databases/main/events.py", line 442, in _persist_events_txn
synapse_main |     self.store._invalidate_caches_for_event(
synapse_main |   File "/usr/local/lib/python3.9/site-packages/synapse/storage/databases/main/cache.py", line 237, in _invalidate_caches_for_event
synapse_main |     self.get_latest_event_ids_in_room.invalidate((room_id,))
synapse_main |   File "/usr/local/lib/python3.9/site-packages/synapse/util/caches/descriptors.py", line 370, in <lambda>
synapse_main |     wrapped.invalidate = lambda key: cache.invalidate(key[0])
synapse_main |   File "/usr/local/lib/python3.9/site-packages/synapse/util/caches/deferred_cache.py", line 391, in invalidate
synapse_main |     self.check_thread()
synapse_main |   File "/usr/local/lib/python3.9/site-packages/synapse/util/caches/deferred_cache.py", line 136, in check_thread
synapse_main |     raise ValueError(
synapse_main | ValueError: Cache objects can only be accessed from the main thread

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could move the cache invalidation to the persist_events and persist_event layer. Is that problematic/flakey if something fails in the middle? 🤔

persist_events or persist_event or update_current_state (not really)
self._event_persist_queue
_process_event_persist_queue_task
_persist_event_batch
_persist_events_and_state_updates
_persist_events_txn

assert event.internal_metadata.stream_ordering
# Invalidate related caches after we persist a new event
relation = relation_from_event(event)
self.main_store._invalidate_caches_for_event(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Split out the fixes to #13863

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant