Skip to content

Sending many events concurrently throws serialization errors on v1.129.0rc1 #18349

@tulir

Description

@tulir

Description

The total event counter added in #18260 does not appear to be concurrency-safe. It was already reverted in #18346, but the revert seemed to be for a different reason, so filing this to make sure it's not brought back without both bugs fixed.

Also, since it's a trigger, I'm not sure if the revert actually removes it for servers that already upgraded?

Steps to reproduce

  • run a bot with broadcasting features, such as the xkcd bot, and subscribe it to many rooms
  • wait for a broadcast
  • observe that the broadcast fails halfway through

Homeserver

maunium.net

Synapse Version

1.129.0rc1

Installation Method

Docker (maunium/synapse)

Database

PostgreSQL

Workers

Multiple workers

Relevant log output

2025-04-17 06:34:30,041 - synapse.storage.txn - 788 - WARNING - persist_events-3884- [TXN OPERROR] {persist_events-df2a} could not serialize access due to concurrent update
CONTEXT:  SQL statement "UPDATE event_stats SET total_event_count = total_event_count + 1"
PL/pgSQL function event_stats_increment_counts() line 5 at SQL statement
 1/5
2025-04-17 06:34:30,041 - synapse.storage.txn - 788 - WARNING - persist_events-3882- [TXN OPERROR] {persist_events-df13} could not serialize access due to concurrent update
CONTEXT:  SQL statement "UPDATE event_stats SET total_event_count = total_event_count + 1"
PL/pgSQL function event_stats_increment_counts() line 5 at SQL statement
 4/5
2025-04-17 06:34:30,042 - synapse.access.http.8034 - 508 - INFO - POST-4824- 172.18.2.50 - 8034 - {None} Processed request: 0.058sec/0.002sec (0.001sec, 0.000sec) (0.000sec/0.000sec/0) 156B 200 "POST /_synapse/replication/send_events/hrIpPAprUy HTTP/1.1" "Synapse/1.129.0rc1" [0 dbevts]
2025-04-17 06:34:30,051 - synapse.storage.txn - 788 - WARNING - persist_events-3882- [TXN OPERROR] {persist_events-df13} could not serialize access due to concurrent update
CONTEXT:  SQL statement "UPDATE event_stats SET total_event_count = total_event_count + 1"
PL/pgSQL function event_stats_increment_counts() line 5 at SQL statement
 5/5
2025-04-17 06:34:30,051 - synapse.storage.txn - 788 - WARNING - persist_events-3884- [TXN OPERROR] {persist_events-df2a} could not serialize access due to concurrent update
CONTEXT:  SQL statement "UPDATE event_stats SET total_event_count = total_event_count + 1"
PL/pgSQL function event_stats_increment_counts() line 5 at SQL statement
 2/5
2025-04-17 06:34:30,052 - synapse.storage.txn - 788 - WARNING - persist_events-3885- [TXN OPERROR] {persist_events-df33} could not serialize access due to concurrent update
CONTEXT:  SQL statement "UPDATE event_stats SET total_event_count = total_event_count + 1"
PL/pgSQL function event_stats_increment_counts() line 5 at SQL statement
 0/5
2025-04-17 06:34:30,062 - synapse.http.server - 146 - ERROR - POST-4823- Failed handle request via 'ReplicationSendEventsRestServlet': <SynapseRequest at 0x7f700b3c2490 method='POST' uri='/_synapse/replication/send_events/qdqPGWuIld' clientproto='HTTP/1.1' site='8034'>
Traceback (most recent call last):
  File "/usr/local/lib/python3.13/site-packages/twisted/internet/defer.py", line 2010, in _inlineCallbacks
    result = context.run(
        cast(Failure, result).throwExceptionIntoGenerator, gen
    )
  File "/usr/local/lib/python3.13/site-packages/twisted/python/failure.py", line 549, in throwExceptionIntoGenerator
    return g.throw(self.value.with_traceback(self.tb))
           ~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.13/site-packages/synapse/util/caches/response_cache.py", line 275, in cb
    return await callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.13/site-packages/synapse/replication/http/send_events.py", line 164, in _handle_request
    await self.event_creation_handler.persist_and_notify_client_events(
        requester, events_and_context, ratelimit, extra_users
    )
  File "/usr/local/lib/python3.13/site-packages/synapse/handlers/message.py", line 1942, in persist_and_notify_client_events
    ) = await self._storage_controllers.persistence.persist_events(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        events_and_context,
        ^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/usr/local/lib/python3.13/site-packages/synapse/storage/controllers/persist_events.py", line 428, in persist_events
    ret_vals = await yieldable_gather_results(enqueue, partitioned.items())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.13/site-packages/synapse/util/async_helpers.py", line 307, in yieldable_gather_results
    raise dfe.subFailure.value from None
  File "/usr/local/lib/python3.13/site-packages/twisted/internet/defer.py", line 2010, in _inlineCallbacks
    result = context.run(
        cast(Failure, result).throwExceptionIntoGenerator, gen
    )
  File "/usr/local/lib/python3.13/site-packages/twisted/python/failure.py", line 549, in throwExceptionIntoGenerator
    return g.throw(self.value.with_traceback(self.tb))
           ~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.13/site-packages/synapse/storage/controllers/persist_events.py", line 423, in enqueue
    return await self._event_persist_queue.add_to_queue(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ...<2 lines>...
    )
    ^
  File "/usr/local/lib/python3.13/site-packages/synapse/storage/controllers/persist_events.py", line 245, in add_to_queue
    res = await make_deferred_yieldable(end_item.deferred.observe())
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.13/site-packages/synapse/storage/controllers/persist_events.py", line 288, in handle_queue_loop
    ret = await self._per_item_callback(room_id, item.task)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.13/site-packages/synapse/storage/controllers/persist_events.py", line 369, in _process_event_persist_queue_task
    return await self._persist_event_batch(room_id, task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.13/site-packages/synapse/storage/controllers/persist_events.py", line 646, in _persist_event_batch
    await self.persist_events_store._persist_events_and_state_updates(
    ...<7 lines>...
    )
  File "/usr/local/lib/python3.13/site-packages/synapse/storage/databases/main/events.py", line 348, in _persist_events_and_state_updates
    await self.db_pool.runInteraction(
    ...<9 lines>...
    )
  File "/usr/local/lib/python3.13/site-packages/synapse/storage/database.py", line 953, in runInteraction
    return await delay_cancellation(_runInteraction())
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.13/site-packages/twisted/internet/defer.py", line 2010, in _inlineCallbacks
    result = context.run(
        cast(Failure, result).throwExceptionIntoGenerator, gen
    )
  File "/usr/local/lib/python3.13/site-packages/twisted/python/failure.py", line 549, in throwExceptionIntoGenerator
    return g.throw(self.value.with_traceback(self.tb))
           ~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.13/site-packages/synapse/storage/database.py", line 919, in _runInteraction
    result: R = await self.runWithConnection(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ...<11 lines>...
    )
    ^
  File "/usr/local/lib/python3.13/site-packages/synapse/storage/database.py", line 1048, in runWithConnection
    return await make_deferred_yieldable(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        self._db_pool.runWithConnection(inner_func, *args, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/usr/local/lib/python3.13/site-packages/twisted/python/threadpool.py", line 269, in inContext
    result = inContext.theWork()  # type: ignore[attr-defined]
  File "/usr/local/lib/python3.13/site-packages/twisted/python/threadpool.py", line 285, in <lambda>
    inContext.theWork = lambda: context.call(  # type: ignore[attr-defined]
                                ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        ctx, func, *args, **kw
        ^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/usr/local/lib/python3.13/site-packages/twisted/python/context.py", line 117, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.13/site-packages/twisted/python/context.py", line 82, in callWithContext
    return func(*args, **kw)
  File "/usr/local/lib/python3.13/site-packages/twisted/enterprise/adbapi.py", line 282, in _runWithConnection
    result = func(conn, *args, **kw)
  File "/usr/local/lib/python3.13/site-packages/synapse/storage/database.py", line 1041, in inner_func
    return func(db_conn, *args, **kwargs)
  File "/usr/local/lib/python3.13/site-packages/synapse/storage/database.py", line 781, in new_transaction
    r = func(cursor, *args, **kwargs)
  File "/usr/local/lib/python3.13/site-packages/synapse/storage/databases/main/events.py", line 975, in _persist_events_txn
    self._store_event_txn(txn, events_and_contexts=events_and_contexts)
    ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.13/site-packages/synapse/storage/databases/main/events.py", line 2525, in _store_event_txn
    self.db_pool.simple_insert_many_txn(
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        txn,
        ^^^^
    ...<37 lines>...
        ],
        ^^
    )
    ^
  File "/usr/local/lib/python3.13/site-packages/synapse/storage/database.py", line 1195, in simple_insert_many_txn
    txn.execute_values(sql, values, fetch=False)
    ~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.13/site-packages/synapse/storage/database.py", line 416, in execute_values
    return self._do_execute(
           ~~~~~~~~~~~~~~~~^
        # TODO: is it safe for values to be Iterable[Iterable[Any]] here?
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ...<5 lines>...
        values,
        ^^^^^^^
    )
    ^
  File "/usr/local/lib/python3.13/site-packages/synapse/storage/database.py", line 489, in _do_execute
    return func(sql, *args, **kwargs)
  File "/usr/local/lib/python3.13/site-packages/synapse/storage/database.py", line 419, in <lambda>
    lambda the_sql, the_values: execute_values(
                                ~~~~~~~~~~~~~~^
        self.txn, the_sql, the_values, template=template, fetch=fetch
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ),
    ^
  File "/usr/local/lib/python3.13/site-packages/psycopg2/extras.py", line 1299, in execute_values
    cur.execute(b''.join(parts))
    ~~~~~~~~~~~^^^^^^^^^^^^^^^^^
psycopg2.errors.SerializationFailure: could not serialize access due to concurrent update
CONTEXT:  SQL statement "UPDATE event_stats SET total_event_count = total_event_count + 1"
PL/pgSQL function event_stats_increment_counts() line 5 at SQL statement

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions