Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Improve backfill robustness by trying more servers. #13890

Merged
merged 3 commits into from
Sep 29, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions changelog.d/13890.misc
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Improve backfill robustness by trying more servers when we get a `4xx` error back.
33 changes: 31 additions & 2 deletions synapse/handlers/federation.py
Original file line number Diff line number Diff line change
Expand Up @@ -409,6 +409,15 @@ async def _maybe_backfill_inner(

async def try_backfill(domains: Collection[str]) -> bool:
# TODO: Should we try multiple of these at a time?

# Number of contacted remote homeservers that have denied our backfill
# request with a 4xx code.
denied_count = 0

# Maximum number of contacted remote homeservers that can deny our
# backfill request with 4xx codes before we give up.
max_denied_count = 5

for dom in domains:
# We don't want to ask our own server for information we don't have
if dom == self.server_name:
Expand All @@ -427,13 +436,33 @@ async def try_backfill(domains: Collection[str]) -> bool:
continue
except HttpResponseException as e:
if 400 <= e.code < 500:
raise e.to_synapse_error()
logger.warning(
"Backfill denied from %s because %s [%d/%d]",
dom,
e,
denied_count,
max_denied_count,
)
denied_count += 1
if denied_count >= max_denied_count:
return False
Comment on lines +447 to +448
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return False and not backfilling makes the most sense to me 👍

vs throwing a 4xx or 5xx

This topic was mentioned in backend internal.


Ideally for a client expecting to see messages in their /messages response we could indicate that there is a gap (MSC3871)

continue

logger.info("Failed to backfill from %s because %s", dom, e)
continue
except CodeMessageException as e:
if 400 <= e.code < 500:
raise
logger.warning(
"Backfill denied from %s because %s [%d/%d]",
dom,
e,
denied_count,
max_denied_count,
)
denied_count += 1
if denied_count >= max_denied_count:
return False
continue

logger.info("Failed to backfill from %s because %s", dom, e)
continue
Expand Down