feat: make Telethon flood-waits visible in the log (#124)
Conversation
Telethon's ``TelegramClient`` silently sleeps through any FloodWaitError whose wait is ≤ ``flood_sleep_threshold`` (default 60s). On a large ``iter_messages`` pass (e.g. a 100k-message chat) Telegram often hands back much longer waits, and the default behaviour makes the run look like a hang: no log lines, no DB writes, realtime listener still firing. The operator can't tell "crashed" from "rate-limited". This change makes flood-waits explicit:

1. Set ``flood_sleep_threshold=0`` on the shared ``TelegramClient`` so every FloodWaitError bubbles up as an exception rather than a silent sleep.
2. Introduce ``iter_messages_with_flood_retry(client, entity, **kwargs)`` — a thin async generator that wraps ``client.iter_messages``, logs ``FloodWait: sleeping Ns before resuming at message_id=M``, sleeps, then resumes iteration from the last-seen message id.
3. Use that wrapper at the two bulk-iteration sites in ``src/telegram_backup.py`` (full chat backup and gap-fill).

Config: ``FLOOD_WAIT_LOG_THRESHOLD`` (default 10s) controls the minimum wait to log — trivial 1-2s floods stay quiet.

Before: a multi-minute flood-wait looks like a crash. After: the operator sees ``FloodWait: sleeping 47s before resuming`` and knows exactly what's happening.

Regression tests in ``tests/test_flood_wait_visibility.py``:

- Wrapper resumes iteration from the last-seen message id after a flood.
- Short floods below the log threshold are suppressed.
- Long floods emit a structured log line with wait duration + cursor.
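To see why ``flood_sleep_threshold=0`` makes every wait surface, here is a simplified model of the decision Telethon makes internally — illustrative only, not Telethon's actual code; the stand-in `FloodWaitError` just mimics the `.seconds` attribute:

```python
class FloodWaitError(Exception):
    """Stand-in for telethon.errors.FloodWaitError; carries the wait in seconds."""

    def __init__(self, seconds: int) -> None:
        super().__init__(f"flood wait of {seconds}s")
        self.seconds = seconds


def on_flood(seconds: int, flood_sleep_threshold: int) -> str:
    """Simplified model: sleep silently under the threshold, raise above it."""
    if seconds <= flood_sleep_threshold:
        return "silent sleep"  # invisible to the operator — no log line, no error
    raise FloodWaitError(seconds)  # with threshold=0, every wait takes this path
```

With the default threshold a 45-second wait is swallowed silently; with the threshold set to 0 the same wait raises and can be logged by the wrapper.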
📝 Walkthrough

Changes introduce flood-wait handling for Telegram message iteration by always setting `flood_sleep_threshold=0` on the client.

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Hey @tondeaf, excellent feature! Making flood-waits visible is a huge UX improvement for operators — "is it crashed or rate-limited?" is a really common pain point with Telethon.

Here's the thorough review:

**Code Quality (1 CRITICAL, 2 MEDIUM, 2 LOW)**

CRITICAL — unbounded retry loop. Cap consecutive flood-waits and reset the counter on progress:

```python
MAX_FLOOD_RETRIES = 5

retries = 0
while True:
    try:
        async for msg in client.iter_messages(entity, min_id=resume_from, **kwargs):
            yield msg
            if getattr(msg, "id", None) is not None:
                resume_from = max(resume_from, msg.id)
            retries = 0  # reset on successful progress
        return
    except FloodWaitError as e:
        retries += 1
        if retries > MAX_FLOOD_RETRIES:
            logger.error("FloodWait: exceeded %d retries, giving up (last_msg_id=%s)", MAX_FLOOD_RETRIES, resume_from)
            raise
        logger.warning(...)
        await asyncio.sleep(e.seconds + 1)
```

MEDIUM — PR body advertises `FLOOD_WAIT_LOG_THRESHOLD`, but the wrapper never consults it before logging.

MEDIUM — `e.seconds` is server-supplied; clamp the sleep so a pathological value can't pin the process:

```python
MAX_FLOOD_WAIT_SECONDS = 600  # 10 minutes

wait_seconds = min(e.seconds + 1, MAX_FLOOD_WAIT_SECONDS)
await asyncio.sleep(wait_seconds)
```

LOW — Two extra blank lines introduced in `src/telegram_backup.py`.

LOW — Opaque `(yrru-mix3)` ticket reference in the test-file docstring.

**Security** — APPROVE with above changes.

**Test Adequacy (2 HIGH gaps, 2 MEDIUM)**

HIGH — No test for FloodWait before any message is yielded.

HIGH — No test for multiple consecutive FloodWaitErrors.

MEDIUM — No test that `max_id` (and other kwargs) are preserved across retries.

MEDIUM — Overly broad autouse `_fake_db` fixture.

**CodeRabbit**

CodeRabbit was rate-limited on this PR. I'll retrigger it once the cooldown expires.

Thanks for the contribution! 🙏
@coderabbitai full review |
✅ Actions performed: Full review triggered.
@coderabbitai full review |
✅ Actions performed: Full review triggered.
Actionable comments posted: 4
🧹 Nitpick comments (1)
tests/test_flood_wait_visibility.py (1)
`26-44`: **`autouse=True` is overkill and module-reload is a footgun.** This fixture reloads `src.connection` and `src.telegram_backup` for every test in the file — including `test_config_kwargs_include_flood_sleep_threshold_zero`, which only touches `src.config` and doesn't need the DB stub at all. Reloading modules mid-test-suite can also leak state into unrelated tests via `sys.modules` mutation if this file is imported alongside others.

Narrow the scope: drop `autouse=True` and apply the fixture explicitly to the two tests that actually exercise `src.telegram_backup` / `src.connection`. The teardown branch at lines 42-44 is also effectively a no-op (the `if` is always true since the fixture just inserted the module) — either restore the original `sys.modules["src.db"]` or remove the branch.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_flood_wait_visibility.py` around lines 26-44: the _fake_db fixture is too broad and the module reload teardown is unsafe; remove autouse=True from the _fake_db fixture and instead add it as an explicit argument to only the tests that need DB stubbing (the tests that import/use src.telegram_backup and src.connection), and replace the teardown no-op with restoring the prior sys.modules["src.db"] value (capture original = sys.modules.get("src.db") at fixture start and on teardown set sys.modules["src.db"] = original or delete it if original was None) so the fixture (_fake_db), monkeypatch.setitem, and the importlib.reload calls no longer leak state into unrelated tests.
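The restore pattern that prompt describes could be sketched as a plain context manager — for illustration only; in the test file it would be wrapped in an opt-in `@pytest.fixture` (no `autouse=True`), and `fake_db_module`/`save_message` are hypothetical names:

```python
import sys
import types
from contextlib import contextmanager


@contextmanager
def fake_db_module():
    """Stub sys.modules['src.db'] and restore whatever was there before on exit."""
    original = sys.modules.get("src.db")  # capture prior state up front
    stub = types.ModuleType("src.db")
    stub.save_message = lambda *args, **kwargs: None  # hypothetical no-op API
    sys.modules["src.db"] = stub
    try:
        yield stub
    finally:
        if original is not None:
            sys.modules["src.db"] = original  # put the real module back
        else:
            sys.modules.pop("src.db", None)  # nothing was there: remove the stub
```

Only the tests that actually exercise the DB-touching modules would request the fixture; config-only tests skip the reload cost entirely.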
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/telegram_backup.py`:
- Around line 697-699: There are two instances of double blank lines inside
function bodies (one immediately after the statement "msg_data = await
self._process_message(message, chat_id)" and another in the later block around
similar message-processing logic); remove the extra blank line in each place so
there is at most one blank line between logical blocks per PEP 8 — update the
function(s) containing "_process_message" calls to collapse the consecutive
blank lines to a single blank line and re-run ruff format/check to ensure the
change passes linting.
- Around line 41-62: iter_messages_with_flood_retry currently retries forever,
sleeps for unbounded server-supplied seconds, doesn't honour
FLOOD_WAIT_LOG_THRESHOLD, and lacks type hints; fix by adding type hints for
parameters and return (e.g., AsyncIterator[Message] or appropriate Telethon
types), introduce a bounded retry counter (e.g., retry_count and MAX_RETRIES)
that increments on each FloodWaitError and is reset to 0 whenever resume_from
advances (i.e., after yielding a message), clamp the sleep to
MAX_FLOOD_WAIT_SECONDS (use min(e.seconds, MAX_FLOOD_WAIT_SECONDS) + 1) and only
log the FloodWait if e.seconds >= FLOOD_WAIT_LOG_THRESHOLD, and raise or abort
after exceeding MAX_RETRIES to avoid infinite loops (refer to
iter_messages_with_flood_retry, resume_from, e.seconds,
FLOOD_WAIT_LOG_THRESHOLD, MAX_FLOOD_WAIT_SECONDS, and retry_count/MAX_RETRIES).
In `@tests/test_flood_wait_visibility.py`:
- Around line 96-142: Remove the dead fake_iter definition (it’s overwritten by
seeded_iter) and add focused tests for iter_messages_with_flood_retry: (1) a
test where the first call to the fake iter_messages raises FloodWaitError before
yielding any message (ensure min_id stays 0 and subsequent call yields
messages), (2) a test that simulates consecutive FloodWaitError occurrences
before success to exercise retry-count logic, and (3) a test that captures the
kwargs passed into the fake iter_messages to assert that max_id (and other
kwargs) are preserved across retries (use SimpleNamespace client.iter_messages,
patch asyncio.sleep, and assert behavior against iter_messages_with_flood_retry
and the _fill_gap_range path).
- Around line 1-11: The module docstring uses a cryptic internal ID
"(yrru-mix3)"; replace that token in the top-of-file docstring with the actual
PR or issue reference (e.g. "PR `#1234`" or "Issue `#5678`") so git-blame/searchers
can find the context—edit the string in tests/test_flood_wait_visibility.py (the
file-level docstring surrounding "Flood-wait visibility") to include the real
PR/issue number instead of "(yrru-mix3)".
---
Nitpick comments:
In `@tests/test_flood_wait_visibility.py`:
- Around line 26-44: The _fake_db fixture is too broad and the module reload
teardown is unsafe; remove autouse=True from the _fake_db fixture and instead
add it as an explicit argument to only the tests that need DB stubbing (the
tests that import/use src.telegram_backup and src.connection), and replace the
teardown no-op with restoring the prior sys.modules["src.db"] value (capture
original = sys.modules.get("src.db") at fixture start and on teardown set
sys.modules["src.db"] = original or delete it if original was None) so the
fixture (_fake_db), monkeypatch.setitem, and the importlib.reload calls no
longer leak state into unrelated tests.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 9fb4dc56-f287-4704-ad25-fd93d6825d37
📒 Files selected for processing (4)

- `src/config.py`
- `src/telegram_backup.py`
- `tests/test_config.py`
- `tests/test_flood_wait_visibility.py`
Address PR GeiserX#124 review feedback.

Source changes (`src/telegram_backup.py`):

- Add `MAX_FLOOD_RETRIES=5` cap. The previous `while True` would loop forever on a single chat if Telegram kept rate-limiting (e.g. account restriction), blocking every later chat in the backup. The counter resets on every successful yield, so a long backfill that hits one flood-wait per chunk doesn't trip the cap; only consecutive waits without progress count.
- Add `MAX_FLOOD_WAIT_SECONDS=600` sleep cap. Defensive against a pathologically large `e.seconds` value (buggy/malicious server) that would otherwise pin the process for hours.
- Add `FLOOD_WAIT_LOG_THRESHOLD` env var (default 10s) so routine short waits don't spam the log; long waits still surface. Set to 0 to log every wait. Implements the knob the PR description advertised.
- Drop two extra blank lines at `telegram_backup.py:698` and `:790` (PEP 8 inside-function-body limit).

Test changes (`tests/test_flood_wait_visibility.py`):

- New: flood-before-any-yield (`resume_from` stays at original `min_id`).
- New: three consecutive flood-waits before success.
- New: counter-reset-on-progress (covers the fix above directly).
- New: gives-up-after-max-retries with `FloodWaitError` raised.
- New: `e.seconds` clamped at `MAX_FLOOD_WAIT_SECONDS`.
- New: `max_id` kwarg preservation across retries (gap-fill regression).
- New: short waits suppressed by `FLOOD_WAIT_LOG_THRESHOLD`.
- Removed dead `fake_iter` from the original resume test.
- Replaced opaque `(yrru-mix3)` ticket reference with PR GeiserX#124.
- Narrowed the autouse `_fake_db` fixture into an opt-in `fake_db` fixture so `test_config_kwargs_include_flood_sleep_threshold_zero` no longer pays the import-reload cost it didn't need.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addressed your feedback in the latest commit. Summary:

- CRITICAL — unbounded retry loop → capped at `MAX_FLOOD_RETRIES=5` consecutive waits without progress; counter resets on every yield.
- MEDIUM — `FLOOD_WAIT_LOG_THRESHOLD` → now implemented as an env var (default 10s, 0 logs every wait).
- MEDIUM — sleep clamp → added `MAX_FLOOD_WAIT_SECONDS=600` on `e.seconds`.
- LOW — extra blank lines at `telegram_backup.py:698` and `:790` → removed.
- LOW — opaque `(yrru-mix3)` reference → replaced with PR GeiserX#124.
- HIGH (tests) — flood before any yield → test added.
- HIGH (tests) — multiple consecutive floods → test added.
- MEDIUM (tests) — `max_id` kwarg preservation across retries → test added.
- MEDIUM (tests) — overly broad autouse `_fake_db` fixture → narrowed to an opt-in `fake_db` fixture.
- Bonus tests: counter-reset-on-progress, gives-up-after-max-retries, `e.seconds` clamp.

Ready for another look.
Thanks for the contribution, @tondeaf! Merged your PR and shipped a follow-up in #130 (v7.6.2) that hardens the retry logic further.

Released as v7.6.2.
Summary
- `flood_sleep_threshold=0` on the shared `TelegramClient` so every `FloodWaitError` surfaces, then wrap `client.iter_messages` at the two bulk call-sites in `src/telegram_backup.py` with `iter_messages_with_flood_retry`, which logs the wait, sleeps, and resumes from the last yielded id.
- `MAX_FLOOD_RETRIES=5` consecutive flood-waits without progress (counter resets on each yield) before the wrapper gives up and re-raises; `MAX_FLOOD_WAIT_SECONDS=600` clamp on `e.seconds` to defend against pathological values.
- `FLOOD_WAIT_LOG_THRESHOLD` env var (default 10s) suppresses noisy short-wait log lines while keeping the operator-visible long ones.

Type of Change
Database Changes
- No database changes (nothing added under `scripts/`)

Data Consistency Checklist

- `chat_id` values use marked format (via `_get_marked_id()`)
- `_strip_tz()` applied before DB operations

Testing

- All tests pass (`python -m pytest tests/ -v`)
- Lint clean (`ruff check .`)
- Format clean (`ruff format --check .`)

Security Checklist

- `e.seconds` is an integer from Telethon and is clamped before use

Deployment Notes

New optional env var: `FLOOD_WAIT_LOG_THRESHOLD` (default `10`, set to `0` to log every wait). No-op for existing deployments — the defaults match the prior log-everything behavior except for sub-10-second waits, which are now silent.

The problem
Telethon's `TelegramClient` silently sleeps through any `FloodWaitError` whose wait is ≤ `flood_sleep_threshold` (default 60 s). On a large `iter_messages` pass — e.g. initial backup of a 100 k-message chat — Telegram often hands back much longer waits, and the operator can't tell "crashed" from "rate-limited": `systemctl status` still shows the service as active.

The fix
- `flood_sleep_threshold=0` on the shared `TelegramClient` so every `FloodWaitError` bubbles up as an exception rather than a silent sleep.
- `iter_messages_with_flood_retry(client, entity, **kwargs)` — a thin async generator that wraps `client.iter_messages`, logs `FloodWait: sleeping Ns before resuming (last_msg_id=M, retry=N/MAX)`, sleeps, then resumes iteration from the last-seen message id.
- `MAX_FLOOD_RETRIES=5` consecutive flood-waits without progress before re-raising (the counter resets every time iteration yields a message, so a long backfill that hits one wait per chunk doesn't trip the cap), and a `MAX_FLOOD_WAIT_SECONDS=600` clamp on the sleep so a buggy/malicious server can't hang the process for hours.
- The wrapper is used at the two bulk-iteration sites in `src/telegram_backup.py` (full chat backup and gap-fill).
- `FLOOD_WAIT_LOG_THRESHOLD` (default 10 s) controls the minimum wait to log, so trivial 1-2 s floods don't spam the log.

Before/after is qualitative:
- Before: a multi-minute flood-wait looks like a crash.
- After: `FloodWait: sleeping 47s before resuming (last_msg_id=87342, retry=1/5)` — operator immediately knows what's happening.
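The `FLOOD_WAIT_LOG_THRESHOLD` knob could be parsed along these lines — a sketch; the actual `src/config.py` implementation may differ, and the fallback behaviour for malformed values is an assumption:

```python
import os


def flood_wait_log_threshold(default: int = 10) -> int:
    """Minimum FloodWait duration (seconds) worth logging; 0 logs every wait."""
    raw = os.environ.get("FLOOD_WAIT_LOG_THRESHOLD", "")
    try:
        value = int(raw)
    except ValueError:
        return default  # unset or malformed: fall back to the documented default
    return max(0, value)  # negative values behave like "log everything"
```

Operators opt in per-deployment: leaving the variable unset keeps the 10-second default, `0` restores log-everything behaviour.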
Rebased on top of the recently-merged
SKIP_TOPIC_IDSfeature (#118). Both conflict sites insrc/telegram_backup.pyare trivial — our flood-retry wrapper wraps theiter_messagescall, and the topic-skip check stays inside the loop body where upstream added it.Test plan
- `pytest tests/test_flood_wait_visibility.py` — 10/10 pass (3 original + 7 added in review)
- `pytest tests/test_config.py` — passes
- Observed in a live run: `retry=N/MAX` appears in the log and iteration resumes cleanly from the correct message id.