Skip to content

Conversation

@lawrence-forooghian
Copy link
Collaborator

@lawrence-forooghian lawrence-forooghian commented Dec 5, 2025

Non-sent queued messages were not being failed when the connection entered the SUSPENDED / FAILED / CLOSED states.

The existing logic relied on the queued messages having a msgSerial, but they don't get a msgSerial until the first time we send the message on the transport.

Resolves #2115.

Summary by CodeRabbit

  • New Features

    • Add option to complete either a specific range of messages or all pending messages in the queue.
  • Bug Fixes

    • Ensure idle state is emitted only after processing and when the queue becomes empty.
    • Improve completion/nack handling for messages depending on queued vs sent state.
  • Tests

    • Expanded tests covering connection suspends/failures/closures for both sent and queued messages.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link

coderabbitai bot commented Dec 5, 2025

Walkthrough

Refactors MessageQueue.completeMessages to accept a selector ('all' | {serial,count}), enabling clearing all queued messages; updates Protocol call sites to pass the new selector object; moves idle emission after processing; and expands tests to cover queued-message nack scenarios and a private API identifier.

Changes

Cohort / File(s) Summary
MessageQueue API Refactoring
src/common/lib/transport/messagequeue.ts
Changed completeMessages signature to `selector: 'all'
Protocol Integration
src/common/lib/transport/protocol.ts
Updated call sites to pass a single object { serial, count } (or keep error as second arg) when reporting message completions to MessageQueue.
Test Infrastructure
test/common/modules/private_api_recorder.js
Added read.connectionManager.queuedMessages to the list of recognized private API identifiers.
Failure Test Coverage
test/realtime/failure.test.js
Renamed nack_on_connection_failurenack_of_sent_message_on_connection_failure; added nack_of_queued_message_on_connection_failure helper and three new tests for suspended/failed/closed connection states for queued messages; updated channel ids and instrumentation to detect queued message state and simulate connection state changes.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Review src/common/lib/transport/messagequeue.ts for correct selector handling, edge cases when startSerial is undefined, proper splice indices, and that callbacks are always invoked and queue/idle state updated.
  • Check src/common/lib/transport/protocol.ts call-site updates for consistent object shape and error propagation.
  • Validate new/renamed tests in test/realtime/failure.test.js for correct instrumentation and that they accurately simulate queued vs sent message flows.
  • Verify private_api_recorder.js change aligns with test expectations.

Poem

🐇 I hopped through queues both big and small,
Found a selector that clears them all.
When connections falter and messages wait,
I nudge callbacks gently—no more fate.
Thump-thump, the queue is free; let's celebrate! 🎉

Pre-merge checks and finishing touches

✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the main change: fixing the behavior of non-sent queued messages, which is the core issue addressed throughout the changeset.
Linked Issues check ✅ Passed The code changes directly address the issue: completeMessages now accepts a 'all' selector mode to clear all queued messages regardless of serial status, and new tests verify queued messages are properly failed on connection state changes.
Out of Scope Changes check ✅ Passed All changes directly support the objective of fixing non-sent queued message failures: the messagequeue API refactor, protocol updates, test framework additions, and comprehensive test cases for the queued message scenario are all in scope.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch 2115-fail-pending-messages-on-connection-state

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions github-actions bot temporarily deployed to staging/pull/2123/bundle-report December 5, 2025 13:32 Inactive
@github-actions github-actions bot temporarily deployed to staging/pull/2123/features December 5, 2025 13:32 Inactive
@github-actions github-actions bot temporarily deployed to staging/pull/2123/typedoc December 5, 2025 13:32 Inactive
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
test/realtime/failure.test.js (1)

624-636: Consider adding a timeout to prevent test hangs.

The while(true) polling loop has no escape hatch if the message is never queued. While unlikely in this controlled test, adding a timeout would make failures more diagnosable.

               (async () => {
                 helper.recordPrivateApi('call.Platform.nextTick');
                 helper.recordPrivateApi('read.connectionManager.queuedMessages');
+                const maxAttempts = 100;
+                let attempts = 0;
                 while (true) {
                   await new Promise((res) => Ably.Realtime.Platform.Config.nextTick(res));
                   if (realtime.connection.connectionManager.queuedMessages.count() > 0) {
                     failureFn(realtime, helper.withParameterisedTestTitle(null));
                     break;
                   }
+                  if (++attempts >= maxAttempts) {
+                    throw new Error('Timed out waiting for message to be queued');
+                  }
                 }
               })();
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Jira integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between ef368ba and a0be5cd.

📒 Files selected for processing (4)
  • src/common/lib/transport/messagequeue.ts (1 hunks)
  • src/common/lib/transport/protocol.ts (2 hunks)
  • test/common/modules/private_api_recorder.js (1 hunks)
  • test/realtime/failure.test.js (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/common/lib/transport/messagequeue.ts (1)
src/common/lib/transport/protocol.ts (1)
  • PendingMessage (11-28)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: test-browser (webkit)
  • GitHub Check: test-browser (chromium)
  • GitHub Check: test-browser (firefox)
  • GitHub Check: test-node (20.x)
  • GitHub Check: test-node (16.x)
  • GitHub Check: test-node (18.x)
🔇 Additional comments (6)
test/common/modules/private_api_recorder.js (1)

102-102: LGTM!

The new private API identifier is correctly added in alphabetical order and aligns with its usage in the new queued message failure tests.

src/common/lib/transport/protocol.ts (1)

46-62: LGTM!

The call sites are correctly updated to use the new object-based predicate format { serial, count } for the completeMessages API.

src/common/lib/transport/messagequeue.ts (2)

42-75: LGTM - Clean solution to the queued message failure issue.

The predicate-based approach elegantly handles both cases:

  • 'all': Clears all messages regardless of msgSerial assignment (fixing the original bug)
  • { serial, count }: Maintains existing ACK/NACK behavior for sent messages

84-86: LGTM!

Clean delegation to the unified completeMessages implementation.

test/realtime/failure.test.js (2)

486-540: LGTM - Clear rename and well-structured test helper.

The rename to nack_of_sent_message_on_connection_failure clearly distinguishes this scenario from the new queued message tests.


645-684: LGTM - Comprehensive test coverage for the fix.

The three new test cases properly cover SUSPENDED, FAILED, and CLOSED states for queued (not-yet-sent) messages, directly addressing the issue described in #2115.

@lawrence-forooghian lawrence-forooghian force-pushed the 2115-fail-pending-messages-on-connection-state branch from a0be5cd to 9740968 Compare December 5, 2025 13:39
Non-sent queued messages were not being failed when the connection
entered the SUSPENDED / FAILED / CLOSED states.

The existing logic relied on the queued messages having a msgSerial, but
they don't get a msgSerial until the first time we send the message on
the transport.

Resolves #2115.
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
test/realtime/failure.test.js (2)

482-492: Clearer naming for sent-message helper; minor comment typo

Renaming to nack_of_sent_message_on_connection_failure and aligning the helper/channel IDs with that name makes the intent (message already sent on the transport) much clearer. There is a small typo in the nearby comment (onProtocolMesageonProtocolMessage) that you may want to fix for readability.


583-643: Queued-message helper hits the right semantics; consider bounding the polling loop

The new nack_of_queued_message_on_connection_failure helper correctly:

  • Forces the connection to remain non-CONNECTED by dropping inbound CONNECTED messages on transport.pending, and
  • Waits until the publish is actually queued before triggering the injected failure, then asserts the resulting error and connection state.

One thing to watch: the while (true) loop that polls queuedMessages.count() via Platform.Config.nextTick has no upper bound. If for any reason the message never appears in queuedMessages (e.g. future refactor of the queuing path), this test will just spin until the global Mocha timeout, which makes failures harder to diagnose. You might want to:

  • Add a max elapsed time or iteration count and fail the test explicitly if the condition is never met, or
  • Gate the loop with a shared flag so you can stop polling once the publish promise has settled.

Also, the same minor comment typo (onProtocolMesage) appears here as in the sent-message helper.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Jira integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between a0be5cd and 1bfd80d.

📒 Files selected for processing (4)
  • src/common/lib/transport/messagequeue.ts (1 hunks)
  • src/common/lib/transport/protocol.ts (2 hunks)
  • test/common/modules/private_api_recorder.js (1 hunks)
  • test/realtime/failure.test.js (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • test/common/modules/private_api_recorder.js
  • src/common/lib/transport/messagequeue.ts
  • src/common/lib/transport/protocol.ts
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: test-browser (firefox)
  • GitHub Check: test-browser (webkit)
  • GitHub Check: test-browser (chromium)
  • GitHub Check: test-node (16.x)
  • GitHub Check: test-node (18.x)
  • GitHub Check: test-node (20.x)
🔇 Additional comments (2)
test/realtime/failure.test.js (2)

542-581: Sent-message RTN7c variants comprehensively cover SUSPENDED/FAILED/CLOSED paths

These three tests nicely exercise the “sent but unacked” publish behaviour across suspended (via becomeSuspended), failed (via injected ERROR protocol message with code 40100), and closed (via realtime.close()), and assert the expected error codes on the publish callback. The structure closely mirrors the helper’s contract and should guard the updated failure semantics well.


645-684: Queued-message RTN7c variants look consistent and target the right failure paths

The three queued-message tests (suspended/failed/closed) are well-aligned with the sent-message variants and RTN7c wording: they all drive the connection into the desired terminal state while the message is still queued, then assert both the connection state and the expected error codes (80002 for suspended, 40100 for failed with auth error, 80017 for closed). Using connectionManager.notifyState for suspended/failed keeps the tests tightly scoped to the behaviour under test.

Copy link
Member

@SimonWoolf SimonWoolf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

MessageQueue behaviour on connection suspended state

3 participants