
Fix: resolve doubly linked list push issue#3085

Merged
nkaradzhov merged 7 commits into redis:master from ntvviktor:fix-doubly-linked-list-push
Oct 2, 2025

Conversation

@ntvviktor
Contributor

Description

Now the client pool properly shuts down after getting a bad connection.

Checklist

  • Does npm test pass with this change (including linting)?
  • Is the new or changed code fully tested?
  • Is a documentation update included (if this change modifies existing APIs, or introduces new ones)? Just a Bug fix.

@nkaradzhov nkaradzhov self-requested a review September 23, 2025 07:41
@nkaradzhov
Collaborator

@ntvviktor looks like you broke the tests

@ntvviktor
Contributor Author

ntvviktor commented Sep 24, 2025

@ntvviktor looks like you broke the tests

Hey @nkaradzhov, I think that even though fixing the DoublyLinkedList logic properly shuts down the bad-host Redis connection request, it breaks the sentinel client tests. I believe they have somehow been coded to pass with the error from the issue:

TypeError: Cannot set properties of undefined (setting 'next')
  File "/app/node_modules/.pnpm/@redis+client@5.8.0/node_modules/@redis/client/dist/lib/client/linked-list.js", line 76, col 32, in DoublyLinkedList.remove
    node.previous.next = node.next;

I will investigate further to try to triage and fix the root cause.

@ntvviktor
Contributor Author

ntvviktor commented Sep 25, 2025

In the case of the issue above, it seems RedisClientPool uses the DoublyLinkedList as a store for clients in use, so when there is an error such as a bad host request, as described in the issue, the non-null assertion node.previous!.next in the remove() function of DoublyLinkedList leads to the TypeError, since remove() is called twice: once in the catch block, and once in #returnClient.

```typescript
async #create() {
  const node = this._self.#clientsInUse.push(
    this._self.#clientFactory()
      .on('error', (err: Error) => this.emit('error', err))
  );

  try {
    const client = node.value;
    await client.connect();
  } catch (err) {
    this._self.#clientsInUse.remove(node);
    throw err;
  }

  this._self.#returnClient(node);
}
```
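One way to make the double call harmless, sketched here with a hypothetical minimal list (this is not the actual @redis/client code), is to make remove() idempotent by tracking whether a node has already been unlinked:

```typescript
// Hypothetical minimal doubly linked list, for illustration only.
interface ListNode<T> {
  value: T;
  previous: ListNode<T> | undefined;
  next: ListNode<T> | undefined;
  removed?: boolean; // guard flag so a second remove() is a no-op
}

class SafeList<T> {
  #head: ListNode<T> | undefined;
  #tail: ListNode<T> | undefined;
  length = 0;

  push(value: T): ListNode<T> {
    const node: ListNode<T> = { value, previous: this.#tail, next: undefined };
    if (this.#tail) this.#tail.next = node;
    else this.#head = node;
    this.#tail = node;
    this.length++;
    return node;
  }

  remove(node: ListNode<T>): void {
    if (node.removed) return; // already unlinked; calling again is safe
    node.removed = true;
    if (node.previous) node.previous.next = node.next;
    else this.#head = node.next;
    if (node.next) node.next.previous = node.previous;
    else this.#tail = node.previous;
    node.previous = node.next = undefined;
    this.length--;
  }
}
```

With this guard, the catch block and #returnClient can both call remove(node) without the second call dereferencing already-cleared links.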

Fixing it with an if statement that checks previous is not enough; I found another root cause in RedisCommandQueue, which also uses the DoublyLinkedList.
It iterates with a for loop over the *nodes() generator while calling remove() inside the loop, which led to a command-execution timeout in the tests. So in this PR I also try to fix that by changing the *nodes() iterator to make sure that setting node.next to undefined in shift(), unshift(), and remove() is safe.
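The remove-during-iteration hazard can be sketched with a hypothetical minimal list (names and shape are illustrative, not the library's actual code): the *nodes() generator captures node.next before yielding, so clearing the links in remove() cannot break the loop:

```typescript
// Hypothetical minimal list with a removal-safe iterator.
type MiniNode<T> = { value: T; previous?: MiniNode<T>; next?: MiniNode<T> };

class MiniList<T> {
  head?: MiniNode<T>;
  tail?: MiniNode<T>;

  push(value: T): MiniNode<T> {
    const node: MiniNode<T> = { value, previous: this.tail };
    if (this.tail) this.tail.next = node;
    else this.head = node;
    this.tail = node;
    return node;
  }

  remove(node: MiniNode<T>): void {
    if (node.previous) node.previous.next = node.next;
    else this.head = node.next;
    if (node.next) node.next.previous = node.previous;
    else this.tail = node.previous;
    // Safe even mid-iteration: the generator no longer reads these links.
    node.previous = node.next = undefined;
  }

  *nodes(): IterableIterator<MiniNode<T>> {
    let node = this.head;
    while (node) {
      const next = node.next; // capture before yielding
      yield node;             // the caller may remove(node) here
      node = next;
    }
  }
}
```

Removing every node inside a for...of loop over nodes() still visits each element exactly once, because the successor was saved before control was handed to the loop body.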

I ran all tests locally and they currently work as expected. @nkaradzhov, please help me enable the pipeline to see whether my fix breaks anything. Thanks!

@ntvviktor
Contributor Author

Hi @nkaradzhov, please help me review the code. Thanks in advance.

@nkaradzhov
Copy link
Collaborator

@ntvviktor the PR looks to be damaged somehow; it looks like there are some unwanted changes after a merge?

@ntvviktor ntvviktor force-pushed the fix-doubly-linked-list-push branch from c03c34f to 234f1b6 Compare October 1, 2025 01:26
@ntvviktor
Contributor Author

@ntvviktor the PR looks to be damaged somehow; it looks like there are some unwanted changes after a merge?

Hi @nkaradzhov, thanks for the feedback. I have made the changes; my bad on rebasing the master branch into this one.

@nkaradzhov
Collaborator

@ntvviktor sorry for being paranoid here, but could you please provide a test case for the DoublyLinkedList somewhere here?
Usually, when we find a bug, we write a test that proves the bug, then do a fix, and the test starts passing. We end up with our test suite growing stronger.

@ntvviktor
Contributor Author

@ntvviktor sorry for being paranoid here, but could you please provide a test case for the DoublyLinkedList somewhere here? Usually, when we find a bug, we write a test that proves the bug, then do a fix, and the test starts passing. We end up with our test suite growing stronger.

Thanks @nkaradzhov, I have added the corresponding tests.

Collaborator

@nkaradzhov nkaradzhov left a comment


CLGTM

@nkaradzhov nkaradzhov merged commit 73413e0 into redis:master Oct 2, 2025
17 checks passed
@nkaradzhov
Collaborator

Looks like this causes #3127

@nkaradzhov
Collaborator

@ntvviktor, sorry, what was the reason for the addition of #handleSentinelFailure?

@ntvviktor
Contributor Author

ntvviktor commented Nov 6, 2025

Hey @nkaradzhov, it was because during this test case, after a master is shut down and the test tries to reconnect, sentinelSentinels or some other utility could use the first item [0] in the list, but that sentinel had already been shut down, so the topology-change event was never reached or emitted; only the master-change event was. Hence I made the change to remove the dead node before the #reset function so that the correct event is emitted. I personally think the #reset function would bring back reconnection for all of the Redis sentinels, but perhaps I made a wrong assumption. Let me know what you think.

```typescript
it('shutdown sentinel node', async function () {
  this.timeout(60000);
  sentinel = frame.getSentinelClient();
  sentinel.setTracer(tracer);
  sentinel.on("error", () => { });
  await sentinel.connect();
  tracer.push("connected");

  let sentinelChangeResolve;
  const sentinelChangePromise = new Promise((res) => {
    sentinelChangeResolve = res;
  })

  const sentinelNode = sentinel.getSentinelNode();
  tracer.push(`sentinelNode = ${sentinelNode?.port}`)

  sentinel.on('topology-change', (event: RedisSentinelEvent) => {
    tracer.push(`got topology-change event: ${JSON.stringify(event)}`);
    if (event.type === "SENTINEL_CHANGE") {
      tracer.push("got sentinel change event");
      sentinelChangeResolve(event.node);
    }
  });

  tracer.push("Stopping sentinel node");
  await frame.stopSentinel(sentinelNode!.port.toString());
  tracer.push("Stopped sentinel node and waiting on sentinel change promise");
  const newSentinel = await sentinelChangePromise as RedisNode;
  tracer.push("got sentinel change promise");
  assert.notEqual(sentinelNode!.port, newSentinel.port);
});
```

when the frame trying to get

This was referenced Feb 8, 2026
mayankkapoor pushed a commit to mayankkapoor/node-redis that referenced this pull request Feb 24, 2026
…ent fails to reconnect, throwing `"None of the sentinels are available"`. This is a regression introduced in v5.9.0 (commit `73413e086c` — "fix: resolve doubly linked list push issue (redis#3085)").

**Root Cause:** `#handleSentinelFailure()` (added in v5.9.0) permanently removes the sentinel node from `#sentinelRootNodes` via `splice()` when its connection drops. When `#reset()` then calls `observe()`, the sentinel list is empty (or missing that node), so reconnection is impossible.

**Secondary issue:** The sentinel list update in `transform()` (line 1350) compares only array lengths (`analyzed.sentinelList.length != this.#sentinelRootNodes.length`), so even if reconnection succeeds via another sentinel, a removed node may never be restored if the list sizes happen to match.

- [ ] **Task 1:** Fix `#handleSentinelFailure` — stop permanently removing sentinel nodes
  - **File:** `packages/client/lib/sentinel/index.ts` (lines 934–942)
  - Remove the `splice()` call. A transient TCP disconnect (which happens during sentinel restart) should not permanently delete the node from the root list. Just call `#reset()` to trigger reconnection, matching v5.8.3 behavior.
  - Before (broken):
    ```typescript
    #handleSentinelFailure(node: RedisNode) {
        const found = this.#sentinelRootNodes.findIndex(
            (rootNode) => rootNode.host === node.host && rootNode.port === node.port
        );
        if (found !== -1) {
            this.#sentinelRootNodes.splice(found, 1);
        }
        this.#reset();
    }
    ```
  - After (fixed):
    ```typescript
    #handleSentinelFailure(node: RedisNode) {
        this.#reset();
    }
    ```

- [ ] **Task 2:** Fix sentinel list update condition in `transform()`
  - **File:** `packages/client/lib/sentinel/index.ts` (line 1350)
  - Replace the length-only comparison with a proper content comparison so that nodes removed or added are always detected.
  - Before:
    ```typescript
    if (analyzed.sentinelList.length != this.#sentinelRootNodes.length) {
    ```
  - After — compare actual content:
    ```typescript
    if (!areSentinelListsEqual(analyzed.sentinelList, this.#sentinelRootNodes)) {
    ```
  - Add a helper (either inline or as a small private method) that compares host+port of each node, not just array length.

- [ ] **Task 3:** Run existing tests to confirm no regressions
  - Run `npm test` in the `packages/client` directory
  - Look for and run any sentinel-specific test files

- [ ] **Task 4:** Verify the fix scenario
  - Confirm `#sentinelRootNodes` retains the original sentinel entry after disconnect
  - Confirm `observe()` retries that sentinel and succeeds once it's back online
  - Confirm no `"None of the sentinels are available"` error after sentinel restart
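A possible shape for the helper suggested in Task 2 (the name `areSentinelListsEqual` comes from the task description; the implementation below is an assumption, comparing host and port order-insensitively):

```typescript
interface RedisNode {
  host: string;
  port: number;
}

// Compare two sentinel lists by host+port content, ignoring order.
// Counts occurrences so duplicate entries are handled correctly.
function areSentinelListsEqual(a: RedisNode[], b: RedisNode[]): boolean {
  if (a.length !== b.length) return false;
  const key = (n: RedisNode) => `${n.host.toLowerCase()}:${n.port}`;
  const counts = new Map<string, number>();
  for (const node of a) counts.set(key(node), (counts.get(key(node)) ?? 0) + 1);
  for (const node of b) {
    const k = key(node);
    const c = counts.get(k);
    if (!c) return false; // node in b not present (often enough) in a
    counts.set(k, c - 1);
  }
  return true;
}
```

Unlike the length-only check, this detects the case where a node was removed and a different one added, leaving the sizes equal but the content changed.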
