
Idle Connections are not being cleaned up under certain conditions. #3020

Closed
@robert-pitt-foodhub

Description


Setup

  • Environment: AWS Lambda (x86)
  • mysql2 version: v3.11.0 (latest as of this post)
  • MySQL Versions Tested: 8 & 9

Description:

Hey, I have been noticing some unusual connection errors and have traced them back to the connection cleanup process within the Pool class. Below I have outlined my setup, the steps I took to find the issue, and how to replicate the problem using an existing test case.

We are seeing a fairly sizeable percentage of requests failing due to socket timeout or disconnection issues, which typically surface as PROTOCOL_CONNECTION_LOST errors, as shown below:

Error: Connection lost: The server closed the connection.
    at PromisePoolConnection.query (/opt/nodejs/node_modules/mysql2/promise.js:93:22)
    at /var/task/index.js:1435:25
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at async Runtime.handler (/var/task/index.js:2659:15) {
  code: 'PROTOCOL_CONNECTION_LOST',
  errno: undefined,
  sql: undefined,
  sqlState: undefined,
  sqlMessage: undefined
}

The configuration we had set up was { maxIdle: 3, connectionLimit: 3, idleTimeout: 30000 }, and after digging around I found the following line of code, which implies that maxIdle MUST be strictly less than connectionLimit. With our configuration the connection cleanup process is therefore never enabled, meaning that once a connection is opened it is never removed.

if (this.config.maxIdle < this.config.connectionLimit) {
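For context, here is a minimal sketch of the kind of setup we run in the Lambda (the handler shape, query, and environment variables are illustrative, not our actual code; only the pool options mirror the config above):

const mysql = require('mysql2/promise');

// Pool options mirroring our original production config. Because
// maxIdle === connectionLimit, the check above evaluates to false, the
// idle-cleanup timer is never started, and idle sockets linger until the
// server eventually drops them.
const pool = mysql.createPool({
  host: process.env.DB_HOST,
  user: process.env.DB_USER,
  password: process.env.DB_PASSWORD,
  database: process.env.DB_NAME,
  connectionLimit: 3,
  maxIdle: 3, // equal to connectionLimit, so the cleanup timer never starts
  idleTimeout: 30000, // 30s, but never applied with this configuration
});

exports.handler = async (event) => {
  // PROTOCOL_CONNECTION_LOST surfaces here when the pool hands back a
  // connection whose socket the server has already closed.
  const [rows] = await pool.query('SELECT 1');
  return rows;
};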

So, with some excitement, I updated one of our production Lambdas to use { maxIdle: 1, connectionLimit: 3, idleTimeout: 30000 }, hoping that activating the connection cleanup process would resolve the connection errors we were seeing. I went to bed hoping to wake up to no red blobs on our dashboard, however to my surprise the connection errors continued, following a similar pattern to the previous nights, as if the change had no effect.

I looked at some of the logs and noticed that the Timer object / reference was now present on the pool's properties, since I had dropped maxIdle to a lower number than connectionLimit.

Before:
[screenshot: pool properties before the config change, no cleanup timer]

After:
[screenshot: pool properties after the config change, cleanup timer present]

So I was certain the timer was now running, however I was still seeing the same connection errors 😠. I added some more logs to track how long each connection's thread id was lasting and what the time delays were between queries, roughly along the lines of the sketch below.
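A simplified sketch of that logging (loggedQuery is a made-up helper and the connection credentials are omitted, so this is not our production code):

const mysql = require('mysql2/promise');

const pool = mysql.createPool({
  connectionLimit: 3,
  maxIdle: 1,
  idleTimeout: 30000,
});

let lastQueryAt = 0;

// Logs the MySQL thread id of the physical connection and the gap since the
// previous query. If the thread id changes between invocations, the pool has
// handed us a different underlying socket.
async function loggedQuery(sql, params) {
  const conn = await pool.getConnection();
  const now = Date.now();
  console.log(
    'threadId=%d gap=%dms',
    conn.connection.threadId,
    lastQueryAt ? now - lastQueryAt : 0
  );
  lastQueryAt = now;
  try {
    const [rows] = await conn.query(sql, params);
    return rows;
  } finally {
    conn.release();
  }
}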

At this point I knew there was a possibility that the bug was within the node-mysql2 library itself, so I cloned the repository and set up a dockerised MySQL sandbox to be able to execute the integration test suite, which thankfully was a piece of cake to get up and running. I ran the full suite to validate that everything worked as expected, and then took a look at the test/integration/test-pool-release-idle-connection.test.cjs test file, as that was the closest to what I was doing.

https://github.com/sidorares/node-mysql2/blob/dee0c0813854658baf5f73055e5de36a1a78e7bf/test/integration/test-pool-release-idle-connection.test.cjs
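For anyone not familiar with it, the general shape of an idle-release scenario like the one that test covers is roughly the following (illustrative only; the real options and assertions are in the linked file):

const mysql = require('mysql2');

// A pool with maxIdle strictly below connectionLimit and a short idleTimeout,
// so the cleanup timer is active and should close surplus idle connections.
const pool = mysql.createPool({
  connectionLimit: 3,
  maxIdle: 1, // strictly below connectionLimit, so cleanup is enabled
  idleTimeout: 5000, // illustrative value
});

// Open several connections by firing queries in parallel.
for (let i = 0; i < 3; i++) {
  pool.query('SELECT SLEEP(1)', (err) => {
    if (err) throw err;
  });
}

// Wait longer than idleTimeout; by now the idle connections should have been
// closed by the pool, and ending the pool should let the process exit.
setTimeout(() => {
  pool.end();
}, 8000);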

I set up my MySQL instance, ensured that there were no connections from anything else, and executed the test file as is, without making any changes.

MySQL before the tests:

[screenshot: MySQL process list before the tests]

Executed the tests via the following command:

yarn poku ./test/integration/test-pool-release-idle-connection.test.cjs

The test PASSES as expected 🎉

[screenshot: passing test output]

Within the MySQL console I was refreshing the process list every second; I could see 3 connections created, as expected, and I saw those same three connections drop off after around 7-8 seconds, which again is what I expect given the setTimeout within this test scenario.

After the test was completed, MySQL was back down to the initial two base connections:

[screenshot: MySQL process list back to the two base connections]


OK, here's where it seems to go wrong, and I think this could be the cause of the connection socket timeouts I am experiencing: when I change the configuration to { connectionLimit: 3, maxIdle: 2 }, the test just hangs and never finishes (even after 10-15+ minutes).
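The only thing I changed was the pool options used by the test, along these lines (the other options here are illustrative; only connectionLimit and maxIdle reflect my change):

const mysql = require('mysql2');

// Same shape as the passing run, with maxIdle dropped below connectionLimit.
// With this change the test process never exits and connections linger in
// the MySQL process list.
const pool = mysql.createPool({
  connectionLimit: 3,
  maxIdle: 2, // the only change from the passing run
  idleTimeout: 5000, // illustrative value
});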

[screenshot: the test run hanging]

When checking MySQL, it seems as though there are outstanding connections that have not been closed:

[screenshot: MySQL process list showing the leftover connections]

I'm pretty certain that the cause of this bug is somewhere near this code, however I don't have a huge amount of experience with this library, so I'm looking for some support here, from replicating the bug through to offering a solution that won't cause any regressions.

Thanks
