Description
Hi, we've just encountered a problem which seems to occur predictably when a Client instance disconnects. It results in promises returned from `Client#query` never settling as fulfilled nor rejected (when using callbacks, those are not fired, either).
The gist below is a trivial example (with output) that reproduces the behavior. If you cut off the connection to PostgreSQL at some point (in my case, it's a Docker container with postgres:9.6 being stopped), the next `.query()` call rejects with an error, but the call after that never settles its promise, as illustrated by the output.
https://gist.github.com/rkaw92/9cbd71fdb98d74f01c624bb2a003cd2a
You'll notice that subsequent queries are never executed, and they never reject, either - they just hang.
The issue has several important consequences:
- It breaks libraries and applications that rely on the Promise always being resolved or rejected, such as knex.js, causing eventual client pool exhaustion
- Indeed, if you do suffer a disconnect with knex.js and your connection pool is exhausted (because the `.query()` hangs the second time, never releasing the client), then restoring connectivity has no effect - the pool is forever exhausted. Note, however, that knex.js uses `generic-pool` rather than pg's default pool, and manages acquiring/releasing clients on its own. (This is another issue that I am currently pursuing - knex's connection recycling code is buggy at the moment.)
- It leaves unresolved Promises, which are never a good thing and may trigger Node monitoring tools (and rightly so).
- It most likely keeps adding queries to the internal Client queue indefinitely, leaking memory
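As an application-side stopgap for the unresolved-promise hazard, one can race each query promise against a timer so a hung call at least becomes a visible rejection. This is a hypothetical `withTimeout` helper (not part of pg), just a sketch:

```javascript
// Hypothetical helper: races a promise against a timer so a query that
// never settles turns into a visible rejection instead of a silent hang.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((resolve, reject) => {
    timer = setTimeout(() => reject(new Error('query timed out')), ms);
  });
  // Clear the timer either way so a fulfilled query doesn't leave it pending.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Simulate the bug: a query promise that never settles.
const hungQuery = new Promise(() => {});

withTimeout(hungQuery, 100).catch((err) => {
  console.log(err.message); // "query timed out"
});
```

This does not release the underlying client, of course - it only surfaces the hang to the caller.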
I think I've pinned this behavior down to the following sequence of events (hold on, it's going to be a bumpy ride through client.js):
- Assume the Client instance is connected and the connection has signalled readiness - the handler in line 141 has fired. Any connection error events will now be handled using `connectedErrorHandler`.
- The connection goes down, thus emitting `error` (and `end` shortly after) on the `con` object. `connectedErrorHandler` fires. In our example, we will assume that no query is currently active (if there is one, the situation converges to this case later, anyway), so the handler has no effect other than re-emitting the `error` event on the Client object.
- The user sends a query using `Client#query`. The Query object is pushed onto the internal query queue and we enter `_pulseQueryQueue`.
- Since `this.readyForQuery === true`, the passed query object becomes `this.activeQuery` and is taken off the queue. `this.readyForQuery` is set to false (so that subsequent `query()` calls will queue up, rather than be submitted immediately).
- The query is submitted to the connection. The connection emits an error because it's closed!
- We go back to `connectedErrorHandler`, but this time, there is an active query - error handling is delegated to `activeQuery.handleError`. This is how we get our query callback to fire or our promise to be rejected.
- It's all good for now, right? Not exactly, because even though our failed query has left the queue, `this.readyForQuery` has not been reset.
- The user sends another query with `Client#query`. The query is again added to the queue (it now contains 1 element - the new query), but this time, in `_pulseQueryQueue`, we find that `readyForQuery === false`, so we do not process the queue. We exit the function, doing nothing.
- We're now in limbo - no new queries are going to get processed because we're not `readyForQuery`, and we never will be (since the connection is dead). Additionally, new queries keep getting pushed onto the queue with every call.
Trying to think of a solution right now. Perhaps having a "dead" flag that rejects all subsequent `query` calls would do. One other thing I can think of would be resetting `readyForQuery` every time we get a connection error, but it does not sound like a wise thing to do.
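A sketch of the "dead" flag idea (hypothetical, not an actual pg patch - class and method names are made up): once the connection errors out, flush everything already queued and fail every future call fast:

```javascript
// Hypothetical "dead" flag sketch: once the connection errors, fail everything.
class DeadAwareClient {
  constructor() {
    this.queryQueue = [];
    this.dead = false;
    this.deathError = null;
  }
  _markDead(err) {
    this.dead = true;
    this.deathError = err;
    // Flush anything already queued so no promise is left hanging.
    for (const q of this.queryQueue.splice(0)) q.reject(err);
  }
  query(sql) {
    if (this.dead) return Promise.reject(this.deathError); // fail fast
    return new Promise((resolve, reject) => {
      this.queryQueue.push({ sql, resolve, reject });
    });
  }
}

const client = new DeadAwareClient();
const pending = client.query('SELECT 1');
client._markDead(new Error('connection terminated'));
pending.catch((err) => console.log('queued:', err.message));
client.query('SELECT 2').catch((err) => console.log('new:', err.message));
```

The key point is that both the already-queued query and any later one settle with a rejection, so pools and monitoring tools see the failure instead of a hang.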
In any case, unless callbacks/promises returned from `Client#query()` eventually fire every time, there is a demonstrable hazard of hung calls in user apps and exhausted client pools in knex.js.