Description
Overview of the Issue
There appears to be a race condition in connection pooling that can leave a borrower stuck on the waitlist until it times out. It occurs when a large number of connections are borrowed or waited on at once and no further borrow requests arrive afterwards. Here is the general flow, assuming a connection pool of size 1 for example:
- "Thread" A borrows a connection from the pool.
- Thread B attempts to borrow a connection from the pool.
- Some time after Thread B checks the pool, but before it gets a chance to join the waitlist, Thread A completes and tries to pass its connection on to a waiter in the waitlist. As there are no waiters yet, it simply returns the connection to the pool.
- Thread B now joins the waitlist, but the only connection is already back in the pool and no borrowed connection remains whose return could hand it off. Thread B blocks waiting for a hand-off that never comes, the context times out, and we see our error `code = ResourceExhausted desc = connection pool timed out`.
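The interleaving above can be replayed deterministically with a toy model. This is only a sketch under stated assumptions: the `pool` type and its `tryGet`, `joinWaitlist`, and `put` methods are hypothetical names made up for illustration and are not the actual Vitess pool implementation. It only shows why a borrow that checks the pool and joins the waitlist in two separate steps can miss a connection returned in between.

```go
package main

import "fmt"

// pool is a hypothetical toy model, not the Vitess connection pool.
type pool struct {
	idle    int             // connections currently sitting in the pool
	waiters []chan struct{} // borrowers waiting for a direct hand-off
}

// tryGet is the first half of a borrow: take an idle connection if one exists.
func (p *pool) tryGet() bool {
	if p.idle > 0 {
		p.idle--
		return true
	}
	return false
}

// joinWaitlist is the second half of a borrow: register for a hand-off.
func (p *pool) joinWaitlist() chan struct{} {
	ch := make(chan struct{}, 1)
	p.waiters = append(p.waiters, ch)
	return ch
}

// put returns a connection: hand it to the first waiter if there is one,
// otherwise place it back in the pool.
func (p *pool) put() {
	if len(p.waiters) > 0 {
		w := p.waiters[0]
		p.waiters = p.waiters[1:]
		w <- struct{}{}
		return
	}
	p.idle++
}

// replayRace executes the four steps in the problematic order and reports
// whether B ends up waiting while a connection sits idle.
func replayRace() (stranded bool, idle int) {
	p := &pool{idle: 1}

	gotA := p.tryGet() // A borrows the only connection
	gotB := p.tryGet() // B checks the pool and finds it empty
	if gotA {
		p.put() // A finishes; no waiters yet, so the connection goes back to the pool
	}
	if gotB {
		return false, p.idle
	}
	ch := p.joinWaitlist() // B joins the waitlist too late

	select {
	case <-ch:
		return false, p.idle // B received a hand-off
	default:
		return true, p.idle // nothing was handed off and nobody is left to do it
	}
}

func main() {
	stranded, idle := replayRace()
	fmt.Printf("B stranded: %v, idle connections: %d\n", stranded, idle)
}
```

In this model B ends up on the waitlist while the pool holds an idle connection, which matches the state described in the last step.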
Normally, in a live production system, a new query would come in and pull a connection straight from the pool rather than waiting for an existing connection to be passed on. When that query finishes, it would hand its connection to Thread B, breaking the deadlock. In our (GitHub) CI, however, the nature of our queries tends to trigger the race condition more often, as we fire a bunch of queries all at once as part of a `UNION ALL` in our test cleanup code. These queries quickly exceed the connection pool size, execute quickly, and trigger the race condition. Since we're at the end of our test(s), no new queries are fired to pull a connection directly from the pool, and Thread B stays stuck until it times out.
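The difference between production traffic and the end of a test run can be illustrated with the same kind of toy model. As before, this is a sketch with hypothetical names (`toyPool`, `borrow`, `giveBack`, `strandedState`), not Vitess code. It starts from the state left behind by the race, with one idle connection and B on the waitlist, and compares what happens when a later query arrives with what happens when none does.

```go
package main

import "fmt"

// toyPool is a hypothetical model: idle connections plus a FIFO waitlist.
type toyPool struct {
	idle    int
	waiters []string // names of borrowers waiting for a hand-off
	served  []string // waiters that received a hand-off, in order
}

// borrow takes an idle connection directly from the pool if one exists.
func (p *toyPool) borrow() bool {
	if p.idle > 0 {
		p.idle--
		return true
	}
	return false
}

// giveBack hands the connection to the first waiter, or returns it to the pool.
func (p *toyPool) giveBack() {
	if len(p.waiters) > 0 {
		p.served = append(p.served, p.waiters[0])
		p.waiters = p.waiters[1:]
		return
	}
	p.idle++
}

// strandedState is the state after the race: the connection is idle, B is waiting.
func strandedState() *toyPool {
	return &toyPool{idle: 1, waiters: []string{"B"}}
}

// rescuedByNewQuery models production: a later query C borrows the idle
// connection directly, and its return hands the connection to B.
func rescuedByNewQuery() []string {
	p := strandedState()
	if p.borrow() {
		p.giveBack()
	}
	return p.served
}

// endOfTest models CI cleanup: no further query arrives, so nothing is
// ever returned and B is never served.
func endOfTest() []string {
	p := strandedState()
	return p.served
}

func main() {
	fmt.Println("served with a later query:", rescuedByNewQuery())
	fmt.Println("served with no later query:", endOfTest())
}
```

In the first case B is served as a side effect of unrelated traffic, which would explain why the problem is much harder to notice outside of CI.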
Reproduction Steps
@arthurschreiber has come up with a test case that pretty consistently reproduces the error: #17661
Binary Version
main
Operating System and Environment details
all
Log Fragments
Trilogy::ProtocolError: 1203: target: github_test_repositories_actions_checks12.-80.primary: vttablet: rpc error: code = ResourceExhausted desc = connection pool timed out (CallerID: userData1) (trilogy_query_recv)