Skip to content

Conversation

DifferentialOrange
Copy link
Member

@DifferentialOrange DifferentialOrange commented Aug 29, 2022

api: forbid putting to a wrong pool

Connection pools allow user to get connections and explicitly return them back. But it is allowed to return connection to any pool. This could easily lead to pool queue overflow. Each pool allocates fiber channel of connection count size to store connections. If user would put several extra connections to a pool, next puts will lead to pool:put() hangup since channel will be full and channel:put() is used without timeout and possible error processing, relying on get/put+gc balance.

Part of #67

gc: prevent potential fiber yield

Yielding in gc is prohibited since Tarantool 2.6.0-138-gd3f1dd720, 2.5.1-105-gc690b3337, 2.4.2-89-g83037df15, 1.10.7-47-g8099cb053. If fiber yield happens in object gc, Tarantool process exits with code 1.

Gc hook of a connection from a connection pool calls pool.queue:put(...). Fiber channel:put() may yield if the channel
is full. channel:put()/channel:get() in mysql driver connection pool is balanced:

  • the channel capacity is equal to the connection count;
  • pool create fills the channel with multiple channel:put()s all the way through;
  • pool:get() causes single channel:get() and creates a connection object with gc hook;
  • pool:put() causes single channel:put() and removes gc hook;
  • gc hook causes single channel:put().

There are no other cases of channel:put(). pool:put() and gc hook cannot be executed both in the same time: either a connection is used inside pool:put() and won't be garbage collected or a connection is no longer used anywhere else (including pool:put()) and it will be garbage collected. So now there are no valid cases when gc hook may yield unless someone messes up with pool queue. Messing up with pool queue anyway breaks a connection pool internal logic, so we make the pool unavailable in that case.

Delayed gc should not break existing logic. If pool:get() is called, we expect to get a connection after gc and it's not yet returned, channel:get() will yield and trigger all planned returns.

Closes #67

@DifferentialOrange DifferentialOrange changed the title Yield in connection from pool GC Potential yield in connection from pool GC Aug 29, 2022
@DifferentialOrange DifferentialOrange force-pushed the DifferentialOrange/gh-67-gc-yield branch 4 times, most recently from af2a642 to c4087bc Compare August 29, 2022 14:11
Copy link
Contributor

@LeonidVas LeonidVas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi! Thank you for the patchset.

bug: prevent potential fiber yield in gc

bug is not a subsystem and looks strange. I can prepare something like: gc: prevent fiber yield
LGTM.

@Totktonada
Copy link
Member

Messing up with pool queue anyway breaks a connection pool internal logic, but at least it won't be causing a process exit after this patch.

Why do you think that it is good? An incorrect behavior is worse than a fair error. Sure, stop everything is not nice. We can mark the pool as broken and decline all operations with it. But we should give a fair error. Instead, you hide it.

I don't see any changes, which would help to find a root of the problem if it'll actually occur.

@LeonidVas LeonidVas requested a review from Totktonada August 30, 2022 20:17
@DifferentialOrange
Copy link
Member Author

DifferentialOrange commented Aug 31, 2022

Messing up with pool queue anyway breaks a connection pool internal logic, but at least it won't be causing a process exit after this patch.

Why do you think that it is good? An incorrect behavior is worse than a fair error. Sure, stop everything is not nice. We can mark the pool as broken and decline all operations with it. But we should give a fair error. Instead, you hide it.

I don't see any changes, which would help to find a root of the problem if it'll actually occur.

It's not a fair error, it's plain process exit.

Using fiber.yield in gc is forbidden either way. This patch removes the usage of a forbidden thing. Regarding processing possible errors in gc... is there any valid way supported by Tarantool? As far as I remember, throwing errors is forbidden same as yielding.

@DifferentialOrange DifferentialOrange force-pushed the DifferentialOrange/gh-67-gc-yield branch from c4087bc to 760174e Compare August 31, 2022 10:49
@DifferentialOrange
Copy link
Member Author

bug is not a subsystem and looks strange. I can prepare something like: gc: prevent fiber yield LGTM.

Changed

@Totktonada
Copy link
Member

is there any valid way supported by Tarantool?

Log as much as possible (error message, connection address/identifier, pool address/identifier, Lua backtrace), block the entire pool for future operations.

Connection pools allow user to get connections and explicitly return
them back. But it is allowed to return connection to any pool. This
could easily lead to pool queue overflow. Each pool allocates fiber
channel of connection count size to store connections. If user would put
several extra connections to a pool, next puts will lead to `pool:put()`
hangup since channel will be full and `channel:put()` is used without
timeout and possible error processing, relying on get/put+gc balance.

Part of #67
@DifferentialOrange DifferentialOrange force-pushed the DifferentialOrange/gh-67-gc-yield branch 3 times, most recently from 3809280 to 76eaca1 Compare September 5, 2022 12:26
@DifferentialOrange DifferentialOrange force-pushed the DifferentialOrange/gh-67-gc-yield branch 3 times, most recently from 9c8eca2 to aebe7a0 Compare September 5, 2022 12:45
@DifferentialOrange
Copy link
Member Author

@Totktonada I think we may test only that pool became unavailable and do not bother with reading logs.

@Totktonada
Copy link
Member

I think we may test only that pool became unavailable and do not bother with reading logs.

I don't mind.

@DifferentialOrange
Copy link
Member Author

DifferentialOrange commented Sep 6, 2022

@LeonidVas , @Totktonada , I've updated the PR after our discussion. Please, update your reviews

'pool.queue manually. Closing the pool...')
log.error(debug.traceback)

pool:close()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't know, TBH, whether we should try to close all the connections or just mark the pool as unusable and don't touch anything inside. How you decided to prefer pool:close() over pool.usable = false?

(I really don't know what is better, it is not some indirect push toward one of variants.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How you decided to prefer pool:close() over pool.usable = false?

Well, the main cause was that the first thing pool:close() do is setting pool.usable = false, so even if anything is wrong with pool contents, pool still will be unusable. The connections in pool are the connections not used by anyone else (if user's got a connection with pool:get(), it won't be in queue), so I think it's better to close them explicitly if possible rather than waiting for garbage collect.

mysql/init.lua Outdated
Comment on lines 36 to 38
log.error('pool %s internal queue unexpected state: there are no ' ..
'empty slots. It is likely that someone had messed with ' ..
'pool.queue manually. Closing the pool...')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I think there should be word 'mysql' somewhere.
  2. Can we add something about the connection object or raw connection?

Copy link
Member Author

@DifferentialOrange DifferentialOrange Sep 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

log.error('mysql pool %s internal queue unexpected state: there are no ' ..
          'empty slots, connection %s cannot be put back. It is likely ' ..
          'that someone had messed with pool.queue manually. Closing ' ..
          'the pool...', pool, conn_id)

Copy link
Member

@Totktonada Totktonada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On a brief glance the changes looks meaningful.

The code is tricky and I can't say that I'm able to spot a problem here even if there is one, so that's all that I can say.

@Totktonada
Copy link
Member

BTW, nice description!

Yielding in gc is prohibited since Tarantool 2.6.0-138-gd3f1dd720,
2.5.1-105-gc690b3337, 2.4.2-89-g83037df15, 1.10.7-47-g8099cb053.
If fiber yield happens in object gc, Tarantool process exits with
code 1.

Gc hook of a connection from a connection pool calls
`pool.queue:put(...)`. Fiber `channel:put()` may yield if the channel
is full. `channel:put()`/`channel:get()` in mysql driver connection pool
is balanced:
* the channel capacity is equal to the connection count;
* pool create fills the channel with multiple `channel:put()`s
  all the way through;
* `pool:get()` causes single `channel:get()` and creates a connection
   object with gc hook;
* `pool:put()` causes single `channel:put()` and removes gc hook;
* gc hook causes single `channel:put()`.

There are no other cases of `channel:put()`. `pool:put()` and gc hook
cannot be executed both in the same time: either a connection is used
inside `pool:put()` and won't be garbage collected or a connection is no
longer used anywhere else (including `pool:put()`) and it will be
garbage collected. So now there are no valid cases when gc hook may
yield unless someone messes up with pool queue. Messing up with pool
queue anyway breaks a connection pool internal logic, so we make the
pool unavailable in that case.

Delayed gc should not break existing logic. If `pool:get()` is called,
we expect to get a connection after gc and it's not yet returned,
`channel:get()` will yield and trigger all planned returns.

Closes #67
@DifferentialOrange DifferentialOrange force-pushed the DifferentialOrange/gh-67-gc-yield branch from dbb50d5 to 353a738 Compare September 7, 2022 07:32
@DifferentialOrange DifferentialOrange merged commit a514081 into master Sep 9, 2022
@DifferentialOrange DifferentialOrange deleted the DifferentialOrange/gh-67-gc-yield branch September 9, 2022 09:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Seems that gc handler of connection can yield
3 participants