feat(ipc) avoid new workers attempting to replay evicted events #97
Conversation
A follow-up to #88.

Described in #93 and reported a few months ago is the likely possibility that mlcache and ipc instances are created in the `init` phase. If a worker is started much later in the master process' lifetime, the newly forked ipc instance will have an `idx` attribute set to the `shm_idx` value at the time of `init` (likely 0). These workers will resume polling evicted events, and `poll()` will likely time out indefinitely from there. For the fix in #88 to work, the mlcache instance has to be instantiated during `init_worker` or later.

This patch proposes an approach which works for instances created in both `init` and `init_worker`: as soon as the ipc shm has started evicting items, we guarantee that future workers will resume polling at the current index, without having to call any method but `poll()`.

Fix #93
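In sketch form, a worker's first `poll()` could catch up like this (the key names and the first-poll guard below are illustrative assumptions, not the verbatim patch):

```lua
local INDEX_KEY    = "ipc:index"    -- assumed names for illustration
local FORCIBLE_KEY = "ipc:forcible"

local function catch_up_on_first_poll(ipc)
    if ipc.started then
        return
    end
    ipc.started = true

    if ipc.dict:get(FORCIBLE_KEY) then
        -- the shm has already evicted events at least once: entries
        -- between the idx inherited from init and the current index may
        -- be gone, so resume at the current index instead of timing out
        -- while trying to replay evicted events
        ipc.idx = ipc.dict:get(INDEX_KEY) or 0
    end

    -- if no eviction ever occurred, keep the inherited idx: every event
    -- since index 0 is still in the shm and can be replayed safely
end
```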
```lua
-- take note that eviction has started
-- we repeat this flagging to prevent this key from ever being
-- evicted itself
local ok, err = self.dict:set(FORCIBLE_KEY, true)
```
A better approach here would be if `shm:incr()` accepted flags, or maybe even an `shm:set_flags()` API. In the absence of a way to rely on flags, we rely on an extraneous key, and on it never reaching the LRU tail.
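For reference, a sketch of how the extraneous-key workaround hangs together; it leans on the `forcible` third return value that `ngx.shared.DICT:incr()` yields when given an `init` argument, and the key names are assumptions:

```lua
local INDEX_KEY    = "ipc:index"    -- assumed names, not the module's own
local FORCIBLE_KEY = "ipc:forcible"

local function bump_index(dict)
    -- with an init argument, incr() also reports whether storing the
    -- value forcibly evicted other not-yet-expired items from the shm
    local idx, err, forcible = dict:incr(INDEX_KEY, 1, 0)
    if not idx then
        return nil, "failed to increment index: " .. err
    end

    if forcible then
        -- take note that eviction has started; re-setting this key on
        -- every eviction keeps it warm in the LRU, so it never reaches
        -- the tail and is never evicted itself
        local ok, serr = dict:set(FORCIBLE_KEY, true)
        if not ok then
            return nil, "failed to flag eviction: " .. serr
        end
    end

    return idx
end
```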
Wouldn't it be enough to run this on the first call to `poll()`?

```lua
if self.idx < 0 then
    self.idx = shm_idx
end
```

🤔
If we did this, we'd introduce a regression: we would prevent the first workers from polling any early broadcast event, see for example this test. Since what we're worried about is missing (evicted) events, which would cause streaks of timeouts, it should still be safe to preserve the behavior that, if the event at index 1 is still present, we can just replay them all fast enough.
Does it still sound good to you @juancampa? Something you'd care to test, maybe? I doubt it's necessary for such a minor change, but just checking in, in case.
What I hesitate a bit on is that the behavior is significantly different between workers spawned before the first eviction and workers started after one. A simple example: a worker started after 10000 events but before any eviction happened. Wouldn't that worker go through all 10000 events? Compare that to a worker started after 10000 events but after an eviction: this one wouldn't catch up at all. For mlcache purposes, do we need to replay old events at all? If a worker is new, it won't have the keys being deleted by those events in its L1 cache anyway.
Well, like I said above, new/fresh workers need to replay past events broadcast before the first eviction (see the aforementioned test).

If there isn't any other way, maybe we can change this behavior in a major release so that all ipcs first start polling at the current index.
### Summary

#### [2.5.0]

> Released on: 2020/11/18

##### Added

- `get()` callback functions are now optional. Without a callback, `get()` now still performs on-cpu L1/L2 lookups (no yielding). This allows implementing new cache lookup patterns guaranteed to be on-cpu for a more constant, smoother latency tail end (e.g. values are refreshed in background timers with `set()`).
  Thanks Hamish Forbes and Corina Purcarea for proposing this feature and participating in its development!
  [#96](thibaultcha/lua-resty-mlcache#96)

##### Fixed

- Improve `update()` robustness to worker crashes. Now, the library behind `cache:update()` is much more robust to re-spawned workers when initialized in the `init_by_lua` phase.
  [#97](thibaultcha/lua-resty-mlcache#97)
- Document the `peek()` method `stale` argument, which was not mentioned, as well as the possibility of negative TTL return values for expired items.
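A minimal usage sketch of the callback-less `get()` pattern described above, assuming a `cache_shm` lua_shared_dict and a stand-in upstream fetch (both illustrative, not from the changelog):

```lua
-- background-refresh pattern: values are refreshed in a timer with set(),
-- and request handlers use the callback-less get() for on-cpu lookups
local mlcache = require "resty.mlcache"

local cache = assert(mlcache.new("my_cache", "cache_shm", {
    lru_size = 1000,
}))

local function fetch_from_upstream(key)
    -- stand-in for a real lookup (database, HTTP call, etc.)
    return "fresh value for " .. key
end

-- e.g. started from init_worker_by_lua*
local function refresh(premature)
    if premature then
        return
    end

    -- yielding I/O is allowed here, in timer context
    local value = fetch_from_upstream("my_key")
    assert(cache:set("my_key", nil, value))

    assert(ngx.timer.at(5, refresh))
end

assert(ngx.timer.at(0, refresh))

-- in hot request paths: L1/L2 lookups only, no yielding, keeping the
-- latency tail end constant and smooth
local value, err = cache:get("my_key")
```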