feat(ipc) avoid new workers attempting to replay evicted events #97

Merged · thibaultcha merged 1 commit into master from feat/new-workers-resume-polling on Nov 18, 2020

Conversation

@thibaultcha (Owner) commented Sep 25, 2020

A follow-up to #88.

As described in #93 and reported a few months ago, mlcache and ipc instances are likely to be created in the `init` phase. If a worker is started much later in the master process' lifetime, the newly forked ipc instance will have an `idx` attribute set to the shm_idx value captured at the time of `init` (likely 0). Such workers will resume polling from already-evicted events, and `poll()` will likely time out indefinitely from there.

For the fix in #88 to work, the mlcache instance has to be instantiated during `init_worker` or later.

This patch proposes an approach that works for instances created in both `init` and `init_worker`: as soon as the ipc shm has started evicting items, we guarantee that future workers will resume polling at the current index, without having to call any method but `poll()`.

Fix #93
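
For illustration, the idea can be sketched roughly as follows (a paraphrase of the approach, not the actual patch; `FORCIBLE_KEY` and `INDEX_KEY` are placeholder names, while `self.dict`, `self.idx`, and `shm_idx` follow the naming used in the diff and discussion below):

    local FORCIBLE_KEY = "ipc:forcible"   -- placeholder name
    local INDEX_KEY    = "ipc:index"      -- placeholder name

    function _M:poll(timeout)
        if self.idx == 0 and self.dict:get(FORCIBLE_KEY) then
            -- the shm has already evicted events: events below the current
            -- index may be gone, so resume from the current index instead
            -- of replaying from the beginning
            self.idx = self.dict:get(INDEX_KEY) or 0
        end

        -- ... existing polling loop, starting at self.idx + 1 ...
    end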

-- take note that eviction has started
-- we repeat this flagging to prevent this key from ever being
-- evicted itself
local ok, err = self.dict:set(FORCIBLE_KEY, true)
@thibaultcha (Owner Author)

A better approach here would be if shm:incr() accepted flags, or maybe even an shm:set_flags() API. In the absence of a way to rely on flags, we rely on an extraneous key, and on it never reaching the LRU tail.
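
For context, the write side of that flag could look something like this (an illustrative sketch, not the actual broadcast code; `FORCIBLE_KEY` reuses the placeholder name from the sketch above and `store_event` is a hypothetical helper):

    local FORCIBLE_KEY = "ipc:forcible"   -- placeholder name

    local function store_event(dict, event_key, payload)
        -- ngx.shared.DICT:set() reports via its third return value whether
        -- it had to forcibly evict other items to make room
        local ok, err, forcible = dict:set(event_key, payload)
        if not ok then
            return nil, err
        end

        if forcible then
            -- take note that eviction has started; re-setting the flag on
            -- every forcible eviction keeps it near the head of the LRU so
            -- it is unlikely to be evicted itself
            local fok, ferr = dict:set(FORCIBLE_KEY, true)
            if not fok then
                return nil, ferr
            end
        end

        return true
    end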

@juancampa left a comment

Wouldn't it be enough to run this on the first call to poll?

if self.idx < 0 then
  self.idx = shm_idx
end

🤔

@thibaultcha (Owner Author)

If we did this, we would introduce a regression and prevent the first workers from polling any early broadcast event; see this test for example. Since what we are worried about is missing (evicted) events, which would cause streaks of timeouts, it should still be safe to preserve the current behavior: if the event at index 1 is still present, then we can just replay them all fast enough.
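
In other words (a paraphrase in code, under the same placeholder names as the sketch in the description above, not the literal patch):

    -- unconditional skip (the suggestion above): new workers never replay
    -- anything, even events that are still present in the shm
    if self.idx < 0 then
        self.idx = shm_idx
    end

    -- conditional skip (what this patch aims for, sketched): only jump
    -- forward once the shm reports that events have been evicted;
    -- otherwise keep replaying from the beginning
    if self.idx == 0 and self.dict:get(FORCIBLE_KEY) then
        self.idx = shm_idx
    end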

@thibaultcha (Owner Author)

Does this still sound good to you, @juancampa? Is it something you'd care to test, maybe? I doubt it's needed for such a minor change, but I'm just checking in, in case.

@juancampa

What I hesitate a bit on is that the behavior differs significantly between workers spawned before the first eviction and workers started after one. A simple example:

Wouldn't a worker started after 10,000 events, but before any eviction happened, go through all 10,000 events? Compare that to a worker started after 10,000 events but after an eviction: this one wouldn't catch up at all.

For mlcache purposes, do we need to replay old events at all? If a worker is new, its L1 cache won't contain the keys being deleted by the broadcast anyway.

@thibaultcha (Owner Author) commented Oct 1, 2020

Well, like I said above, in the current implementation new/fresh workers need to run past events on their first poll(), and I did not want to break that for now. Keeping track of which workers are new and which ones are re-spawned could be done, but I think this is already a worthwhile improvement without too complicated a patch. Better to have a worker replay past events, even if it may time out a few times along the way as it catches up, than never catch up at all.

If there isn't any other way, maybe we can change this behavior in a major release so that every ipc instance's first poll() always skips to the current index, for mlcache purposes only.

@thibaultcha thibaultcha merged commit 1a5bd4a into master Nov 18, 2020
@thibaultcha thibaultcha deleted the feat/new-workers-resume-polling branch November 18, 2020 23:27
bungle added a commit to Kong/kong that referenced this pull request Dec 7, 2020
### Summary

#### [2.5.0]

> Released on: 2020/11/18

##### Added

- `get()` callback functions are now optional. Without a callback, `get()` now
  still performs on-cpu L1/L2 lookups (no yielding). This allows implementing
  new cache lookup patterns guaranteed to be on-cpu for a more constant,
  smoother latency tail end (e.g. values are refreshed in background timers with
  `set()`).
  Thanks Hamish Forbes and Corina Purcarea for proposing this feature and
  participating in its development!
  [#96](thibaultcha/lua-resty-mlcache#96)

##### Fixed

- Improve `update()` robustness to worker crashes. Now, the library behind
  `cache:update()` is much more robust to re-spawned workers when initialized in
  the `init_by_lua` phase.
  [#97](thibaultcha/lua-resty-mlcache#97)
- Document the `peek()` method `stale` argument which was not mentioned, as well
  as the possibility of negative TTL return values for expired items.
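
For the `get()` entry in the Added section above, a usage sketch (hypothetical, not taken from the release notes; `fetch_from_upstream` is an assumed application helper, and a `cache_shm` lua_shared_dict is assumed to be configured):

    local mlcache = require "resty.mlcache"

    local cache = assert(mlcache.new("my_cache", "cache_shm", { lru_size = 1000 }))

    -- a background timer (e.g. started from init_worker_by_lua) refreshes
    -- the value with set()
    ngx.timer.every(5, function(premature)
        if premature then return end
        local value = fetch_from_upstream()          -- assumed helper
        local ok, err = cache:set("my_key", nil, value)
        if not ok then
            ngx.log(ngx.ERR, "failed to set value: ", err)
        end
    end)

    -- request-time lookup without a callback: get() only performs on-cpu
    -- L1/L2 lookups and never yields
    local value, err = cache:get("my_key")
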
bungle added a commit to Kong/kong that referenced this pull request Dec 8, 2020
gszr pushed a commit to Kong/kong that referenced this pull request Dec 8, 2020