
fix(ipc) avoid replaying all events when spawning new workers #88

Merged
merged 1 commit into master from fix/ipc-no-replay-all on Jan 17, 2020

Conversation

thibaultcha
Owner

Prior to this fix, new workers spawned to replace existing, long running
ones would attempt to replay all events (from index 0 up to the current
index). This could cause new workers to never catch up with the current
pace of events being broadcast by older workers.

Thanks Robert Paprocki for the report.

Fix #87
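
For illustration, here is a minimal sketch of the idea behind the fix, using hypothetical names (`SHM_KEY_IDX`, `new_ipc`) rather than the actual mlcache internals: a freshly spawned worker seeds its event index from the shm's current broadcast index instead of 0, so it only polls events published after it was forked.

```lua
-- Hypothetical sketch: seed a new worker's event index from the shm's
-- current index so late-spawned workers do not replay the whole history.
local SHM_KEY_IDX = "ipc:index" -- hypothetical key holding the latest index

local function new_ipc(shm_name)
    local shm = ngx.shared[shm_name]

    -- Start at the latest broadcast index rather than 0.
    local idx = shm:get(SHM_KEY_IDX) or 0

    return { shm = shm, idx = idx }
end
```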

@thibaultcha
Owner Author

@p0pr0ck5 Would you have a look at this fix? Thanks!

@p0pr0ck5
Contributor

I think this will actually break things, because calling :get (https://github.com/openresty/lua-nginx-module#ngxshareddictget) is not supported inside of init_(worker)_by_lua, which is generally where .new() gets called.

@p0pr0ck5
Contributor

(Unless that's a horrendous oversight in the documentation, in which case, yeah, this is what I had in mind 👍)

@thibaultcha
Owner Author

Yes, that is a documentation oversight; the shm API is definitely available in both the init and init_worker contexts (the init phase is even delayed by ngx_lua when at least one lua_shared_dict is defined, for this exact reason).
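
For reference, a minimal configuration exercising the shm API in both phases (the dict name and values are arbitrary, for illustration only):

```nginx
http {
    lua_shared_dict my_ipc_shm 1m;

    init_by_lua_block {
        -- shdict get/set are available here; ngx_lua delays the init
        -- phase when at least one lua_shared_dict is defined, so the
        -- dict is ready by the time this block runs.
        ngx.shared.my_ipc_shm:set("booted", true)
    }

    init_worker_by_lua_block {
        local booted = ngx.shared.my_ipc_shm:get("booted")
    }
}
```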

@p0pr0ck5
Contributor

Right, that's what I thought. Okay.

This looks solid at first glance. A bit busy currently, but I will have a close look and run it through the lab environment where I reproduced the original behavior soon, hopefully late tonight or tomorrow.

@thibaultcha
Owner Author

Awesome! Thanks!

@p0pr0ck5
Contributor

Ran through my lab environment where I reported the bug and I no longer see the behavior noted. Nice fix! 👍

@thibaultcha
Owner Author

Great, thanks for testing it!

thibaultcha merged commit f3e07ab into master on Jan 17, 2020
thibaultcha deleted the fix/ipc-no-replay-all branch on January 17, 2020 at 22:26
thibaultcha added a commit to Kong/kong that referenced this pull request Jan 20, 2020
Fixed:

- The IPC module now avoids replaying all events when spawning new workers, and
  gets initialized with the latest event index instead.
  [#88](thibaultcha/lua-resty-mlcache#88)

Changelog:

    https://github.com/thibaultcha/lua-resty-mlcache/blob/master/CHANGELOG.md#241
hishamhm pushed a commit to Kong/kong that referenced this pull request Jan 20, 2020
thibaultcha added a commit that referenced this pull request Sep 25, 2020
A follow-up to #88.

As described in #93 and reported a few months ago, mlcache and ipc instances are likely to be created in the `init` phase. If a
worker is started much later in the master process' lifetime, its
newly forked ipc instance will have an `idx` attribute set to the
shm_idx value at the time of `init` (likely 0). Such workers will
resume polling events that have already been evicted, and `poll()` will
likely time out indefinitely from there.

For the fix in #88 to work, the mlcache instance has to be instantiated
during `init_worker` or later.

This patch proposes an approach which works for instances created in
both `init` and `init_worker`: as soon as the ipc shm has started
evicting items, we guarantee that future workers will resume polling at
the current index, without having to call any method but `poll()`.

Fix #93
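
For illustration, a rough sketch of that approach, with hypothetical key names rather than the actual implementation: `poll()` notices that the next expected event has been evicted from the shm and fast-forwards to the current index instead of waiting on it indefinitely.

```lua
-- Hypothetical sketch: fast-forward past evicted events inside poll().
local function poll(ipc)
    local cur_idx = ipc.shm:get("ipc:index") or 0

    while ipc.idx < cur_idx do
        local next_idx = ipc.idx + 1
        local event = ipc.shm:get("ipc:event:" .. next_idx)

        if event == nil then
            -- The event was evicted by the shm's LRU; a worker forked
            -- too late can never replay it. Resume at the current
            -- index rather than polling for data that is gone.
            ipc.idx = cur_idx
            break
        end

        -- handle `event` here...
        ipc.idx = next_idx
    end
end
```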
thibaultcha added a commit that referenced this pull request Nov 18, 2020
Successfully merging this pull request may close these issues.

Workers launched after IPC shm LRU fail to catch up