-
Notifications
You must be signed in to change notification settings - Fork 79
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: resurrect stale values on soft callback errors #52
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
2 tasks
openresty/lua-resty-lrucache#35 will allow us to store a stale flag in the LRU level, and thus efficiently provide a consistent |
thibaultcha
force-pushed
the
feat/resurrect-stale
branch
from
May 21, 2018 18:22
739467e
to
dec424a
Compare
This comment has been minimized.
This comment has been minimized.
thibaultcha
force-pushed
the
feat/resurrect-stale
branch
4 times, most recently
from
June 29, 2018 00:33
65a129c
to
01ba82a
Compare
thibaultcha
force-pushed
the
feat/resurrect-stale
branch
3 times, most recently
from
June 29, 2018 00:43
0936275
to
a23e8cf
Compare
thibaultcha
added a commit
that referenced
this pull request
Jun 29, 2018
Implement a new `opts.resurrect_ttl` instance and `get()` option. When specified, the `get()` method behaves in a more resilient fashion. Context ------- We currently consider callback-returned errors (`nil, err`) to be the cause of I/O failures in the callback (i.e. failure to fetch/refresh some value due to timeout, connection refused, etc...). When the callback returns as such, we currently return `nil, err` from `get()` as well. Instead, we could serve the stale data if is it still available (and desired by the user) Changes ------- No changes in behavior when `opts.resurrect_ttl` is not specified (backwards-compatible). If it is specified, and a callback lookup is triggered and returns such a soft error (`nil, err`), we resurrect the stale value (if it is still available) for `resurrect_ttl` seconds. When `resurrect_ttl` is reached, subsequent `get()` calls will trigger the callback again, and if the callback happens to fail again, the stale value (if it is still available) will be resurrected for `resurrect_ttl` again. The cycle continues for as long as the value isn't LRU evicted during its stale-but-non-resurrected lifetime, or when the callback succeeds in producing a fresh value. To ensure backwards compatibility, callback soft errors produced when the `opts.resurrect_ttl` option is used **do not** return an error (`data, err`). Instead, the error is logged with the WARN level (similarly to shm retries errors), and we return `data, nil`. Current users of this module can therefore simply add the `resurrect_ttl` option without modifying the way return values are currently handled. When stale data is resurrected or subsequently returned by `get()`, the `hit_lvl` return value will be `4`. Lock timeouts occurring when `resurrect_ttl` is used will also log the error at the WARN level and return a `hit_lvl` of `4` (if the value is still available). However, the stale value will not be resurrected. That is because another worker is running the callback, and it might succeed, or fail again (and will thus be responsible for resurrecting the value). Caveats ------- - Due to limitations in the lua-resty-lrucache module, we can only detect stale values when hitting the L2 cache (shm). Therefore, a stale value retrieved from the L1 cache will produce a `hit_lvl` of `1`. This limitation is to be removed in the future if the following PR gets accepted upstream: openresty/lua-resty-lrucache#35 - It is impossible for a caller of `get()` to distinguish a resurrected value from an already stale one. The same way, it is not possible to retrieve the callback error for consumption (as it is internally logged as a WARN by `get()`). Future work on the API of this module will make such requirements possible, but will likely be breaking changes. See also -------- Fix #50 From #52
Implement a new `opts.resurrect_ttl` instance and `get()` option. When specified, the `get()` method behaves in a more resilient fashion. Context ------- We currently consider callback-returned errors (`nil, err`) to be the cause of I/O failures in the callback (i.e. failure to fetch/refresh some value due to timeout, connection refused, etc...). When the callback returns as such, we currently return `nil, err` from `get()` as well. Instead, we could serve the stale data if is it still available (and desired by the user) Changes ------- No changes in behavior when `opts.resurrect_ttl` is not specified (backwards-compatible). If it is specified, and a callback lookup is triggered and returns such a soft error (`nil, err`), we resurrect the stale value (if it is still available) for `resurrect_ttl` seconds. When `resurrect_ttl` is reached, subsequent `get()` calls will trigger the callback again, and if the callback happens to fail again, the stale value (if it is still available) will be resurrected for `resurrect_ttl` again. The cycle continues for as long as the value isn't LRU evicted during its stale-but-non-resurrected lifetime, or when the callback succeeds in producing a fresh value. To ensure backwards compatibility, callback soft errors produced when the `opts.resurrect_ttl` option is used **do not** return an error (`data, err`). Instead, the error is logged with the WARN level (similarly to shm retries errors), and we return `data, nil`. Current users of this module can therefore simply add the `resurrect_ttl` option without modifying the way return values are currently handled. When stale data is resurrected or subsequently returned by `get()`, the `hit_lvl` return value will be `4`. Lock timeouts occurring when `resurrect_ttl` is used will also log the error at the WARN level and return a `hit_lvl` of `4` (if the value is still available). However, the stale value will not be resurrected. That is because another worker is running the callback, and it might succeed, or fail again (and will thus be responsible for resurrecting the value). Caveats ------- - Due to limitations in the lua-resty-lrucache module, we can only detect stale values when hitting the L2 cache (shm). Therefore, a stale value retrieved from the L1 cache will produce a `hit_lvl` of `1`. This limitation is to be removed in the future if the following PR gets accepted upstream: openresty/lua-resty-lrucache#35 - It is impossible for a caller of `get()` to distinguish a resurrected value from an already stale one. The same way, it is not possible to retrieve the callback error for consumption (as it is internally logged as a WARN by `get()`). Future work on the API of this module will make such requirements possible, but will likely be breaking changes. See also -------- Fix #50 From #52
thibaultcha
force-pushed
the
feat/resurrect-stale
branch
from
June 29, 2018 00:58
a23e8cf
to
29c0902
Compare
thibaultcha
added a commit
to Kong/kong
that referenced
this pull request
Jun 30, 2018
This option is based on thibaultcha/lua-resty-mlcache#52 Documentation for its behaviour can be read in the description test included in kong.conf.default.
thibaultcha
added a commit
to Kong/kong
that referenced
this pull request
Jun 30, 2018
This option is based on thibaultcha/lua-resty-mlcache#52 Documentation for its behaviour can be read in the description test included in kong.conf.default.
thibaultcha
added a commit
to Kong/kong
that referenced
this pull request
Jul 5, 2018
This option is based on thibaultcha/lua-resty-mlcache#52 Documentation for its behaviour can be read in the description test included in kong.conf.default.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Implement a new
opts.resurrect_ttl
instance andget()
option. Whenspecified, the
get()
method behaves in a more resilient fashion.Context
We currently consider callback-returned errors (
nil, err
) to be thecause of I/O failures in the callback (i.e. failure to fetch/refresh
some value due to timeout, connection refused, etc...).
When the callback returns as such, we currently return
nil, err
fromget()
as well. Instead, we could serve the stale data if is it stillavailable (and desired by the user)
Changes
No changes in behavior when
opts.resurrect_ttl
is not specified(backwards-compatible).
If it is specified, and a callback lookup is triggered and returns such
a soft error (
nil, err
), we resurrect the stale value (if it is stillavailable) for
resurrect_ttl
seconds. Whenresurrect_ttl
is reached,subsequent
get()
calls will trigger the callback again, and if thecallback happens to fail again, the stale value (if it is still
available) will be resurrected for
resurrect_ttl
again. The cyclecontinues for as long as the value isn't LRU evicted during its
stale-but-non-resurrected lifetime, or when the callback succeeds in
producing a fresh value.
To ensure backwards compatibility, callback soft errors produced when
the
opts.resurrect_ttl
option is used do not return an error(
data, err
). Instead, the error is logged with the WARN level (similarlyto shm retries errors), and we return
data, nil
. Current users of thismodule can therefore simply add the
resurrect_ttl
option withoutmodifying the way return values are currently handled.
When stale data is resurrected or subsequently returned by
get()
, thehit_lvl
return value will be4
.Lock timeouts occurring when
resurrect_ttl
is used will also log theerror at the WARN level and return a
hit_lvl
of4
(if the value isstill available). However, the stale value will not be resurrected. That
is because another worker is running the callback, and it might succeed,
or fail again (and will thus be responsible for resurrecting the value).
Caveats
Due to limitations in the lua-resty-lrucache module, we can only detect
stale values when hitting the L2 cache (shm). Therefore, a stale value
retrieved from the L1 cache will produce a
hit_lvl
of1
. Thislimitation is to be removed in the future if the following PR gets
accepted upstream:
feature: implemented a user flags attribute. openresty/lua-resty-lrucache#35
It is impossible for a caller of
get()
to distinguish a resurrectedvalue from an already stale one. The same way, it is not possible to
retrieve the callback error for consumption (as it is internally logged
as a WARN by
get()
). Future work on the API of this module will makesuch requirements possible, but will likely be breaking changes.