
Proxy: Add watchers for keys in cache #14665

Open
biosvs opened this issue Oct 31, 2022 · 12 comments

@biosvs
Contributor

biosvs commented Oct 31, 2022

What would you like to be added?

Run a watcher inside the proxy for every key that is added to the cache.

(Maybe you have better suggestions; this issue is created for architecture discussion.)

Why is this needed?

I have noticed that the proxy cache doesn't have any TTL for keys, nor any other way to refresh a value except by making a direct request to the proxy node: either to get the key (with the "linearizable" consistency level), to delete it, or to compact.

If a key is changed by a request through another proxy node, or by a direct request to a cluster node, the stale value is stuck forever.

That means that if more than one proxy node is in use (or direct requests to cluster nodes are possible), the cache cannot be used at all (all read requests have to be linearizable). The only use case for the proxy in that scenario is watcher coalescing.

@tjungblu
Contributor

If a key is changed by a request through another proxy node, or by a direct request to a cluster node, the stale value is stuck forever.

Just to clarify. Your setup looks like this:

graph TD;
    YourCode-->|grpc|gRPCProxy;
    gRPCProxy-->|http|etcd;


then "YourCode" writes some key/value, let's say a=1 and another actor removes that key from etcd directly?

@biosvs
Contributor Author

biosvs commented Oct 31, 2022

I have a client and several proxy nodes:

graph TD;
    YourCode-->|grpc|gRPCProxy1;
    YourCode-->|grpc|gRPCProxy2;
    gRPCProxy1-->|http|etcd;
    gRPCProxy2-->|http|etcd;

Let's imagine I have key k1 with value v1 in etcd.
I read k1 from Proxy1, then read k1 from Proxy2. Both proxies now have "k1 -> v1" in their caches.
Then I overwrite k1 with value v2 through Proxy1.
Now Proxy1 has the current "k1 -> v2", but Proxy2 has the stale "k1 -> v1".

@tjungblu
Contributor

tjungblu commented Oct 31, 2022

Thanks for the clarification. Is the issue that you read stale values from Proxy2? Or the memory usage from the stale entries?

As far as I understand the proxy code, it only returns serializable results from the cache and saves the last response there (basically a write-through cache):
https://github.com/etcd-io/etcd/blob/main/server/proxy/grpcproxy/kv.go#L40-L68
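For context, the logic at that link boils down to roughly the following (a simplified paraphrase from memory, omitting metrics and the compaction-error path; p.cache, p.kv and RangeRequestToOp come from the surrounding grpcproxy package):

```go
func (p *kvProxy) Range(ctx context.Context, r *pb.RangeRequest) (*pb.RangeResponse, error) {
	if r.Serializable {
		// serializable reads may be answered straight from the local cache
		if resp, err := p.cache.Get(r); err == nil {
			return resp, nil
		}
	}
	// cache miss (or a linearizable read): forward to the cluster
	resp, err := p.kv.Do(ctx, RangeRequestToOp(r))
	if err != nil {
		return nil, err
	}
	// every successful response is written back to the cache, keyed by the
	// request marked as serializable -- a write-through cache
	req := *r
	req.Serializable = true
	gresp := (*pb.RangeResponse)(resp.Get())
	p.cache.Add(&req, gresp)
	return gresp, nil
}
```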

@biosvs
Contributor Author

biosvs commented Oct 31, 2022

So yes, I think stale reads from Proxy2 are the issue.

You're right, stale reads are possible only in "serializable" mode, and that is also true for a direct connection to the etcd cluster.
But eventually all nodes of the cluster will have the current value, which is in turn not true for proxy nodes: there are no "eventual consistency" guarantees for them. And I think there should be; otherwise the only useful read consistency mode (if 2+ proxy nodes are in use) is "linearizable", which makes the cache unusable and the proxy meaningless.

@biosvs
Contributor Author

biosvs commented Jan 18, 2023

Hi guys,

I looked into the code, and unfortunately it's not as easy as I wish it were.

The main complication is that the watchers (which serve client "watch" requests) are deeply integrated with gRPC, i.e. the watchers inside grpc-proxy are built around gRPC streams. This makes it impossible (or more precisely, meaningless and ugly) to run such a watcher from inside grpc-proxy.

Thinking about how cache watchers could be implemented with as little refactoring as possible, I came up with the following solution. It's not perfect and not the prettiest, but it should work.

How it works now

  1. When the client sends a range request, it is served by the kvproxy package, which tries to get the value from the cache (the cache key is the encoded range request, including keys, filters, etc.; the cached value is used only for serializable requests), makes a request to the etcd cluster (if there is no value in the cache, or the request is in linearizable mode), stores the new value in the cache (always), and returns the value to the client.
  2. The LRU cache stores at most 1k values.

Proposal

  1. Let's add a new component: kv-watchers. The task of this component is to watch for changes in the ranges that are stored in the cache.
  2. When kvproxy gets a range request from a client, it should start a watcher for this range, and only then try to get the value from the cache/cluster.
  3. If an old value is discarded from the cache, kvproxy should stop the watcher for this range.
  4. When a watcher gets a change event, it invalidates the cache for this range and then stops itself.

Overhead: the LRU cache stores at most 1k values, and the relation is one watcher per cached range, so there could be up to 1k watchers.

Why the watcher invalidates the cache instead of updating the value: because the cache stores the range response itself, not a key or list of keys. From a watch event we know only a single key, but not the whole response for the range request. (A rough sketch of such a kv-watchers component follows below.)
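For concreteness, here is a minimal Go sketch of how such a kv-watchers component could look. All names (rangeWatchers, Invalidator, Watch, Stop) are hypothetical and not existing etcd code; error handling, revisions and metrics are omitted.

```go
package grpcproxy

import (
	"context"
	"sync"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// Invalidator is whatever cache kvproxy uses; it only needs to be able to
// drop the entry for a cached range request.
type Invalidator interface {
	Invalidate(cacheKey string)
}

type rangeWatchers struct {
	mu      sync.Mutex
	cancels map[string]context.CancelFunc // cacheKey -> cancel for its watch
	client  *clientv3.Client
	cache   Invalidator
}

func newRangeWatchers(c *clientv3.Client, cache Invalidator) *rangeWatchers {
	return &rangeWatchers{cancels: map[string]context.CancelFunc{}, client: c, cache: cache}
}

// Watch starts a watcher for the [key, end) range behind a cached entry.
// It is a no-op if a watcher for that cache key is already running (case 1).
func (w *rangeWatchers) Watch(cacheKey, key, end string) {
	w.mu.Lock()
	defer w.mu.Unlock()
	if _, ok := w.cancels[cacheKey]; ok {
		return
	}
	ctx, cancel := context.WithCancel(context.Background())
	w.cancels[cacheKey] = cancel
	go func() {
		var opts []clientv3.OpOption
		if end != "" {
			opts = append(opts, clientv3.WithRange(end))
		}
		// Any event on the range makes the cached response stale:
		// invalidate it and stop this watcher (case 2 in the diagrams).
		for resp := range w.client.Watch(ctx, key, opts...) {
			if len(resp.Events) > 0 || resp.Err() != nil {
				w.cache.Invalidate(cacheKey)
				w.Stop(cacheKey)
				return
			}
		}
	}()
}

// Stop cancels the watcher for a cache key, e.g. after LRU eviction (case 3).
func (w *rangeWatchers) Stop(cacheKey string) {
	w.mu.Lock()
	defer w.mu.Unlock()
	if cancel, ok := w.cancels[cacheKey]; ok {
		cancel()
		delete(w.cancels, cacheKey)
	}
}
```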

Following steps

I can't say for sure that I'll have enough time to implement it, but I want to at least discuss this architecture with you. Maybe one of my successors will be interested in this task; it would be helpful to have an answer to "how it should be done".

Case 1: range is not watched and not in cache

sequenceDiagram

participant cli as client
participant kv as kvproxy
participant wat as watchers
participant cc as cache
participant cl as etcd cluster

Note over kv, cc: Packages inside running grpc-proxy

cli ->> kv: get range
kv ->>+ wat: run for range
wat ->> cl: watch for range
cl -->> wat: success/fail
wat -->> kv: success/fail
kv ->> cl: get range
cl -->> kv: range resp
Note over kv: start point for diagram in case 3
kv ->> cc: set range resp
cc -->> kv: ok
kv -->> cli: range resp

Case 2: New event in watcher

sequenceDiagram

participant kv as kvproxy
participant wat as watchers
participant cc as cache
participant cl as etcd cluster

cl --) wat: event
wat ->> cc: invalidate
wat ->> wat: stop watcher for range

Case 3: New range discards oldest from LRU cache

sequenceDiagram

participant kv as kvproxy
participant wat as watchers
participant cc as cache

kv ->> cc: set range resp
cc -->> kv: ok, but range X was discarded
kv ->> wat: stop watcher for X

@ahrtr
Member

ahrtr commented Jan 18, 2023

It's a real issue to me.

A simpler solution is to use a cache which supports expiration, such as go-cache, and grpcproxy can expose a flag to let users configure the expiration time. It isn't perfect, but it's simpler and should be a minor change.

@biosvs
Contributor Author

biosvs commented Jan 19, 2023

Seems good as a fast and simple solution.

Let's discuss the cache lib then.

  1. https://github.com/patrickmn/go-cache
    It seems abandoned: last release in 2017, no go.mod, no LRU, etc. Definitely not our choice.

  2. https://github.com/jellydator/ttlcache
    The next most popular option (500 stars)
    Uses generics (maybe not the best option for the database itself, but should be OK for the proxy)
    Has embedded LRU
    Actively supported and developed
    Has metrics
    Has some extra features (optional background cleaning goroutine; event subscription; key loading from an external source) - may be overkill
    MIT Licence

  3. https://github.com/go-pkgz/expirable-cache
    Fewer users (50 stars)
    Uses generics
    Has embedded LRU
    Actively supported and developed
    Has metrics
    Doesn't have unnecessary features
    MIT Licence

At this point I would recommend using https://github.com/jellydator/ttlcache without the background goroutine (stale values will be evicted by new values (LRU) or overwritten by new writes).
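For illustration, a minimal sketch of how such a cache could be used, assuming ttlcache v3's generics API; the key/value wiring here is hypothetical (in the proxy the key would be the encoded range request and the value the cached response):

```go
package main

import (
	"fmt"
	"time"

	"github.com/jellydator/ttlcache/v3"
)

func main() {
	// Capacity-bounded and TTL-bounded cache; no cache.Start() call, so expired
	// entries are only dropped lazily on access or by LRU eviction.
	cache := ttlcache.New[string, string](
		ttlcache.WithTTL[string, string](30*time.Second),
		ttlcache.WithCapacity[string, string](1000),
	)

	// Hypothetical entry: encoded range request -> cached response.
	cache.Set("range:/foo", "bar", ttlcache.DefaultTTL)

	if item := cache.Get("range:/foo"); item != nil {
		fmt.Println(item.Value()) // "bar"
	}
}
```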

@ahrtr
Member

ahrtr commented Jan 19, 2023

cc @ptabor @serathius @spzala for awareness and opinions as well.

@biosvs
Contributor Author

biosvs commented Jan 20, 2023

(After looking through the code of these libs once again, I'd suggest writing a custom structure for the cache.)

@Hexta

Hexta commented Feb 6, 2023

Hey,
How about implementing an option for disabling the cache first?

@ahrtr
Member

ahrtr commented Feb 6, 2023

Hey, How about implementing an option for disabling the cache first?

Sounds like a reasonable change. I am not sure why it wasn't implemented in the first place.

So two changes:

  1. Add an option to let users disable the cache. We can add one more function, something like grpcproxy.NewKvProxyWithOptions, to make it more generic (see the sketch below).
  2. We also need to resolve the stale value issue; adding a TTL seems the simpler solution.
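A rough sketch of the options struct that a grpcproxy.NewKvProxyWithOptions-style constructor could accept; all names and fields below are hypothetical and not existing etcd code:

```go
package grpcproxy

import "time"

// KvProxyOptions is a hypothetical options struct covering both changes.
type KvProxyOptions struct {
	// DisableCache bypasses the response cache entirely (change 1).
	DisableCache bool
	// CacheTTL expires cached range responses after this duration;
	// zero keeps today's behaviour of never expiring (change 2).
	CacheTTL time.Duration
}

// cacheEnabled is what the Range path would consult before reading from or
// writing to the cache.
func (o KvProxyOptions) cacheEnabled() bool {
	return !o.DisableCache
}
```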

@stale

stale bot commented May 21, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label May 21, 2023