feat: Rework tenant cache #1601
Conversation
In order to have more resilience in the face of outages, implement an out-of-process cache using memcached. To support the existing mode of operation, add an in-process cache using Otter. Rework the tenant cache to use this new system.

Signed-off-by: Marcelo E. Magallon <marcelo.magallon@grafana.com>
This is the bit that needs the most attention.
```go
//
// This interface allows the agent to use different cache backends interchangeably,
// including a no-op implementation that eliminates the need for nil checks in client code.
type Cache interface {
```
There are three implementations of this interface:
- A memcached one. This is the one I would like to use eventually.
- A local one, based on Otter.
- A noop one, which is a fallback in case the other two are not available.
I did consider a multi-tiered implementation, too, using L1 with Otter and L2 with memcached. I would prefer to get some numbers before going that way.
Yeah, I think I initially expected a multi-tiered implementation. It's not something we need to implement immediately, I'm fine with running this using one or the other first and getting some data out of it.
```go
type Cache interface {
	// Set stores a value in the cache with the specified expiration time.
	// Returns an error if the operation fails.
	Set(ctx context.Context, key string, value any, expiration time.Duration) error
```
This is the bit where I struggled the most.
Otter uses generics to provide a more strongly-typed cache. Since the goal is to use memcached, which does not have a strongly-typed API, I considered multiple clients, one per type, but it was weird.
Apologies about the large PR. I did consider breaking it up into multiple steps, but it's hard to see where this is going without the changes in the tenant manager code, and it's hard to make the tenant manager changes without the other code.
The-9880 left a comment:
This looks like it'll work to me; I left a suggestion about typed caches which I think could be interesting.
```go
effectiveType := cacheType
if cacheType == "auto" {
	if len(memcachedServers) > 0 {
		effectiveType = "memcached"
```
This may make sense as a typed string in the cache package, so the valid values are defined upfront and we don't need to handle the literals.
e.g.:

```go
package cache

type Type string

const (
	TypeMemcached Type = "memcached"
	TypeLocal     Type = "local"
	TypeNoop      Type = "noop"
)
```
```go
ctx := t.Context()

t.Run("flush clears all items", func(t *testing.T) {
```
Is t.Run() here just to visually separate the setup? Or to annotate the test? I don't see a big advantage to it otherwise as there's only one subtest.
```go
func TestLocalCacheCapacity(t *testing.T) {
	logger := zerolog.New(io.Discard)

	t.Run("respects max capacity", func(t *testing.T) {
```
Similar q, why bother with t.Run() in this test?
```go
// Check if there's already a cached tenant
var existing cachedTenant
err := tm.cache.Get(ctx, key, &existing)
```
Want to log this if it's non-nil?
```go
// Store in cache without TTL - we'll check ValidUntil manually
// This allows us to return stale data if the API is unavailable
if err := tm.cache.Set(ctx, key, cached, 0); err != nil {
```
What do you think about defining typed caches on top of the Cache interface?
The assumption being that items of the same object class (e.g. cachedTenant) would be cached with the same TTL (0 in this case). The counter-case would be if you ever wanted to have different TTLs for objects of the same kind, e.g. using validUntil as the TTL for cachedTenant.
This lets us omit expiration from Set by constructing a Cache<cachedTenant> which uses 0 for all its entries. You could also move the key namespace prefix (tenant:) to the typed cache's Set logic via its constructor.
So NewManager would accept a CacheProvider as an argument, and would initialize its own cache interface like:
```go
tenantCache := cacheProvider.NewCache[cachedTenant]("tenant:" /* namespace */, 0 /* TTL */)

tm := &Manager{
	tenantCh:      tenantCh,
	tenantsClient: tenantsClient,
	timeout:       timeout,
	cache:         tenantCache,
	logger:        logger,
	fetchMutexes:  xsync.NewMap[int64, *sync.Mutex](),
}
```
If we end up caching other objects, they would initialize a separate Cache<otherObject> with its own TTL and key prefix.
```go
// Store in cache without TTL - we'll check ValidUntil manually
// This allows us to return stale data if the API is unavailable
if err := tm.cache.Set(ctx, key, cached, 0); err != nil {
```
Another note: it's documented on the implementations, but 0 means something different between caching backends:
- Local: a maximum TTL of 24h, if config.DefaultTTL is not defined/valid.
- Memcached: as long as the memcached server is alive (no explicit expiry).

It makes sense, but it did trip me up for a moment.