feat: limit verdict cache size #239

Merged
merged 1 commit into from
Apr 8, 2025

Conversation

keejon
Contributor

@keejon keejon commented Apr 3, 2025

Limits the VerdictCache size by removing unused keys (last access older than x minutes) and capping the total cache size to a percentage of the heap, preventing OOM. The cache size is estimated rather than precisely measured to avoid performance overhead.
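
For illustration, here is a minimal sketch of how a heap-percentage budget and a cheap per-entry size estimate might be computed. It is not the code from this PR: the class name, the 64-byte overhead constant, and the key fields (`principal`, `operation`, `resource`) are all assumptions made for the example.

```java
// Hypothetical helper, not taken from the PR.
final class VerdictCacheSizingSketch {
    // Assumed fixed per-entry overhead in bytes (object headers, references).
    // This is a guess, not a measurement.
    static final long ENTRY_OVERHEAD_BYTES = 64;

    /** Returns the byte budget for the cache as a percentage of the max heap. */
    static long maxWeightBytes(long maxHeapBytes, int heapPercent) {
        if (heapPercent < 0 || heapPercent > 100) {
            throw new IllegalArgumentException("heapPercent must be in [0, 100]");
        }
        // Divide first so the multiplication cannot overflow for huge heaps;
        // this rounds down by less than 100 bytes.
        return maxHeapBytes / 100 * heapPercent;
    }

    /** Estimates an entry's size from string lengths instead of measuring it. */
    static int estimateEntryBytes(String principal, String operation, String resource) {
        long chars = (long) principal.length() + operation.length() + resource.length();
        // Assume the worst case of 2 bytes per char.
        long estimate = ENTRY_OVERHEAD_BYTES + 2 * chars;
        return (int) Math.min(Integer.MAX_VALUE, estimate);
    }

    public static void main(String[] args) {
        long budget = maxWeightBytes(Runtime.getRuntime().maxMemory(), 10);
        System.out.println("cache budget in bytes: " + budget);
        System.out.println("estimated entry size: "
                + estimateEntryBytes("User:a", "Read", "topic-1"));
    }
}
```

With Caffeine, values like these would typically be passed to the builder's `maximumWeight` and `weigher` options alongside `expireAfterAccess`; whether the PR wires them up exactly this way is not shown in this thread.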

@keejon keejon requested a review from a team as a code owner April 3, 2025 07:25
@keejon keejon force-pushed the keejon/limit-verdict-cache-size branch from 7afd7fd to 6bcb229 Compare April 3, 2025 07:26
@keejon keejon marked this pull request as draft April 3, 2025 07:27
@ivanyu
Contributor

ivanyu commented Apr 3, 2025

Since it is large numbers of transactions that make the verdict cache problematic, limiting the cache size directly will let those same large streams of transactions flush out cached verdicts for other types of entities (e.g. topics). We should probably consider a separate, bounded cache for transactions, which would solve the original problem without introducing the new one.

@keejon
Contributor Author

keejon commented Apr 3, 2025

> Since it is large numbers of transactions that make the verdict cache problematic, limiting the cache size directly will let those same large streams of transactions flush out cached verdicts for other types of entities (e.g. topics). We should probably consider a separate, bounded cache for transactions, which would solve the original problem without introducing the new one.

Caffeine evicts the least frequently used keys, though, so shouldn't the transaction keys be the first ones flushed out? They should also be the ones invalidated by expireAfterAccess.

@keejon keejon force-pushed the keejon/limit-verdict-cache-size branch 10 times, most recently from e5ed435 to 3f63965 Compare April 3, 2025 13:41
@keejon keejon force-pushed the keejon/limit-verdict-cache-size branch from 3f63965 to 40f59f1 Compare April 3, 2025 15:06
@keejon keejon marked this pull request as ready for review April 4, 2025 06:41
@ivanyu
Contributor

ivanyu commented Apr 4, 2025

That's true, but I'm afraid new transactions could first flush other entries out of the cache, only to be replaced soon after by even newer transactions. I.e. we'll get cache thrashing.
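
The thrashing concern can be illustrated with a plain size-bounded LRU cache built on `LinkedHashMap`. This is deliberately not Caffeine's eviction policy, and the key prefixes (`topic:`, `txn:`) and sizes are hypothetical; the sketch only shows what a burst of one-off transaction keys does to stable entries under simple LRU.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical demo class; models plain LRU, not Caffeine.
final class LruThrashingSketch {
    /** Minimal access-ordered LRU cache with a fixed entry capacity. */
    static final class LruCache<K, V> extends LinkedHashMap<K, V> {
        private final int capacity;

        LruCache(int capacity) {
            super(16, 0.75f, true); // true = iterate in access order
            this.capacity = capacity;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            // Evict the least recently used entry once over capacity.
            return size() > capacity;
        }
    }

    /** Caches the topic verdicts, then a burst of unique transaction verdicts. */
    static int survivingTopics(int capacity, int topics, int transactions) {
        LruCache<String, Boolean> cache = new LruCache<>(capacity);
        for (int i = 0; i < topics; i++) {
            cache.put("topic:" + i, Boolean.TRUE);
        }
        for (int i = 0; i < transactions; i++) {
            cache.put("txn:" + i, Boolean.TRUE);
        }
        int surviving = 0;
        for (int i = 0; i < topics; i++) {
            // containsKey does not count as an access, so it leaves the order alone.
            if (cache.containsKey("topic:" + i)) {
                surviving++;
            }
        }
        return surviving;
    }

    public static void main(String[] args) {
        System.out.println(survivingTopics(100, 10, 200));  // burst exceeds capacity
        System.out.println(survivingTopics(1000, 10, 200)); // burst fits
    }
}
```

With a capacity of 100, a burst of 200 unique transaction keys evicts all 10 topic entries; with a capacity of 1000 they all survive. A frequency-aware policy may behave quite differently, which is the counterargument raised later in the thread.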

@keejon
Contributor Author

keejon commented Apr 4, 2025

> That's true, but I'm afraid new transactions could first flush other entries out of the cache, only to be replaced soon after by even newer transactions. I.e. we'll get cache thrashing.

Hm, what we could maybe also do is use the same cache instance but set a short expiry on transaction keys, like 60-120 seconds maybe?
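
A minimal sketch of the "one cache, shorter expiry for transaction keys" idea, using only the standard library. The class, the `txn:` key prefix, and the 10-minute default TTL are assumptions for illustration; only the 120-second figure comes from the suggestion above. The sketch expires entries after their last access, has no size bound, and is not thread-safe.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical class, not Caffeine and not the PR's implementation.
final class PerTypeTtlSketch {
    static final long TXN_TTL_NANOS = 120L * 1_000_000_000L;     // upper end of 60-120 s
    static final long DEFAULT_TTL_NANOS = 600L * 1_000_000_000L; // assumed 10 minutes

    private static final class Entry {
        final boolean verdict;
        long expiresAtNanos;

        Entry(boolean verdict, long expiresAtNanos) {
            this.verdict = verdict;
            this.expiresAtNanos = expiresAtNanos;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private final LongSupplier nanoClock; // injectable so tests can control time

    PerTypeTtlSketch(LongSupplier nanoClock) {
        this.nanoClock = nanoClock;
    }

    /** Picks the TTL from the (hypothetical) key prefix. */
    static long ttlNanosFor(String key) {
        return key.startsWith("txn:") ? TXN_TTL_NANOS : DEFAULT_TTL_NANOS;
    }

    void put(String key, boolean verdict) {
        entries.put(key, new Entry(verdict, nanoClock.getAsLong() + ttlNanosFor(key)));
    }

    /** Returns the verdict, or null if absent or expired; a hit extends the lifetime. */
    Boolean getIfPresent(String key) {
        Entry entry = entries.get(key);
        if (entry == null) {
            return null;
        }
        long now = nanoClock.getAsLong();
        if (now - entry.expiresAtNanos >= 0) {
            entries.remove(key); // expired entries are only removed lazily, on read
            return null;
        }
        entry.expiresAtNanos = now + ttlNanosFor(key);
        return entry.verdict;
    }
}
```

In production one would pass `System::nanoTime` as the clock. Caffeine supports per-entry expiration through its `Expiry` interface, which would be the natural way to express this on the existing cache instance.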

@ivanyu
Contributor

ivanyu commented Apr 4, 2025

I don't see how it could help... newly created transactions will still be removing more stable entities, regardless of their TTL

@tvainika
Contributor

tvainika commented Apr 7, 2025

Caffeine uses a more accurate eviction policy than simple LRU, so I don't think cache thrashing becomes an issue unless the cache size is really small. Actually, I wonder if time-based expiry is even needed at all? But anyway, in my opinion this can go in as it is now.

@keejon
Contributor Author

keejon commented Apr 7, 2025

> I don't see how it could help... newly created transactions will still be removing more stable entities, regardless of their TTL

I would think it is just less likely that this will ever happen, since transactions most likely take up most of the space but are expired and evicted at a faster and more appropriate rate. This means it is very unlikely that the cache will ever hit its size limit, so we don't have to worry as much about it. IMHO if we have such a high rate of transactions that a low expiry of ~120 seconds is not enough, the cluster will probably already be in trouble.

> Caffeine uses a more accurate eviction policy than simple LRU, so I don't think cache thrashing becomes an issue unless the cache size is really small. Actually, I wonder if time-based expiry is even needed at all? But anyway, in my opinion this can go in as it is now.

Instead of getting rid of expiry, I would actually rather just do TTL and get rid of the cache size limit, for the reasons above :-)

Contributor

@tvainika tvainika left a comment

LGTM

@tvainika tvainika merged commit e14ff6f into main Apr 8, 2025
4 checks passed
@tvainika tvainika deleted the keejon/limit-verdict-cache-size branch April 8, 2025 10:04
@keejon keejon mentioned this pull request Apr 11, 2025

3 participants