
Flink 2.0: Use LRUCache for Iceberg table metadata #13382


Open: wants to merge 1 commit into main

Conversation

aiborodin (Contributor)

We recently discovered that LRUCache, based on LinkedHashMap, has almost twice the throughput of the Caffeine cache with a maximum size configured. Please see the JMH benchmark results here.

Let's use LRUCache in TableMetadataCache to improve the cache performance of the DynamicIcebergSink.

@github-actions github-actions bot added the flink label Jun 25, 2025
@manuzhang manuzhang changed the title Use LRUCache for Iceberg table metadata Flink 2.0: Use LRUCache for Iceberg table metadata Jun 25, 2025
@aiborodin aiborodin force-pushed the use-lru-cache-for-table-metadata branch from 715be14 to 0689560 Compare June 25, 2025 09:54
ben-manes

Is TableMetadataCache single-threaded like your benchmark? If so, this is what LinkedHashMap is optimal for, since the LRU updates are simple pointer swaps. If not, you will have to synchronize access, because it is not thread-safe and concurrent usage will cause corruption and instability. This is required for every read (per the javadoc), since every access mutates the LRU order.

You also have to be thoughtful about the cache hit rate, because a cache miss will be far more expensive than the in-memory operations, e.g. it will have to perform I/O. If the workload is recency-biased then LRU is perfect, but if it has frequency bias or scans then it can be quite poor.
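For context, the LinkedHashMap approach being discussed can be sketched like this (a minimal illustration, not the actual Iceberg LRUCache; the class name is hypothetical). In access-order mode each get() moves the entry to the tail, and removeEldestEntry evicts from the head:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal sketch of an LRU cache built on LinkedHashMap in access-order mode.
// Illustrative only; not the actual Iceberg LRUCache implementation.
// Note: like LinkedHashMap itself, this class is NOT thread-safe, and every
// get() mutates the internal ordering, so even reads need external
// synchronization under concurrent access.
class SimpleLruCache<K, V> extends LinkedHashMap<K, V> {
  private final int maxSize;

  SimpleLruCache(int maxSize) {
    // accessOrder = true: each get() moves the entry to the tail (most recent)
    super(16, 0.75f, true);
    this.maxSize = maxSize;
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
    // Evict the head (least recently used) once capacity is exceeded
    return size() > maxSize;
  }
}
```

This is why the LRU updates are "simple pointer swaps": LinkedHashMap only relinks doubly-linked-list nodes on access, with no CAS or ring-buffer machinery.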

Caffeine is a multi-threaded cache with an adaptive eviction policy that maximizes the hit rates based on the observed workload. This does incur additional overhead but can greatly improve the overall system performance.

I adjusted Caffeine's benchmark to run single-threaded and with 16 threads on my 14-core M3 Max laptop using OpenJDK 24. It uses a Zipfian distribution to simulate hot/cold items with a 100% hit rate. This was not a clean system, as I was on a conference call while writing code, which only hurts Caffeine since it will utilize all of the cores. In a cloud environment you will likely observe worse throughput due to virtualization, NUMA effects, noisy neighbors, older hardware, etc. In general you want to drive application performance decisions by profiling to resolve hotspots, as small optimizations can backfire when your benchmark does not fit your real workload.

[Screenshot: benchmark results, 2025-06-25]

aiborodin (Contributor, Author)

Thank you for the detailed reply.

TableMetadataCache is single-threaded, so it's safe to use LRUCache, based on LinkedHashMap.

It was unclear from the Caffeine project description that the cache is specifically optimised for high concurrency. I also didn't find any single-threaded benchmarks online, so I wrote my own here, which gave these results:

Benchmark                                     Mode  Cnt     Score     Error  Units
CacheBenchmark.testCaffeineCacheMaxSize_Get  thrpt    5   929.031 ±  84.510  ops/s
CacheBenchmark.testCaffeineCacheMaxSize_Put  thrpt    5   548.677 ±  12.191  ops/s
CacheBenchmark.testLRUCache_Get              thrpt    5  1657.313 ±  71.981  ops/s
CacheBenchmark.testLRUCache_Put              thrpt    5  1206.151 ± 112.609  ops/s

I guess it makes sense that Caffeine performs worse in a single-threaded scenario due to thread synchronisation. It might be worth clarifying this on the project page so it's more visible to users.

ben-manes

Yep, your benchmarks are reasonable, but you don't need the loop; you can let JMH handle the iterations for better clarity of the results. As written, each score represents 100k calls per unit.

It's not that much worse, as 44M vs 76M reads/s is far faster than typically needed. Usually those who care need primitive collections and are very specialized. A much better hit rate than LRU's more than compensates, because the milliseconds for an extra miss outweigh saving a few nanoseconds per hit. LHM as an LRU is really great, but LRU isn't as good a policy as people assume.
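The trade-off above can be made concrete with back-of-envelope numbers. The per-hit costs below are derived from the 44M vs 76M reads/s figures (roughly 23 ns vs 13 ns per hit); the hit rates and the 1 ms miss cost are illustrative assumptions, not measurements:

```java
// Back-of-envelope model of the hit-rate vs. per-hit-cost trade-off.
// Per-hit costs derived from ~76M reads/s (~13 ns) and ~44M reads/s (~23 ns);
// hit rates and the 1 ms miss penalty are illustrative assumptions.
class CacheCostModel {
  // Expected cost per lookup: hits are cheap, misses pay for I/O.
  static double avgCostNanos(double hitRate, double hitNanos, double missNanos) {
    return hitRate * hitNanos + (1.0 - hitRate) * missNanos;
  }

  public static void main(String[] args) {
    double missNanos = 1_000_000; // assume a 1 ms miss (e.g. metadata I/O)
    // Faster per-hit cache with a slightly lower assumed hit rate...
    double fastLowerHitRate = avgCostNanos(0.90, 13, missNanos);
    // ...vs. a slower per-hit cache with a slightly higher assumed hit rate.
    double slowHigherHitRate = avgCostNanos(0.92, 23, missNanos);
    System.out.printf("90%% hits @13ns: %.1f ns, 92%% hits @23ns: %.1f ns%n",
        fastLowerHitRate, slowHigherHitRate);
  }
}
```

Under these assumptions, a two-point hit-rate improvement cuts the expected lookup cost by roughly 20%, swamping the extra 10 ns per hit.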

pvary (Contributor) commented Jun 26, 2025

Thanks for the PR @aiborodin, and thanks @ben-manes for the nice, detailed test and explanation. We are debating sharing the TableCache at the JVM level, and your highlights about concurrent cache access will be a very useful data point in that discussion.

Currently the cache access is single-threaded, so we can get away with the suggested LHM solution.

We recently discovered that LRUCache, based on LinkedHashMap, performs
almost twice as fast as the Caffeine max-size cache. Let's replace the
Caffeine cache to optimise the performance.
@aiborodin aiborodin force-pushed the use-lru-cache-for-table-metadata branch from 0689560 to d430d54 Compare June 27, 2025 07:22
@aiborodin
Copy link
Contributor Author

Thanks for your comment @pvary. I rebased this change on top of the merged #13340, so this should be ready for review.
