[Bug]: Add LRU cache eviction to CombinePerKeyPrecombineOperator #37465

@junaiddshaukat

Description

What needs to happen?

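The CombinePerKeyPrecombineOperator in the TypeScript SDK evicts cached groups without regard to how they are used. When the number of cached groups exceeds maxKeys, it flushes entries in insertion order (despite the "random 10%" wording in the code comment), so a key that is updated constantly can be flushed ahead of one that has not been touched since it was first inserted.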

Current behavior (lines 485-488 in operators.ts):
    if (this.groups.size > this.maxKeys) {
      // Flush a random 10% of the map to make more room.
      // TODO: Tune this, or better use LRU or ARC for this cache.
      return this.flush(this.maxKeys * 0.9);
    }

Proposed change:

Implement LRU (Least Recently Used) cache eviction, similar to what was done for CachingStateProvider in PR #37214. With LRU, every update to a key marks it as recently used, so frequently updated keys stay in the cache and the keys that have gone longest without an update are flushed first.
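
A minimal sketch of what this could look like, assuming the groups are kept in a plain JavaScript Map. It relies on the fact that a Map iterates in insertion order, so deleting and re-inserting a key on every update moves it to the "most recently used" end. The class name, constructor parameters, and emit callback below are hypothetical and only for illustration; they are not the operator's actual fields or the approach taken in PR #37214.

```typescript
// Hypothetical, simplified stand-in for the precombine cache (not Beam's API).
class LruPrecombineCache<K, V, A> {
  // Map iteration order is insertion order: first entry = least recently used.
  private groups = new Map<K, A>();

  constructor(
    private maxKeys: number,
    private createAccumulator: () => A,
    private addInput: (acc: A, value: V) => A,
    // Called for each evicted entry, e.g. to emit it downstream.
    private emit: (key: K, acc: A) => void,
  ) {}

  // Fold `value` into the accumulator for `key` and mark the key as recently used.
  add(key: K, value: V): void {
    const acc = this.groups.has(key)
      ? (this.groups.get(key) as A)
      : this.createAccumulator();
    // Delete + set moves the key to the end of the iteration order.
    this.groups.delete(key);
    this.groups.set(key, this.addInput(acc, value));
    if (this.groups.size > this.maxKeys) {
      // Same 90% target as the existing code, rounded down to a whole count.
      this.flush(Math.floor(this.maxKeys * 0.9));
    }
  }

  // Evict least recently used entries until at most `target` keys remain.
  flush(target: number): void {
    while (this.groups.size > target) {
      const oldest = this.groups.keys().next();
      if (oldest.done) break;
      const key = oldest.value;
      const acc = this.groups.get(key) as A;
      this.groups.delete(key);
      this.emit(key, acc);
    }
  }

  // Cached keys, least recently used first.
  keys(): K[] {
    return Array.from(this.groups.keys());
  }
}
```

The delete-then-set step is the only change needed relative to insertion-order eviction: flush still walks the Map from the front, but the front now holds the keys that have gone longest without an update.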

Issue Priority

Priority: 2 (default / most bugs should be filed as P2)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam YAML
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Infrastructure
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
