kafka: bias fetch planner to prefer reading partitions with data in SI cache

When a client issues a fetch RPC that touches many partitions, the kafka layer decides which partitions to include in the result. Currently this decision does not take into account which partitions can return fetch results cheaply from the SI cache and which may require an expensive segment hydration.

Fetches should advance somewhat uniformly across all the partitions they are interested in, but within some bounds we may bias this. For example, as long as no partition is more than one segment size ahead of the others, we can continue to return results from the partitions whose data is already in the cache.
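To make the bounded bias concrete, here is a minimal sketch in Python. It is not the actual fetch planner: `PartitionState`, `plan_fetch`, the byte-based `position` measure, and the `max_partitions` limit are all hypothetical names chosen for illustration. The idea is to consider only partitions that are less than one segment ahead of the slowest partition, then order cache-resident partitions first.

```python
from dataclasses import dataclass


@dataclass
class PartitionState:
    name: str
    position: int   # hypothetical: bytes this consumer has read from the partition
    in_cache: bool  # hypothetical: next read can be served from the SI cache


def plan_fetch(partitions, segment_size, max_partitions):
    """Pick partitions to read, preferring cached ones within a lead bound."""
    if not partitions:
        return []
    slowest = min(p.position for p in partitions)
    # Only partitions less than one segment ahead of the slowest may advance.
    # The slowest partition always qualifies, so progress is never blocked.
    eligible = [p for p in partitions if p.position - slowest < segment_size]
    # Cached partitions first; among equals, the one furthest behind first.
    eligible.sort(key=lambda p: (not p.in_cache, p.position))
    return [p.name for p in eligible[:max_partitions]]
```

Once the cached partitions run a full segment ahead, they drop out of the eligible set and the planner falls back to the lagging partitions, which is the point at which a hydration becomes unavoidable.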

Without this change, if `segment_size * partition_count > cache_size`, even a single consumer reading a topic may experience severe thrashing of segments in and out of the cache.
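The condition above is simple arithmetic. The sketch below spells it out with made-up numbers (128 MiB segments, 200 partitions, a 20 GiB cache); these values and the function name `may_thrash` are illustrative assumptions, not measurements or defaults from any deployment.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def may_thrash(segment_size, partition_count, cache_size):
    """True when one segment per partition cannot fit in the cache at once."""
    return segment_size * partition_count > cache_size


# Hypothetical figures: 200 partitions * 128 MiB = 25 GiB of segments,
# which exceeds a 20 GiB cache.
print(may_thrash(128 * MIB, 200, 20 * GIB))
# With only 100 partitions the working set is 12.5 GiB and fits.
print(may_thrash(128 * MIB, 100, 20 * GIB))
```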

This change will make no difference when there is one consumer per partition, but when a single consumer reads many partitions under cache space pressure, it should avoid most of the repeated segment hydrations and be far more efficient.

JIRA Link: [CORE-1041](https://redpandadata.atlassian.net/browse/CORE-1041)


kafka: bias fetch planner to prefer reading partitions with data in SI cache #6681