[Parquet] predicate cache over reports "cache read" metrics in some cases #8307

@alamb

Describe the bug

I am trying to show that when we disable the cache by setting the max size to zero, the cache doesn't get used. To do this I was using the records_read_from_cache metric.

To my surprise, it reported rows being read from the cache even when the cache was disabled.

I found that the metric counts rows that were read from the "local" cache in addition to rows read from the actual global cache.

Specifically, in this code:
https://github.com/apache/arrow-rs/blob/main/parquet/src/arrow/array_reader/cached_array_reader.rs#L202-L235

To Reproduce
I set max_predicate_cache_size to zero (which disables the cache), and then looked at this metric:

https://docs.rs/parquet/latest/parquet/arrow/arrow_reader/metrics/enum.ArrowReaderMetrics.html#method.records_read_from_cache

Expected behavior
I expect only rows read from the "global cache" (the one that may consume memory) to be included in the metric.
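
To make the difference between the reported and the expected count concrete, here is a small toy model of the situation. It is only a sketch and does not use the parquet crate: `Metrics`, `SharedCache`, `Reader`, and `simulate` are hypothetical names, and the real logic in cached_array_reader.rs is more involved. The model assumes each batch is first decoded into a per-reader "local" buffer and offered to a size-limited shared cache, and is then read twice.

```rust
use std::collections::HashMap;

const BATCH_ROWS: usize = 4;

/// Hypothetical counters; not the parquet crate's ArrowReaderMetrics.
#[derive(Debug, Default)]
struct Metrics {
    /// Rows decoded from the underlying source.
    inner_reads: usize,
    /// Rows served from the local buffer OR the shared cache (behavior described in this issue).
    reported_cache_reads: usize,
    /// Rows served from the shared, memory-limited cache only (expected behavior).
    shared_cache_reads: usize,
}

/// Hypothetical shared cache with a byte budget; a budget of 0 disables it.
struct SharedCache {
    max_bytes: usize,
    used_bytes: usize,
    batches: HashMap<usize, Vec<i32>>,
}

impl SharedCache {
    fn insert(&mut self, batch_id: usize, rows: Vec<i32>) {
        let size = rows.len() * std::mem::size_of::<i32>();
        // Reject anything that would exceed the budget.
        if self.used_bytes + size <= self.max_bytes {
            self.used_bytes += size;
            self.batches.insert(batch_id, rows);
        }
    }
}

struct Reader {
    shared: SharedCache,
    local: HashMap<usize, Vec<i32>>,
    metrics: Metrics,
}

impl Reader {
    fn decode(&mut self, batch_id: usize) -> Vec<i32> {
        self.metrics.inner_reads += BATCH_ROWS;
        vec![batch_id as i32; BATCH_ROWS]
    }

    /// Decode a batch, keep it in the local buffer, and offer it to the shared cache.
    fn fetch(&mut self, batch_id: usize) {
        let rows = self.decode(batch_id);
        self.shared.insert(batch_id, rows.clone());
        self.local.insert(batch_id, rows);
    }

    fn read(&mut self, batch_id: usize) -> Vec<i32> {
        if let Some(rows) = self.local.remove(&batch_id) {
            // Local hit: counted as a "cache read" even though no shared cache was used.
            self.metrics.reported_cache_reads += rows.len();
            return rows;
        }
        if let Some(rows) = self.shared.batches.get(&batch_id).cloned() {
            self.metrics.reported_cache_reads += rows.len();
            self.metrics.shared_cache_reads += rows.len();
            return rows;
        }
        // Miss in both: decode again from the source.
        self.decode(batch_id)
    }
}

/// Fetch two batches, then read each one twice, and return the counters.
fn simulate(max_cache_bytes: usize) -> Metrics {
    let shared = SharedCache { max_bytes: max_cache_bytes, used_bytes: 0, batches: HashMap::new() };
    let mut reader = Reader { shared, local: HashMap::new(), metrics: Metrics::default() };
    for batch_id in 0..2 {
        reader.fetch(batch_id);
        reader.read(batch_id); // served from the local buffer
        reader.read(batch_id); // served from the shared cache, if enabled
    }
    reader.metrics
}
```

In this model, `simulate(0)` ends with `reported_cache_reads` at 8 while `shared_cache_reads` stays at 0, which mirrors the surprise above: the cache is disabled, yet "cache reads" are reported. With a non-zero budget such as `simulate(1024)`, the two counters still differ by exactly the local hits.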
