
Support applying parquet bloom filters to StringView columns #12499

Closed
@alamb


Is your feature request related to a problem or challenge?

Part of #11752

While working to enable StringView in #12092, I found that columns read as StringView and BinaryView do not take advantage of Bloom filters.

Specifically, this code doesn't handle StringView:

ScalarValue::Utf8(Some(v)) => sbbf.check(&v.as_str()),
ScalarValue::Binary(Some(v)) => sbbf.check(v),
ScalarValue::FixedSizeBinary(_size, Some(v)) => sbbf.check(v),
ScalarValue::Boolean(Some(v)) => sbbf.check(v),
ScalarValue::Float64(Some(v)) => sbbf.check(v),
ScalarValue::Float32(Some(v)) => sbbf.check(v),

Describe the solution you'd like

Support applying parquet bloom filters to StringView columns
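
As a rough illustration of the shape of the change, here is a minimal, self-contained sketch. It assumes the view variants carry the same payload as their non-view counterparts, so they can share a match arm and probe the filter with the same bytes. `Scalar`, `ToyFilter`, and `may_contain` are hypothetical stand-ins written for this sketch, not the actual DataFusion or parquet types, and the exact-set "filter" only mimics the `check` call shape of a real bloom filter.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for DataFusion's `ScalarValue`, reduced to the
/// string and binary variants discussed in this issue.
#[derive(Debug, Clone)]
pub enum Scalar {
    Utf8(Option<String>),
    Utf8View(Option<String>),
    Binary(Option<Vec<u8>>),
    BinaryView(Option<Vec<u8>>),
}

/// Hypothetical stand-in for a parquet bloom filter. It stores the exact
/// byte strings it was built from, so unlike a real bloom filter it never
/// reports a false positive.
pub struct ToyFilter {
    values: HashSet<Vec<u8>>,
}

impl ToyFilter {
    /// Build the filter from anything that can be viewed as bytes.
    pub fn from_values<I, B>(values: I) -> Self
    where
        I: IntoIterator<Item = B>,
        B: AsRef<[u8]>,
    {
        let values = values.into_iter().map(|b| b.as_ref().to_vec()).collect();
        Self { values }
    }

    /// Returns `true` if `bytes` may be present in the column chunk.
    pub fn check(&self, bytes: &[u8]) -> bool {
        self.values.contains(bytes)
    }
}

/// Returns `false` only when the filter proves the literal is absent,
/// which means the row group can be skipped.
pub fn may_contain(filter: &ToyFilter, literal: &Scalar) -> bool {
    match literal {
        // The view variant is probed with the same bytes as plain Utf8.
        Scalar::Utf8(Some(v)) | Scalar::Utf8View(Some(v)) => filter.check(v.as_bytes()),
        // Likewise for the binary variants.
        Scalar::Binary(Some(v)) | Scalar::BinaryView(Some(v)) => filter.check(v),
        // Null literals: be conservative and do not prune.
        _ => true,
    }
}
```

With a filter built from `["foo", "bar"]`, `may_contain` returns the same answer for `Scalar::Utf8View(Some("foo".to_string()))` as for the `Utf8` variant, which is the behavior this issue asks for.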

Describe alternatives you've considered

Basically:

  1. Make the code changes for bloom filters in #12092 (Enable reading StringViewArray by default from Parquet)
  2. Write a test

In terms of testing, I think the easiest thing to do would be to follow the model of the existing tests for Utf8/Binary columns and pass the schema_force_view_types config flag.

Additional context

No response
