[C++] Possible data race when reading metadata of a parquet file #40068

@mpimenov

Describe the bug, including details regarding any error messages, version, and platform.

The first and last lines of the code block below access the same metadata variable, but only one of them does so while holding a lock.
I assume the other access should hold the lock too.
Other places in this file also access metadata in ways that are hard to reason about (for example, it is not clear at first glance whether a given method allows metadata to be nullptr). They could race as well.

    if (parquet_fragment->metadata() != nullptr) {
      ARROW_ASSIGN_OR_RAISE(row_groups, parquet_fragment->FilterRowGroups(options->filter));
      pre_filtered = true;
      if (row_groups.empty()) return MakeEmptyGenerator<std::shared_ptr<RecordBatch>>();
    }
    // Open the reader and pay the real IO cost.
    auto make_generator =
        [this, options, parquet_fragment, pre_filtered,
         row_groups](const std::shared_ptr<parquet::arrow::FileReader>& reader) mutable
        -> Result<RecordBatchGenerator> {
      // Ensure that parquet_fragment has FileMetaData
      RETURN_NOT_OK(parquet_fragment->EnsureCompleteMetadata(reader.get()));
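
To illustrate the kind of fix I have in mind, here is a minimal standalone sketch, not Arrow's actual code. FakeFragment, FakeMetadata and RunConcurrentAccess are hypothetical names made up for this example, and the real EnsureCompleteMetadata takes a reader rather than an int. The idea is that the getter takes the same mutex as the writer and returns a copy of the shared_ptr, so a caller works with one consistent snapshot instead of re-reading a member that another thread may be assigning.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for parquet::FileMetaData.
struct FakeMetadata {
  int num_row_groups = 0;
};

// Hypothetical stand-in for the fragment class; not Arrow's implementation.
class FakeFragment {
 public:
  // Copies the pointer while holding the lock, so the read cannot
  // overlap with the assignment in EnsureCompleteMetadata.
  std::shared_ptr<const FakeMetadata> metadata() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return metadata_;
  }

  // Populates the metadata once; later calls are no-ops.
  void EnsureCompleteMetadata(int num_row_groups) {
    std::lock_guard<std::mutex> lock(mutex_);
    if (metadata_ != nullptr) return;
    auto md = std::make_shared<FakeMetadata>();
    md->num_row_groups = num_row_groups;
    metadata_ = std::move(md);
  }

 private:
  mutable std::mutex mutex_;
  std::shared_ptr<const FakeMetadata> metadata_;
};

// Runs one writer and several readers concurrently, then returns the
// row-group count visible after all threads have joined.
int RunConcurrentAccess() {
  FakeFragment fragment;
  std::vector<std::thread> threads;
  threads.emplace_back([&fragment] { fragment.EnsureCompleteMetadata(3); });
  for (int i = 0; i < 4; ++i) {
    threads.emplace_back([&fragment] {
      // Take one snapshot and use only that snapshot afterwards.
      auto md = fragment.metadata();
      if (md != nullptr) assert(md->num_row_groups == 3);
    });
  }
  for (auto& t : threads) t.join();
  return fragment.metadata()->num_row_groups;
}
```

A reader may still see nullptr if it runs before the writer, so the nullptr check on the snapshot remains necessary. What the lock removes is the unsynchronized read itself.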

Component(s)

C++, Parquet
