Skip to content

Commit 0969975

Browse files
westonpaceraulcd
authored andcommitted
GH-40068: [C++] Possible data race when reading metadata of a parquet file (#40111)
### Rationale for this change The `ParquetFileFragment` will cache the parquet metadata when loading it. The `metadata()` method accesses this metadata (a shared_ptr) but does not grab the lock used to set that shared_ptr. It's possible then that we are reading a shared_ptr at the same time some other thread is setting the shared_ptr which is technically (I think) undefined behavior. ### What changes are included in this PR? Guard access to the metadata by grabbing the mutex first ### Are these changes tested? Existing tests should regress this change ### Are there any user-facing changes? No * Closes: #40068 Authored-by: Weston Pace <weston.pace@gmail.com> Signed-off-by: Antoine Pitrou <antoine@python.org>
1 parent 5cd4f07 commit 0969975

File tree

2 files changed

+6
-1
lines changed

2 files changed

+6
-1
lines changed

cpp/src/arrow/dataset/file_parquet.cc

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -779,6 +779,11 @@ ParquetFileFragment::ParquetFileFragment(FileSource source,
779779
parquet_format_(checked_cast<ParquetFileFormat&>(*format_)),
780780
row_groups_(std::move(row_groups)) {}
781781

782+
std::shared_ptr<parquet::FileMetaData> ParquetFileFragment::metadata() {
783+
auto lock = physical_schema_mutex_.Lock();
784+
return metadata_;
785+
}
786+
782787
Status ParquetFileFragment::EnsureCompleteMetadata(parquet::arrow::FileReader* reader) {
783788
auto lock = physical_schema_mutex_.Lock();
784789
if (metadata_ != nullptr) {

cpp/src/arrow/dataset/file_parquet.h

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -165,7 +165,7 @@ class ARROW_DS_EXPORT ParquetFileFragment : public FileFragment {
165165
}
166166

167167
/// \brief Return the FileMetaData associated with this fragment.
168-
const std::shared_ptr<parquet::FileMetaData>& metadata() const { return metadata_; }
168+
std::shared_ptr<parquet::FileMetaData> metadata();
169169

170170
/// \brief Ensure this fragment's FileMetaData is in memory.
171171
Status EnsureCompleteMetadata(parquet::arrow::FileReader* reader = NULLPTR);

0 commit comments

Comments
 (0)