Skip to content

Conversation

@malinjawi
Copy link
Contributor

@malinjawi malinjawi commented Nov 23, 2024

Rationale for this change

This PR refactors parquet::arrow::FileReader::Make() and related functions to return arrow::Result instead of ::arrow::Status. This modernizes the error handling and improves consistency across the codebase.

What changes are included in this PR?

Replaced ::arrow::Status with arrow::Result<std::unique_ptr<FileReader>> in the Make() functions under CPP Parquet.

Are these changes tested?

All unit tests pass.
image

Are there any user-facing changes?

No user-facing changes; this is an internal refactor.

@malinjawi malinjawi requested a review from wgtmac as a code owner November 23, 2024 18:35
@github-actions
Copy link

⚠️ GitHub issue #44810 has been automatically assigned in GitHub to PR creator.

@malinjawi
Copy link
Contributor Author

@github-actions autotune

@github-actions github-actions bot added the awaiting review Awaiting review label Nov 23, 2024
@malinjawi
Copy link
Contributor Author

hey @wgtmac I am new to arrow and just noticed the failures appearing in the checks. I am not sure if my changes weren't covered in unit testing or not can you please advise?

Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also update codes that use parquet::arrow::FileReader::Make()?
CI failures tell us where we use it.

Comment on lines 119 to 122
static ::arrow::Status Make(::arrow::MemoryPool* pool,
std::unique_ptr<ParquetFileReader> reader,
const ArrowReaderProperties& properties,
std::unique_ptr<FileReader>* out);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you keep the arrow::Status version with deprecated mark so that users will be able to migrate to the arrow::Result version smoothly?
See also: #44785

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Will add comments instead and address the comments. Thanks @kou

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. We need to mark the existing ones as deprecated without deleting them.

/// between arrow's Schema and parquet's Schema.
struct PARQUET_EXPORT SchemaManifest {
static ::arrow::Status Make(
static ::arrow::Result<std::unique_ptr<FileReader>> Make(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you work on this as a separated task?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, I have reverted this now.

MemoryPool* pool, std::unique_ptr<ParquetFileWriter> writer,
std::shared_ptr<::arrow::Schema> schema,
std::shared_ptr<ArrowWriterProperties> arrow_properties,
std::unique_ptr<FileWriter>* out);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you work on this as a separated task?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, I have reverted this now

static constexpr uint32_t kBloomFilterHeaderSizeGuess = 256;

static ::arrow::Status ValidateBloomFilterHeader(
static ::arrow::Result<std::unique_ptr<FileReader>> ValidateBloomFilterHeader(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you work on this as a separated task?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted. @kou Also, would be beneficial to have an issue with subtasks for this refactoring?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Could you create subissues?

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting review Awaiting review labels Nov 24, 2024
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Nov 24, 2024
Copy link
Member

@raulcd raulcd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking an effort!
There are also some linting failures, please do take a look on those once the other things have been reviewed!

Comment on lines 1318 to 1321
Result<std::unique_ptr<FileReader>> Make(::arrow::MemoryPool* pool,
std::unique_ptr<ParquetFileReader> reader,
const ArrowReaderProperties& properties,
std::unique_ptr<FileReader>* out) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::unique_ptr<FileReader>* out should be removed as part of the signature of the function. The Result already contains the FileReader value so this is unnecessary. See this commit with a similar change for an example:
0cdbdac

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @raulcd updated this but I think i got something wrong on how to use the std::unique_ptr<FileReader>* out internally in the function.

}

Status FileReader::Make(::arrow::MemoryPool* pool,
Result<std::unique_ptr<FileReader>> Make(::arrow::MemoryPool* pool,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You still have to provide the implementations for Status FileReader because they are part of the headers file. Even though they are deprecated you can't remove them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! keeping the implementations for Status FileReader

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Nov 25, 2024
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Dec 2, 2024
@malinjawi malinjawi requested review from kou and raulcd December 2, 2024 15:56
Copy link
Member

@raulcd raulcd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are failures on CI where some usages on other files should be updated to stop using the deprecated API, see for example arrow/cpp/src/arrow/dataset/file_parquet.cc as seen on the failure:

 FAILED: src/arrow/dataset/CMakeFiles/arrow_dataset_objlib.dir/file_parquet.cc.o 
/usr/bin/ccache /usr/lib/ccache/clang++-14 -DADDRESS_SANITIZER -DARROW_DS_EXPORTING -DARROW_HAVE_RUNTIME_AVX2 -DARROW_HAVE_RUNTIME_AVX512 -DARROW_HAVE_RUNTIME_BMI2 -DARROW_HAVE_RUNTIME_SSE4_2 -DARROW_HAVE_SSE4_2 -DARROW_UBSAN -DJSON_DIAGNOSTICS=0 -DJSON_USE_IMPLICIT_CONVERSIONS=1 -I/build/cpp/src -I/arrow/cpp/src -I/arrow/cpp/src/generated -I/arrow/cpp/src/parquet -isystem /build/cpp/xsimd_ep/src/xsimd_ep-install/include -isystem /build/cpp/opentelemetry_ep-install/include -Qunused-arguments -fcolor-diagnostics  -Wall -Wextra -Wdocumentation -DARROW_WARN_DOCUMENTATION -Wshorten-64-to-32 -Wno-missing-braces -Wno-unused-parameter -Wno-constant-logical-operand -Wno-return-stack-address -Wdate-time -Wno-unknown-warning-option -Wno-pass-failed -msse4.2  -fsanitize=address -DADDRESS_SANITIZER -fsanitize=undefined -fno-sanitize=alignment,vptr,function,float-divide-by-zero -fno-sanitize-recover=all -fsanitize-coverage=pc-table,inline-8bit-counters,edge,no-prune,trace-cmp,trace-div,trace-gep -fsanitize-blacklist=/arrow/cpp/build-support/sanitizer-disallowed-entries.txt -g -Werror -O0 -ggdb -g1 -fPIC   -fsanitize-coverage=pc-table,inline-8bit-counters,edge,no-prune,trace-cmp,trace-div,trace-gep -std=c++17 -MD -MT src/arrow/dataset/CMakeFiles/arrow_dataset_objlib.dir/file_parquet.cc.o -MF src/arrow/dataset/CMakeFiles/arrow_dataset_objlib.dir/file_parquet.cc.o.d -o src/arrow/dataset/CMakeFiles/arrow_dataset_objlib.dir/file_parquet.cc.o -c /arrow/cpp/src/arrow/dataset/file_parquet.cc
/arrow/cpp/src/arrow/dataset/file_parquet.cc:507:45: error: 'Make' is deprecated: Deprecated in 19.0.0. Use arrow::Result version instead. [-Werror,-Wdeprecated-declarations]
  RETURN_NOT_OK(parquet::arrow::FileReader::Make(

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Dec 2, 2024
Comment on lines 1321 to 1322
*out = std::make_unique<FileReaderImpl>(pool, std::move(reader), properties);
return static_cast<FileReaderImpl*>(out->get())->Init();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use the Result version here?

Suggested change
*out = std::make_unique<FileReaderImpl>(pool, std::move(reader), properties);
return static_cast<FileReaderImpl*>(out->get())->Init();
return Make(pool, std::move(reader), properties)->Value(out);

Comment on lines +118 to +119
// Factory function to create a FileReader from a ParquetFileReader and properties.
// \deprecated Deprecated in 19.0.0. Use arrow::Result version instead.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use /// not // for Doxygen: https://www.doxygen.nl/manual/docblocks.html

Suggested change
// Factory function to create a FileReader from a ParquetFileReader and properties.
// \deprecated Deprecated in 19.0.0. Use arrow::Result version instead.
/// Factory function to create a FileReader from a ParquetFileReader and properties.
/// \deprecated Deprecated in 19.0.0. Use arrow::Result version instead.


/// Factory function to create a FileReader from a ParquetFileReader
// Factory function to create a FileReader from a ParquetFileReader and properties
// Returns an arrow::Result containing a unique pointer to the FileReader.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use \return Doxygen markup?
https://www.doxygen.nl/manual/commands.html#cmdreturn

@wgtmac
Copy link
Member

wgtmac commented Dec 4, 2024

Just want to echo the comment from @raulcd that we need to fix all source files not to call the deprecated functions. Perhaps we need to run a full C++ build to make sure they are all covered.

@kou
Copy link
Member

kou commented Nov 17, 2025

@malinjawi Do you still want to complete this PR?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants