-
Notifications
You must be signed in to change notification settings - Fork 4k
GH-44810 [C++][Parquet] Added arrow::Result version of parquet::arrow::FileReader::Make()
#44819
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
GH-44810 [C++][Parquet] Added arrow::Result version of parquet::arrow::FileReader::Make()
#44819
Conversation
|
|
|
@github-actions autotune |
|
hey @wgtmac I am new to arrow and just noticed the failures appearing in the checks. I am not sure if my changes weren't covered in unit testing or not can you please advise? |
kou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you also update codes that use parquet::arrow::FileReader::Make()?
CI failures tell us where we use it.
| static ::arrow::Status Make(::arrow::MemoryPool* pool, | ||
| std::unique_ptr<ParquetFileReader> reader, | ||
| const ArrowReaderProperties& properties, | ||
| std::unique_ptr<FileReader>* out); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you keep the arrow::Status version with deprecated mark so that users will be able to migrate to the arrow::Result version smoothly?
See also: #44785
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Will add comments instead and address the comments. Thanks @kou
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1. We need to mark the existing ones as deprecated without deleting them.
cpp/src/parquet/arrow/schema.h
Outdated
| /// between arrow's Schema and parquet's Schema. | ||
| struct PARQUET_EXPORT SchemaManifest { | ||
| static ::arrow::Status Make( | ||
| static ::arrow::Result<std::unique_ptr<FileReader>> Make( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you work on this as a separated task?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, I have reverted this now.
cpp/src/parquet/arrow/writer.h
Outdated
| MemoryPool* pool, std::unique_ptr<ParquetFileWriter> writer, | ||
| std::shared_ptr<::arrow::Schema> schema, | ||
| std::shared_ptr<ArrowWriterProperties> arrow_properties, | ||
| std::unique_ptr<FileWriter>* out); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you work on this as a separated task?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, I have reverted this now
cpp/src/parquet/bloom_filter.cc
Outdated
| static constexpr uint32_t kBloomFilterHeaderSizeGuess = 256; | ||
|
|
||
| static ::arrow::Status ValidateBloomFilterHeader( | ||
| static ::arrow::Result<std::unique_ptr<FileReader>> ValidateBloomFilterHeader( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you work on this as a separated task?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reverted. @kou Also, would be beneficial to have an issue with subtasks for this refactoring?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Could you create subissues?
…ake()" This reverts commit d4ee43f.
raulcd
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking an effort!
There are also some linting failures, please do take a look on those once the other things have been reviewed!
cpp/src/parquet/arrow/reader.cc
Outdated
| Result<std::unique_ptr<FileReader>> Make(::arrow::MemoryPool* pool, | ||
| std::unique_ptr<ParquetFileReader> reader, | ||
| const ArrowReaderProperties& properties, | ||
| std::unique_ptr<FileReader>* out) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::unique_ptr<FileReader>* out should be removed as part of the signature of the function. The Result already contains the FileReader value so this is unnecessary. See this commit with a similar change for an example:
0cdbdac
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @raulcd updated this but I think i got something wrong on how to use the std::unique_ptr<FileReader>* out internally in the function.
cpp/src/parquet/arrow/reader.cc
Outdated
| } | ||
|
|
||
| Status FileReader::Make(::arrow::MemoryPool* pool, | ||
| Result<std::unique_ptr<FileReader>> Make(::arrow::MemoryPool* pool, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You still have to provide the implementations for Status FileReader because they are part of the headers file. Even though they are deprecated you can't remove them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done! keeping the implementations for Status FileReader
raulcd
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are failures on CI where some usages on other files should be updated to stop using the deprecated API, see for example arrow/cpp/src/arrow/dataset/file_parquet.cc as seen on the failure:
FAILED: src/arrow/dataset/CMakeFiles/arrow_dataset_objlib.dir/file_parquet.cc.o
/usr/bin/ccache /usr/lib/ccache/clang++-14 -DADDRESS_SANITIZER -DARROW_DS_EXPORTING -DARROW_HAVE_RUNTIME_AVX2 -DARROW_HAVE_RUNTIME_AVX512 -DARROW_HAVE_RUNTIME_BMI2 -DARROW_HAVE_RUNTIME_SSE4_2 -DARROW_HAVE_SSE4_2 -DARROW_UBSAN -DJSON_DIAGNOSTICS=0 -DJSON_USE_IMPLICIT_CONVERSIONS=1 -I/build/cpp/src -I/arrow/cpp/src -I/arrow/cpp/src/generated -I/arrow/cpp/src/parquet -isystem /build/cpp/xsimd_ep/src/xsimd_ep-install/include -isystem /build/cpp/opentelemetry_ep-install/include -Qunused-arguments -fcolor-diagnostics -Wall -Wextra -Wdocumentation -DARROW_WARN_DOCUMENTATION -Wshorten-64-to-32 -Wno-missing-braces -Wno-unused-parameter -Wno-constant-logical-operand -Wno-return-stack-address -Wdate-time -Wno-unknown-warning-option -Wno-pass-failed -msse4.2 -fsanitize=address -DADDRESS_SANITIZER -fsanitize=undefined -fno-sanitize=alignment,vptr,function,float-divide-by-zero -fno-sanitize-recover=all -fsanitize-coverage=pc-table,inline-8bit-counters,edge,no-prune,trace-cmp,trace-div,trace-gep -fsanitize-blacklist=/arrow/cpp/build-support/sanitizer-disallowed-entries.txt -g -Werror -O0 -ggdb -g1 -fPIC -fsanitize-coverage=pc-table,inline-8bit-counters,edge,no-prune,trace-cmp,trace-div,trace-gep -std=c++17 -MD -MT src/arrow/dataset/CMakeFiles/arrow_dataset_objlib.dir/file_parquet.cc.o -MF src/arrow/dataset/CMakeFiles/arrow_dataset_objlib.dir/file_parquet.cc.o.d -o src/arrow/dataset/CMakeFiles/arrow_dataset_objlib.dir/file_parquet.cc.o -c /arrow/cpp/src/arrow/dataset/file_parquet.cc
/arrow/cpp/src/arrow/dataset/file_parquet.cc:507:45: error: 'Make' is deprecated: Deprecated in 19.0.0. Use arrow::Result version instead. [-Werror,-Wdeprecated-declarations]
RETURN_NOT_OK(parquet::arrow::FileReader::Make(
| *out = std::make_unique<FileReaderImpl>(pool, std::move(reader), properties); | ||
| return static_cast<FileReaderImpl*>(out->get())->Init(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use the Result version here?
| *out = std::make_unique<FileReaderImpl>(pool, std::move(reader), properties); | |
| return static_cast<FileReaderImpl*>(out->get())->Init(); | |
| return Make(pool, std::move(reader), properties)->Value(out); |
| // Factory function to create a FileReader from a ParquetFileReader and properties. | ||
| // \deprecated Deprecated in 19.0.0. Use arrow::Result version instead. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should use /// not // for Doxygen: https://www.doxygen.nl/manual/docblocks.html
| // Factory function to create a FileReader from a ParquetFileReader and properties. | |
| // \deprecated Deprecated in 19.0.0. Use arrow::Result version instead. | |
| /// Factory function to create a FileReader from a ParquetFileReader and properties. | |
| /// \deprecated Deprecated in 19.0.0. Use arrow::Result version instead. |
|
|
||
| /// Factory function to create a FileReader from a ParquetFileReader | ||
| // Factory function to create a FileReader from a ParquetFileReader and properties | ||
| // Returns an arrow::Result containing a unique pointer to the FileReader. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you use \return Doxygen markup?
https://www.doxygen.nl/manual/commands.html#cmdreturn
|
Just want to echo the comment from @raulcd that we need to fix all source files not to call the deprecated functions. Perhaps we need to run a full C++ build to make sure they are all covered. |
|
@malinjawi Do you still want to complete this PR? |
Rationale for this change
This PR refactors
parquet::arrow::FileReader::Make()and related functions to returnarrow::Resultinstead of::arrow::Status. This modernizes the error handling and improves consistency across the codebase.What changes are included in this PR?
Replaced :
:arrow::Statuswitharrow::Result<std::unique_ptr<FileReader>>in the Make() functions under CPP Parquet.Are these changes tested?
All unit tests pass.

Are there any user-facing changes?
No user-facing changes; this is an internal refactor.
arrow::Resultversion ofparquet::arrow::FileReader::Make()#44810