-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feature] Add native format writer to access StarRocks data bypass BE Server. #52700
[Feature] Add native format writer to access StarRocks data bypass BE Server. #52700
Conversation
300bdd1
to
1c009e6
Compare
127475c
to
69e0a59
Compare
69e0a59
to
d6f19f3
Compare
e662ef3
to
45d87f8
Compare
be/CMakeLists.txt
Outdated
@@ -968,9 +968,15 @@ endif() | |||
|
|||
# Add all external dependencies. They should come after the starrocks libs. | |||
|
|||
set(STARROCKS_STD_DEPENDENCIES | |||
-static-libstdc++ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why add this? these will be add later
if (NOT ("${MAKE_TEST}" STREQUAL "ON" AND "${BUILD_FOR_SANITIZE}" STREQUAL "ON"))
# In other words, turn to dynamic link when MAKE_TEST and BUILD_TYPE == *SAN
# otherwise do static link gcc's lib
set(STARROCKS_LINK_LIBS ${STARROCKS_LINK_LIBS} -static-libstdc++ -static-libgcc)
endif()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I have removed it
@@ -0,0 +1 @@ | |||
# StarRocks Format Library |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add some information?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As planned, I will write a comprehensive document here in the next PR, including the use of Writer
and Reader
.
# specific language governing permissions and limitations | ||
# under the License. | ||
|
||
FILE(GLOB CONVERTER_SRC_FILES "*.cpp" "*.h") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recall GLOB will affect incremental build, better list files explicitly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good advice, learned a lot.
constexpr int32_t ARROW_CONVERTER_ID(arrow::Type::type arrow_type_id, starrocks::LogicalType sr_logical_type) { | ||
return (arrow_type_id << 16) | (sr_logical_type); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add some assert/DCHECK to check overflow
std::shared_ptr<starrocks::Column> ColumnConverter::get_data_column( | ||
const std::shared_ptr<starrocks::Column> &column) { | ||
if (column->is_nullable()) { | ||
auto *nullable_column = down_cast<const NullableColumn *>(column.get()); | ||
return nullable_column->data_column(); | ||
} else if (column->is_constant()) { | ||
auto *const_column = down_cast<const ConstColumn *>(column.get()); | ||
return const_column->data_column(); | ||
} else { | ||
return column; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like a static method?
ARROW_ASSIGN_OR_RAISE( | ||
null_bitmap, arrow::internal::BytesToBits( | ||
reinterpret_cast<const std::vector<uint8_t> &>(null_bytes), | ||
const_cast<arrow::MemoryPool *>(_pool)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why make _pool const, then use const_cast?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To make _pool
as const is mainly to prevent it from being changed in the column converter, but the param pool
received by the BytesToBits
method is a non-const type, so a const_cast
conversion is done as a last resort.
virtual arrow::Status open() = 0; | ||
|
||
virtual arrow::Status write(const struct ArrowArray *c_arrow_array) = 0; | ||
|
||
virtual arrow::Status flush() = 0; | ||
|
||
virtual arrow::Status finish() = 0; | ||
|
||
virtual void close() = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add comment document about how this class should be used, open/write/flush/finish order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, good suggention
return JNI_ERR; | ||
} | ||
|
||
kNativeOptExceptionClass = find_class(env, "Lcom/starrocks/format/jni/NativeOperateException;"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought FindClass does not need Lxxx;
in "Lcom/starrocks/format/jni/NativeOperateException;"
, should you double confirm this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked the Oracle JNI docs, and the Lxxx
here is not needed. I have removed it.
d1b9dec
to
9e6aa3a
Compare
… Server. Signed-off-by: plotor <zhenchao.wang@hotmail.com>
9e6aa3a
to
388ef6d
Compare
|
[Java-Extensions Incremental Coverage Report]✅ pass : 0 / 0 (0%) |
[FE Incremental Coverage Report]✅ pass : 0 / 0 (0%) |
[BE Incremental Coverage Report]✅ pass : 14 / 15 (93.33%) file detail
|
Why I'm doing:
You can learn about the background from PR #52682.
What I'm doing:
In order to support bypassing the BE server, and directly reading and writing the data stored in the object storage in share-data mode, we extracted a
format-sdk
module. This SDK:Reader
andWriter
implementation, which can read and write StarRocks data files.Reader
andWriter
(invokes the C++Reader
andWriter
through JNI), which can be used by computing engines.This PR mainly contains the implementation of
Writer
. You can build it by:This will complete the compilation of C++ and Java codes, and finally generate the corresponding
so
andjar
files. Then, you can call the SDK to perform data writing operations as follows (taking Java as an example):Fixes: #52682
What type of PR is this:
Does this PR entail a change in behavior?
If yes, please specify the type of change:
Checklist:
Bugfix cherry-pick branch check: