-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feature] Stream load support JSON to STRUCT/MAP/ARRAY #45406
Conversation
c50c31d
to
6fe62fc
Compare
@Mergifyio rebase |
✅ Branch has been successfully rebased |
08a04c8
to
5153047
Compare
be/src/exec/json_scanner.cpp
Outdated
@@ -194,6 +194,12 @@ Status JsonScanner::_construct_json_types() { | |||
break; | |||
} | |||
|
|||
case TYPE_STRUCT: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it support nested Array with struct or map as it's subfield data type?
It seems that it will treat all other types as varchar in TYPE_ARRAY in above code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated. The nested types are supported now.
|
||
auto field_column = struct_column->field_column(field_name); | ||
simdjson::ondemand::value field_value = obj.find_field_unordered(field_name); | ||
RETURN_IF_ERROR(add_nullable_column(field_column.get(), field_type_desc, name, &field_value, true)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
name OR field_name ?
What is the name
for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name
is the column name. It's used to report error.
{ | ||
// This is a tricky way to transform a std::string to simdjson:ondemand:value | ||
std::string_view field_name_str = field.unescaped_key(); | ||
auto dummy_json = simdjson::padded_string(R"({"dummy_key": ")" + std::string(field_name_str) + R"("})"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the tricy way? Does the field_key need to be a JSON value rather than a simple string, so you parse a dummy_json to get the key? Why not implemet a add_nullable_column with string value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. Since the last argument ofadd_nullable_column
is simdjson::ondemand::value
, we have to convert the simple string to simdjson::ondemand::value
.
There're many functions using the simdjson::ondemand::value
as the input argument, implementing a add_nullable_column
with string/float/double/int value will involve massive changes.
auto map_column = down_cast<MapColumn*>(column); | ||
|
||
try { | ||
simdjson::ondemand::object obj = value->get_object(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if value is not an object, what will be the error message?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An exception The JSON element does not have the requested type.
will be thrown and be caught later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we give more detailed info, such as the expected type is xxx, the real given type is yyy. It will be much clearer.
|
||
namespace starrocks { | ||
|
||
Status add_map_column(Column* column, const TypeDescriptor& type_desc, const std::string& name, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about extracting array processing code to add_array_column
?
@@ -111,4 +112,34 @@ TEST_F(AddNullableColumnTest, add_null_numeric_array) { | |||
column->check_or_die(); | |||
} | |||
|
|||
TEST_F(AddNullableColumnTest, test_add_struct) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about nested Struct/map/array?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, they are supported. I've added more information in the PR description.
const auto& field_name = type_desc.field_names[i]; | ||
const auto& field_type_desc = type_desc.children[i]; | ||
|
||
auto field_column = struct_column->field_column(field_name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it case sensitive?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. If the struct field name has a different case than the JSON field name, the struct field value will be a NULL.
e5169ac
to
eeb8056
Compare
@Mergifyio rebase |
Signed-off-by: ricky <rickif@qq.com>
Signed-off-by: ricky <rickif@qq.com>
Signed-off-by: ricky <rickif@qq.com>
Signed-off-by: ricky <rickif@qq.com>
Signed-off-by: ricky <rickif@qq.com>
Signed-off-by: ricky <rickif@qq.com>
Signed-off-by: ricky <rickif@qq.com>
Signed-off-by: ricky <rickif@qq.com>
Signed-off-by: ricky <rickif@qq.com>
Co-authored-by: wyb <wybb86@gmail.com> Signed-off-by: ricky <rickif@qq.com>
Co-authored-by: wyb <wybb86@gmail.com> Signed-off-by: ricky <rickif@qq.com>
[FE Incremental Coverage Report]✅ pass : 0 / 0 (0%) |
[BE Incremental Coverage Report]✅ pass : 118 / 134 (88.06%) file detail
|
https://github.com/Mergifyio backport branch-3.3 |
✅ Backports have been created
|
Signed-off-by: ricky <rickif@qq.com> Co-authored-by: wyb <wybb86@gmail.com> (cherry picked from commit a4f53d3)
…) (#46448) Co-authored-by: ricky <rickif@qq.com>
https://github.com/Mergifyio backport branch-3.2 |
✅ Backports have been created
|
Signed-off-by: ricky <rickif@qq.com> Co-authored-by: wyb <wybb86@gmail.com> (cherry picked from commit a4f53d3)
…) (#47233) Co-authored-by: ricky <rickif@qq.com>
Why I'm doing:
Now, the SR does't support loading JSON to complex type column.
What I'm doing:
This PR adds the support of JSON format loading to STRUCT/MAP/ARRAY type column.
Here are some examples.
This feature also works for routine load.
Fixes #43101
What type of PR is this:
Does this PR entail a change in behavior?
If yes, please specify the type of change:
Checklist:
Bugfix cherry-pick branch check: