[BugFix] Bump simdjson to 3.9.4 and Fix struct field columns inconsistent when loading from bad json #47775

Merged (3 commits) on Jul 5, 2024


@wyb wyb commented Jul 3, 2024

Why I'm doing:

  1. Struct field columns may become inconsistent when parsing some of the fields fails.
  2. find_field_unordered crashes in the current simdjson version when loading from bad JSON:
#0  std::__uniq_ptr_impl<simdjson::internal::dom_parser_implementation, std::default_delete<simdjson::internal::dom_parser_implementation> >::_M_ptr (this=0x8) at /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:173
#1  std::unique_ptr<simdjson::internal::dom_parser_implementation, std::default_delete<simdjson::internal::dom_parser_implementation> >::get (this=0x8) at /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:422
#2  std::unique_ptr<simdjson::internal::dom_parser_implementation, std::default_delete<simdjson::internal::dom_parser_implementation> >::operator-> (this=0x8) at /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:416
#3  simdjson::fallback::ondemand::json_iterator::end_position (this=0x7fb36e049540) at /home/disk3/sr-deps/thirdparty/installed/include/simdjson/generic/ondemand/json_iterator-inl.h:193
#4  simdjson::fallback::ondemand::json_iterator::skip_child (this=0x7fb36e049540, parent_depth=<optimized out>) at /home/disk3/sr-deps/thirdparty/installed/include/simdjson/generic/ondemand/json_iterator-inl.h:126
#5  simdjson::fallback::ondemand::value_iterator::skip_child (this=<optimized out>) at /home/disk3/sr-deps/thirdparty/installed/include/simdjson/generic/ondemand/value_iterator-inl.h:693
#6  simdjson::fallback::ondemand::value_iterator::find_field_unordered_raw (key=<error reading variable: Cannot create a lazy string with address 0x0, and a non-zero length.>, this=<optimized out>) at /home/disk3/sr-deps/thirdparty/installed/include/simdjson/generic/ondemand/value_iterator-inl.h:306
#7  simdjson::fallback::ondemand::object::find_field_unordered(std::basic_string_view<char, std::char_traits<char> >) & (key=<error reading variable: Cannot create a lazy string with address 0x0, and a non-zero length.>, this=<optimized out>) at /home/disk3/sr-deps/thirdparty/installed/include/simdjson/generic/ondemand/object-inl.h:7
#8  starrocks::add_struct_column (column=0x7fb376e46330, type_desc=..., name="k2", value=value@entry=0x7fb34e6f7090) at be/src/formats/json/struct_column.cpp:42
#9  0x00000000075bdb0f in starrocks::add_adaptive_nullable_struct_column (column=0x7fb376e46380, type_desc=..., name="k2", value=0x7fb34e6f7090) at be/src/formats/json/nullable_column.cpp:255
#10 starrocks::add_adpative_nullable_column (column=0x7fb376e46380, type_desc=..., name="k2", value=...) at be/src/formats/json/nullable_column.cpp:404
#11 starrocks::add_adaptive_nullable_column (column=0x7fb376e46380, type_desc=..., name="k2", value=value@entry=0x7fb34e6f7090, invalid_as_null=true) at be/src/formats/json/nullable_column.cpp:456
#12 0x00000000075812fa in starrocks::JsonReader::_construct_column (this=0x7fb376e5d000, value=..., column=0x7fb376e31b30, column@entry=0x7fb376e5d000, type_desc=..., col_name=<error reading variable: Cannot access memory at address 0x8>) at be/src/exec/json_scanner.cpp:812
#13 starrocks::JsonReader::_construct_row_without_jsonpath (this=this@entry=0x7fb376e5d000, row=row@entry=0x7fb34e6f71c0, chunk=chunk@entry=0x7fb376e4d010) at be/src/exec/json_scanner.cpp:563
#14 0x00000000075865d3 in starrocks::JsonReader::_construct_row (this=0x7fb376e5d000, row=0x7fb34e6f71c0, chunk=0x7fb376e4d010) at be/src/exec/json_scanner.cpp:656
#15 starrocks::JsonReader::_read_rows<starrocks::JsonDocumentStreamParser> (this=this@entry=0x7fb376e5d000, chunk=chunk@entry=0x7fb376e4d010, rows_to_read=rows_to_read@entry=4096, rows_read=rows_read@entry=0x7fb34e6f72c4) at be/src/exec/json_scanner.cpp:459
#16 0x000000000757f062 in starrocks::JsonReader::read_chunk (this=0x7fb376e5d000, chunk=0x7fb376e4d010, rows_to_read=4096) at be/src/exec/json_scanner.cpp:426
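
The inconsistency in item 1 can be illustrated with a minimal sketch. This is not the StarRocks code: `StructColumn`, `FieldColumn`, `ParsedRow` and both append functions are hypothetical names made up for this example, and a failed field parse is modelled as `std::nullopt`. The sketch assumes a struct column is stored as one sub-column per field, so every sub-column must hold the same number of rows.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not StarRocks types.
using FieldColumn = std::vector<std::optional<int64_t>>;

struct StructColumn {
    std::vector<std::string> field_names;
    std::vector<FieldColumn> fields;  // one sub-column per struct field

    // True when every field sub-column holds the same number of rows.
    bool is_consistent() const {
        for (const FieldColumn& f : fields) {
            if (f.size() != fields.front().size()) return false;
        }
        return true;
    }
};

// One parsed row: one entry per field. std::nullopt models a field whose
// parsing failed (for example because of malformed JSON).
using ParsedRow = std::vector<std::optional<int64_t>>;

// Unsafe variant: returns at the first failed field, so the fields before it
// have already grown by one row while the remaining ones have not.
bool append_row_stop_on_error(StructColumn& col, const ParsedRow& row) {
    assert(row.size() == col.fields.size());
    for (std::size_t i = 0; i < row.size(); ++i) {
        if (!row[i].has_value()) return false;
        col.fields[i].push_back(row[i]);
    }
    return true;
}

// Null-filling variant: a failed field is stored as null, so every field
// sub-column grows by exactly one row.
void append_row_fill_null(StructColumn& col, const ParsedRow& row) {
    assert(row.size() == col.fields.size());
    for (std::size_t i = 0; i < row.size(); ++i) {
        col.fields[i].push_back(row[i]);  // std::nullopt is appended as a null
    }
}
```

With a two-field column and a row whose second field fails, the first variant leaves the sub-columns at sizes 1 and 0, while the second leaves both at size 1 with a null in the failed position.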

What I'm doing:

  1. Bump simdjson to 3.9.4 to fix the find_field_unordered crash.
  2. Fill null on error to avoid inconsistent struct field columns.
  3. Support big integers (< -9223372036854775808 and > 18446744073709551615).
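
The bounds in item 3 are the minimum of a signed 64-bit integer and the maximum of an unsigned 64-bit integer. As a rough sketch of what handling values outside that range can look like, the code below parses decimal text into a 128-bit integer with overflow checks. It is an assumption-laden illustration, not the PR's implementation: `parse_int128` and `is_big_integer` are hypothetical helpers, the 128-bit target type is assumed, and `__int128` is a GCC/Clang extension rather than standard C++.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>
#include <string_view>

// __int128 is a GCC/Clang extension, used here as an assumed 128-bit target.
using int128 = __int128;
using uint128 = unsigned __int128;

// Hypothetical helper: parse an optionally negative decimal integer.
// Returns std::nullopt on empty input, non-digit characters, or overflow.
std::optional<int128> parse_int128(std::string_view text) {
    bool negative = false;
    if (!text.empty() && text.front() == '-') {
        negative = true;
        text.remove_prefix(1);
    }
    if (text.empty()) return std::nullopt;

    // Largest magnitude allowed: 2^127 - 1 for positives, 2^127 for negatives.
    const uint128 max_pos = (static_cast<uint128>(1) << 127) - 1;
    const uint128 limit = negative ? max_pos + 1 : max_pos;

    uint128 acc = 0;
    for (char c : text) {
        if (c < '0' || c > '9') return std::nullopt;
        const unsigned digit = static_cast<unsigned>(c - '0');
        // acc * 10 + digit <= limit  <=>  acc <= (limit - digit) / 10
        if (acc > (limit - digit) / 10) return std::nullopt;
        acc = acc * 10 + digit;
    }
    assert(acc <= limit);  // invariant maintained by the overflow check

    if (!negative) return static_cast<int128>(acc);
    if (acc == 0) return int128{0};
    // Negate via (acc - 1) so the minimum value does not overflow.
    return -static_cast<int128>(acc - 1) - 1;
}

// True when the value lies outside both the int64 and uint64 ranges,
// which is the "big integer" range named in item 3.
bool is_big_integer(int128 v) {
    return v < static_cast<int128>(std::numeric_limits<int64_t>::min()) ||
           v > static_cast<int128>(std::numeric_limits<uint64_t>::max());
}
```

For example, `parse_int128("18446744073709551616")` yields a value for which `is_big_integer` is true, while `parse_int128("12a")` yields `std::nullopt`.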

Fixes #issue
https://github.com/StarRocks/StarRocksTest/issues/7982

#45406

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This PR needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
  • This is a backport PR

Bugfix cherry-pick branch check:

  • I have checked the version labels which the PR will be auto-backported to the target branch
    • 3.3
    • 3.2
    • 3.1
    • 3.0
    • 2.5

@wyb wyb requested a review from a team as a code owner July 3, 2024 04:17
@mergify mergify bot assigned wyb Jul 3, 2024
@wyb wyb changed the title from "[BugFix] Fix struct field column inconsistent when loading from bad json" to "[BugFix] Fix struct field columns inconsistent when loading from bad json" Jul 3, 2024
meegoo previously approved these changes Jul 3, 2024
@meegoo meegoo enabled auto-merge (squash) July 3, 2024 06:48
@wyb wyb marked this pull request as draft July 3, 2024 11:26
auto-merge was automatically disabled July 3, 2024 11:26

Pull request was converted to draft

@wyb wyb changed the title from "[BugFix] Fix struct field columns inconsistent when loading from bad json" to "[BugFix] Bump simdjson to 3.9.4 and Fix struct field columns inconsistent when loading from bad json" Jul 4, 2024
@wyb wyb marked this pull request as ready for review July 4, 2024 11:13
@wyb wyb requested a review from a team as a code owner July 4, 2024 11:13
@wyb wyb force-pushed the struct_bad_json branch from 5e8db35 to cf93183 July 4, 2024 11:21
Signed-off-by: wyb <wybb86@gmail.com>
@wyb wyb force-pushed the struct_bad_json branch from cf93183 to 0dca7f1 July 4, 2024 11:26
meegoo previously approved these changes Jul 4, 2024
Signed-off-by: wyb <wybb86@gmail.com>
@wyb wyb enabled auto-merge (squash) July 5, 2024 01:53
github-actions bot commented Jul 5, 2024

[FE Incremental Coverage Report]

pass : 0 / 0 (0%)

github-actions bot commented Jul 5, 2024

[BE Incremental Coverage Report]

pass : 11 / 13 (84.62%)

file detail

path                                       covered_line  new_line  coverage  not_covered_line_detail
🔵 be/src/formats/json/numeric_column.cpp  9             11        81.82%    [95, 96]
🔵 be/src/formats/json/struct_column.cpp   2             2         100.00%   []

@wyb wyb merged commit 3ca24dc into StarRocks:main Jul 5, 2024
47 checks passed
github-actions bot commented Jul 5, 2024

@Mergifyio backport branch-3.3

@github-actions github-actions bot removed the 3.3 label Jul 5, 2024
github-actions bot commented Jul 5, 2024

@Mergifyio backport branch-3.2

@github-actions github-actions bot removed the 3.2 label Jul 5, 2024
mergify bot commented Jul 5, 2024

backport branch-3.3

✅ Backports have been created

mergify bot commented Jul 5, 2024

backport branch-3.2

✅ Backports have been created

mergify bot pushed a commit that referenced this pull request Jul 5, 2024
…tent when loading from bad json (#47775)

Signed-off-by: wyb <wybb86@gmail.com>
(cherry picked from commit 3ca24dc)
mergify bot pushed a commit that referenced this pull request Jul 5, 2024
…tent when loading from bad json (#47775)

Signed-off-by: wyb <wybb86@gmail.com>
(cherry picked from commit 3ca24dc)
wanpengfei-git pushed a commit that referenced this pull request Jul 5, 2024
…tent when loading from bad json (backport #47775) (#47894)

Co-authored-by: wyb <wybb86@gmail.com>
wanpengfei-git pushed a commit that referenced this pull request Jul 5, 2024
…tent when loading from bad json (backport #47775) (#47895)

Co-authored-by: wyb <wybb86@gmail.com>
ZiheLiu added a commit to ZiheLiu/starrocks that referenced this pull request Jul 8, 2024
…inconsistent when loading from bad json (StarRocks#47775)"

This reverts commit 3ca24dc.
ZiheLiu added a commit to ZiheLiu/starrocks that referenced this pull request Jul 18, 2024
…inconsistent when loading from bad json (StarRocks#47775)"

This reverts commit 3ca24dc.
ZiheLiu added a commit to ZiheLiu/starrocks that referenced this pull request Jul 31, 2024
…inconsistent when loading from bad json (StarRocks#47775)"

This reverts commit 3ca24dc.
ZiheLiu added a commit to ZiheLiu/starrocks that referenced this pull request Jul 31, 2024
…inconsistent when loading from bad json (StarRocks#47775)"

This reverts commit 3ca24dc.
ZiheLiu added a commit to ZiheLiu/starrocks that referenced this pull request Aug 8, 2024
…inconsistent when loading from bad json (StarRocks#47775)"

This reverts commit 3ca24dc.
ZiheLiu added a commit to ZiheLiu/starrocks that referenced this pull request Aug 9, 2024
…inconsistent when loading from bad json (StarRocks#47775)"

This reverts commit 3ca24dc.