Whitespace normalization of nested column coerced as string column in JSONL inputs #16759

Merged
35 commits
8e87b60
post tree whitespace normalization
shrshi Sep 5, 2024
30603f1
formatting
shrshi Sep 5, 2024
e45587d
Merge branch 'branch-24.10' into json-whitespace-normalization-post
shrshi Sep 5, 2024
a6ca1b8
removed unnecessary copy
shrshi Sep 5, 2024
88f06e4
formatting
shrshi Sep 5, 2024
b04c6a8
Merge branch 'json-whitespace-normalization-post' of github.com:shrsh…
shrshi Sep 5, 2024
d4189c5
addressed reviews - 1
shrshi Sep 6, 2024
cd8a840
added more null rows to the test example
shrshi Sep 6, 2024
81beb04
forced column as string impl
shrshi Sep 10, 2024
49b5f26
formatting
shrshi Sep 10, 2024
8f43a05
Merge branch 'branch-24.10' into json-whitespace-normalization-post
shrshi Sep 10, 2024
274f48f
replace mixed type as string with prune column
shrshi Sep 10, 2024
db9c783
formatting
shrshi Sep 10, 2024
70c6a70
Merge branch 'json-whitespace-normalization-post' of github.com:shrsh…
shrshi Sep 10, 2024
d801111
addressing pr reviews - part 1
shrshi Sep 10, 2024
e63faaa
formatting
shrshi Sep 10, 2024
110856d
addressing pr reviews - part 2
shrshi Sep 10, 2024
8728964
formatting
shrshi Sep 10, 2024
c1842e2
merge
shrshi Sep 10, 2024
a6d1646
added check for whitespace normalization
shrshi Sep 10, 2024
87fca8e
removing all old code
shrshi Sep 11, 2024
e76d74e
formatting
shrshi Sep 11, 2024
9921a26
Merge branch 'branch-24.10' into json-whitespace-normalization-post
shrshi Sep 11, 2024
55dbe92
addressing PR reviews
shrshi Sep 17, 2024
d4a2135
formatting
shrshi Sep 17, 2024
bdf3c19
Merge branch 'json-whitespace-normalization-post' of github.com:shrsh…
shrshi Sep 17, 2024
efd75b3
addressing PR reviews
shrshi Sep 17, 2024
85f5427
formatting
shrshi Sep 17, 2024
993eb15
more pr reviews
shrshi Sep 17, 2024
00a650e
formatting
shrshi Sep 17, 2024
39bedb8
merge
shrshi Sep 17, 2024
0c18a3a
formatting
shrshi Sep 17, 2024
7412bbd
simplifying namespace and variable names
shrshi Sep 18, 2024
b359e80
changing stencil to bool
shrshi Sep 18, 2024
d12ff96
Merge branch 'branch-24.10' into json-whitespace-normalization-post
karthikeyann Sep 19, 2024
formatting
shrshi committed Sep 17, 2024
commit d4a2135f13620bd6fc8f25c663f3cea73523367e
5 changes: 3 additions & 2 deletions cpp/src/io/json/json_column.cu
@@ -1167,8 +1167,9 @@ table_with_metadata device_parse_nested_json(device_span<SymbolT const> d_input,
   const auto [tokens_gpu, token_indices_gpu] =
     get_token_stream(d_input, options, stream, cudf::get_current_device_resource_ref());
   // gpu tree generation
-  // Note that to normalize whitespaces in nested columns coerced to be string, we need the column to either be of
-  // mixed type or we need to request the column to be returned as string by pruning it with the STRING dtype
+  // Note that to normalize whitespaces in nested columns coerced to be string, we need the column
+  // to either be of mixed type or we need to request the column to be returned as string by
+  // pruning it with the STRING dtype
   return get_tree_representation(
     tokens_gpu,
     token_indices_gpu,
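
The comment in this hunk refers to normalizing whitespace in a nested JSON value that ends up being returned as a string column. As a rough illustration of what that normalization means for a single value, here is a minimal host-side sketch. It is not the libcudf implementation and does not use any libcudf API; the function name `normalize_unquoted_whitespace` is hypothetical. It assumes well-formed JSON input and drops the four JSON whitespace characters (space, tab, line feed, carriage return) wherever they appear outside a quoted string.

```cpp
#include <string>

// Hypothetical helper for illustration only; not part of libcudf.
// Removes JSON whitespace that lies outside double-quoted strings and
// leaves the contents of quoted strings untouched.
std::string normalize_unquoted_whitespace(const std::string& json)
{
  std::string out;
  out.reserve(json.size());
  bool in_string = false;  // currently inside a "..." literal
  bool escaped   = false;  // previous character was a backslash inside a string
  for (char c : json) {
    if (in_string) {
      out.push_back(c);  // everything inside a string is preserved
      if (escaped) {
        escaped = false;  // this character was escaped, so it cannot end the string
      } else if (c == '\\') {
        escaped = true;
      } else if (c == '"') {
        in_string = false;
      }
    } else if (c == '"') {
      in_string = true;
      out.push_back(c);
    } else if (c != ' ' && c != '\t' && c != '\n' && c != '\r') {
      out.push_back(c);  // keep structural characters and literals, skip whitespace
    }
  }
  return out;
}
```

With this sketch, a nested value written as `{ "a" : [ 1 , 2 ] }` would be stored as the string `{"a":[1,2]}`, while whitespace inside a quoted value such as `"x y"` is kept.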