JSON reader validation of values #15968

Merged
47 commits
bb991ef
validation of tokens code
karthikeyann Jun 11, 2024
4e707cb
fix pre-commit check failures
karthikeyann Jun 18, 2024
35a8268
Merge branch 'branch-24.08' into fea-json_spark_validation
karthikeyann Jun 18, 2024
cd6a30f
Merge branch 'branch-24.08' into fea-json_spark_validation
karthikeyann Jun 27, 2024
0c2e4da
Add Spark Compatible JSON validation (#10)
revans2 Aug 2, 2024
6a38578
Merge branch 'branch-24.08' of github.com:rapidsai/cudf into fea-json…
karthikeyann Aug 2, 2024
0d6cb12
Merge branch 'branch-24.10' of github.com:rapidsai/cudf into fea-json…
karthikeyann Aug 2, 2024
dfa6b18
Merge branch 'branch-24.10' into fea-json_spark_validation
karthikeyann Aug 9, 2024
e944937
style fixes
karthikeyann Aug 9, 2024
23072c0
Update json normalization to take device_buffer
karthikeyann Aug 9, 2024
a885340
fix char comparison error
karthikeyann Aug 9, 2024
3867c61
Merge branch 'branch-24.10' into fea-json_spark_validation
karthikeyann Aug 9, 2024
ab1385d
update char comparison
karthikeyann Aug 15, 2024
80c7c3a
Merge branch 'branch-24.10' into fea-json_spark_validation
karthikeyann Aug 26, 2024
f2e2b44
rename to tabulate_output_iterator.cuh
karthikeyann Aug 26, 2024
0963218
absorb counting_iterator to tabulate_output_iterator
karthikeyann Aug 26, 2024
be7402c
update documentation
karthikeyann Aug 26, 2024
b114401
add na_values to validation
karthikeyann Aug 26, 2024
a1e9afc
add strict validation to test
karthikeyann Aug 26, 2024
ec78ef9
rename tabulate_output_iterator namespace
karthikeyann Aug 26, 2024
a225ce0
remove comments and notes
karthikeyann Aug 26, 2024
7a2a451
Merge branch 'branch-24.10' into fea-json_spark_validation
karthikeyann Aug 26, 2024
875a72b
fix unsigned/signed issue with ARM systems
karthikeyann Sep 3, 2024
ef6f298
Merge branch 'branch-24.10' into fea-json_spark_validation
karthikeyann Sep 3, 2024
be7f17e
remove comments
karthikeyann Sep 3, 2024
fb62877
fix condition
karthikeyann Sep 4, 2024
e4f7d04
fix char issue with typecast
karthikeyann Sep 5, 2024
851fe3e
Update cpp/include/cudf/io/json.hpp
karthikeyann Sep 5, 2024
35e4b89
Update cpp/include/cudf/io/json.hpp
karthikeyann Sep 5, 2024
3681823
address review comments
karthikeyann Sep 5, 2024
1d897f7
fix doc
karthikeyann Sep 5, 2024
e1435ce
Merge branch 'branch-24.10' into fea-json_spark_validation
karthikeyann Sep 5, 2024
6bf4d3f
address review comments
karthikeyann Sep 5, 2024
e9ebb91
Merge branch 'branch-24.10' into fea-json_spark_validation
karthikeyann Sep 6, 2024
e093d64
address review comments
karthikeyann Sep 9, 2024
00ef690
rename lambda name
karthikeyann Sep 9, 2024
86bbeab
Merge branch 'branch-24.10' into fea-json_spark_validation
karthikeyann Sep 9, 2024
cecb42f
Apply suggestions from code review
karthikeyann Sep 10, 2024
9cd3098
Apply suggestions from code review
karthikeyann Sep 10, 2024
c816c73
update docs
karthikeyann Sep 10, 2024
53db703
Update cpp/include/cudf/io/json.hpp
ttnghia Sep 10, 2024
c3832b6
Update cpp/include/cudf/io/json.hpp
ttnghia Sep 10, 2024
070263e
Update cpp/include/cudf/io/json.hpp
ttnghia Sep 10, 2024
fb0e85f
fix strict_validation dependent options with if
karthikeyann Sep 10, 2024
e7fce07
Merge branch 'branch-24.10' into fea-json_spark_validation
karthikeyann Sep 10, 2024
252c38b
fix typo
karthikeyann Sep 10, 2024
5ab337b
Merge branch 'branch-24.10' into fea-json_spark_validation
karthikeyann Sep 10, 2024
fix strict_validation dependent options with if
karthikeyann committed Sep 10, 2024
commit fb0e85f53dd569dde12e6321ea5f5655cdaa100d
33 changes: 20 additions & 13 deletions java/src/main/native/src/TableJni.cpp
@@ -1646,11 +1646,12 @@ Java_ai_rapids_cudf_Table_readAndInferJSONFromDataSource(JNIEnv* env,
         .normalize_whitespace(static_cast<bool>(normalize_whitespace))
         .mixed_types_as_string(mixed_types_as_string)
         .strict_validation(strict_validation)
-        .numeric_leading_zeros(allow_leading_zeros)
-        .nonnumeric_numbers(allow_nonnumeric_numbers)
-        .unquoted_control_chars(allow_unquoted_control)
         .keep_quotes(keep_quotes);
-
+    if (strict_validation) {
+      opts.numeric_leading_zeros(allow_leading_zeros)
+        .nonnumeric_numbers(allow_nonnumeric_numbers)
+        .unquoted_control_chars(allow_unquoted_control);
+    }
     auto result =
       std::make_unique<cudf::io::table_with_metadata>(cudf::io::read_json(opts.build()));

@@ -1697,11 +1698,13 @@ Java_ai_rapids_cudf_Table_readAndInferJSON(JNIEnv* env,
         .normalize_single_quotes(static_cast<bool>(normalize_single_quotes))
         .normalize_whitespace(static_cast<bool>(normalize_whitespace))
         .strict_validation(strict_validation)
Contributor

This code has to change now for the tests to pass. We must not set

        .numeric_leading_zeros(allow_leading_zeros)
        .nonnumeric_numbers(allow_nonnumeric_numbers)
        .unquoted_control_chars(allow_unquoted_control)

at all if strict_validation is disabled. This applies to all of the APIs in this file, because of the assertion that was just added.

Contributor Author

fixed.
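
The failure mode described in this thread can be reproduced in isolation. The sketch below is only an illustration: `mock_options_builder` is a hypothetical stand-in and not the cuDF builder, and it assumes (as the review comment implies) that each dependent setter rejects the call unless `strict_validation` was enabled first. Here the rejection is modelled as a `std::logic_error`; the real assertion may behave differently.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a JSON reader options builder. This is NOT the
// cuDF API; it only models the assumed rule that the three dependent options
// may be set only after strict validation has been enabled.
class mock_options_builder {
 public:
  mock_options_builder& strict_validation(bool v)
  {
    strict_ = v;
    return *this;
  }
  mock_options_builder& numeric_leading_zeros(bool v)
  {
    require_strict("numeric_leading_zeros");
    leading_zeros_ = v;
    return *this;
  }
  mock_options_builder& nonnumeric_numbers(bool v)
  {
    require_strict("nonnumeric_numbers");
    nonnumeric_ = v;
    return *this;
  }
  mock_options_builder& unquoted_control_chars(bool v)
  {
    require_strict("unquoted_control_chars");
    control_chars_ = v;
    return *this;
  }

  bool leading_zeros() const { return leading_zeros_; }
  bool nonnumeric() const { return nonnumeric_; }
  bool control_chars() const { return control_chars_; }

 private:
  // Assumed behaviour: dependent options are an error without strict mode.
  void require_strict(std::string const& option) const
  {
    if (!strict_) { throw std::logic_error(option + " requires strict_validation"); }
  }

  bool strict_{false};
  bool leading_zeros_{true};
  bool nonnumeric_{true};
  bool control_chars_{true};
};

// Guarded pattern: touch the dependent options only when strict mode is on.
mock_options_builder build_guarded(bool strict, bool zeros, bool nonnumeric, bool control)
{
  mock_options_builder opts;
  opts.strict_validation(strict);
  if (strict) {
    opts.numeric_leading_zeros(zeros)
      .nonnumeric_numbers(nonnumeric)
      .unquoted_control_chars(control);
  }
  return opts;
}

// Unguarded pattern: returns true if setting a dependent option was rejected.
bool unguarded_throws(bool strict)
{
  mock_options_builder opts;
  opts.strict_validation(strict);
  try {
    opts.numeric_leading_zeros(false);
  } catch (std::logic_error const&) {
    return true;
  }
  return false;
}

// Small self-check covering both patterns.
void check_patterns()
{
  assert(unguarded_throws(false));   // rejected when strict mode is off
  assert(!unguarded_throws(true));   // accepted when strict mode is on
  assert(build_guarded(false, false, false, false).leading_zeros());  // defaults untouched
}
```

With the guard in place, a caller that passes `strict_validation == false` never reaches the dependent setters, so the values passed for the other three flags are simply ignored.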

-        .numeric_leading_zeros(allow_leading_zeros)
-        .nonnumeric_numbers(allow_nonnumeric_numbers)
-        .unquoted_control_chars(allow_unquoted_control)
         .mixed_types_as_string(mixed_types_as_string)
         .keep_quotes(keep_quotes);
+    if (strict_validation) {
+      opts.numeric_leading_zeros(allow_leading_zeros)
+        .nonnumeric_numbers(allow_nonnumeric_numbers)
+        .unquoted_control_chars(allow_unquoted_control);
+    }

     auto result =
       std::make_unique<cudf::io::table_with_metadata>(cudf::io::read_json(opts.build()));
@@ -1845,10 +1848,12 @@ Java_ai_rapids_cudf_Table_readJSONFromDataSource(JNIEnv* env,
         .normalize_whitespace(static_cast<bool>(normalize_whitespace))
         .mixed_types_as_string(mixed_types_as_string)
         .strict_validation(strict_validation)
-        .numeric_leading_zeros(allow_leading_zeros)
-        .nonnumeric_numbers(allow_nonnumeric_numbers)
-        .unquoted_control_chars(allow_unquoted_control)
         .keep_quotes(keep_quotes);
+    if (strict_validation) {
+      opts.numeric_leading_zeros(allow_leading_zeros)
+        .nonnumeric_numbers(allow_nonnumeric_numbers)
+        .unquoted_control_chars(allow_unquoted_control);
+    }

     if (!n_types.is_null()) {
       if (n_types.size() != n_scales.size()) {
@@ -1952,10 +1957,12 @@ JNIEXPORT jlong JNICALL Java_ai_rapids_cudf_Table_readJSON(JNIEnv* env,
         .normalize_whitespace(static_cast<bool>(normalize_whitespace))
         .mixed_types_as_string(mixed_types_as_string)
         .strict_validation(strict_validation)
-        .numeric_leading_zeros(allow_leading_zeros)
-        .nonnumeric_numbers(allow_nonnumeric_numbers)
-        .unquoted_control_chars(allow_unquoted_control)
         .keep_quotes(keep_quotes);
+    if (strict_validation) {
+      opts.numeric_leading_zeros(allow_leading_zeros)
+        .nonnumeric_numbers(allow_nonnumeric_numbers)
+        .unquoted_control_chars(allow_unquoted_control);
+    }

     if (!n_types.is_null()) {
       if (n_types.size() != n_scales.size()) {
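
The same guard appears in all four JNI entry points touched by this commit. One possible way to keep the copies from drifting apart is a small helper; the sketch below is a suggestion only and is not part of the PR. `set_strict_only_options` and `recording_builder` are hypothetical names, and the helper assumes only that the builder's setters return a reference to the builder so they can be chained.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of the PR): applies the three dependent
// options only when strict validation is enabled. It works with any builder
// type whose setters return a reference to the builder itself.
template <typename Builder>
void set_strict_only_options(
  Builder& opts, bool strict, bool leading_zeros, bool nonnumeric, bool unquoted_control)
{
  if (!strict) { return; }
  opts.numeric_leading_zeros(leading_zeros)
    .nonnumeric_numbers(nonnumeric)
    .unquoted_control_chars(unquoted_control);
}

// Hypothetical builder that records which setters were called. It exists only
// to exercise the helper without depending on cuDF.
struct recording_builder {
  std::vector<std::string> calls;

  recording_builder& numeric_leading_zeros(bool)
  {
    calls.push_back("numeric_leading_zeros");
    return *this;
  }
  recording_builder& nonnumeric_numbers(bool)
  {
    calls.push_back("nonnumeric_numbers");
    return *this;
  }
  recording_builder& unquoted_control_chars(bool)
  {
    calls.push_back("unquoted_control_chars");
    return *this;
  }
};

// Returns how many dependent setters the helper invoked for a given mode.
std::size_t calls_made(bool strict)
{
  recording_builder opts;
  set_strict_only_options(opts, strict, true, true, true);
  return opts.calls.size();
}
```

Each call site would then reduce to a single line after the builder chain, for example `set_strict_only_options(opts, strict_validation, allow_leading_zeros, allow_nonnumeric_numbers, allow_unquoted_control);`, provided the real builder's setters chain the same way.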