Commit 795f381
[SPARK-40646][SQL] Fix returning partial results in JSON data source and JSON functions
### What changes were proposed in this pull request?
This PR is a follow-up for [SPARK-33134](https://issues.apache.org/jira/browse/SPARK-33134) (apache#30031).
I found another case where, depending on the order of columns, a failure to parse one JSON field breaks all of the subsequent fields, so they all come back as null.
Consider a file like this:
```
{"a": {"x": 1, "y": true}, "b": {"x": 1}}
{"a": {"x": 2}, "b": {"x": 2}}
```
Reading the file with the schema below returns column `b` as null in the first row, even though `b` is a valid column in every record.
```scala
val df = spark.read
  .schema("a struct<x: int, y: struct<x: int>>, b struct<x: int>")
  .json("path")

// Result before this fix:
// a                 b
// null              null
// {"x":2,"y":null}  {"x":2}
```
However, column `b` should be:
```
{"x": 1}
{"x": 2}
```
This particular example used to work in earlier Spark versions, but it was affected by the change for SPARK-33134, which fixed another bug with incorrect parsing in `from_json`. Because this case was not tested, we missed it at the time.
To fix both SPARK-33134 and SPARK-40646, we need to handle `PartialResultException` in the `convertArray` method so that errors in child objects are processed there. Without that handling, the code would not wrap the row in an array for `from_json`, resulting in a `ClassCastException` (SPARK-33134). With it in place, the `isRoot` check in `convertObject` is no longer needed, which unblocks SPARK-40646.
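The idea can be illustrated with a small, self-contained toy model. This is only a sketch and not Spark's `JacksonParser` code: the `Json` and `Schema` types, `ToyParser`, `PartialResult`, and `ToyDemo` are hypothetical names made up for the illustration, and the toy walks schema fields rather than a JSON token stream. The schema used for the array example is also an assumption, since `st` is not defined in this description. The sketch shows an object converter that nulls out a bad field but keeps its siblings, and an array converter that catches the partial result of an element and keeps the partial row in the array.

```scala
import scala.util.control.NonFatal

// Toy JSON values and schema types; every name here is hypothetical.
sealed trait Json
final case class JNum(value: Int) extends Json
final case class JBool(value: Boolean) extends Json
final case class JObj(fields: Map[String, Json]) extends Json
final case class JArr(items: Seq[Json]) extends Json

sealed trait Schema
case object IntT extends Schema
final case class StructT(fields: Seq[(String, Schema)]) extends Schema
final case class ArrayT(element: Schema) extends Schema

// Carries the partially parsed value together with the first error seen.
final case class PartialResult(partial: Any, cause: Throwable) extends Exception(cause)

object ToyParser {
  def convert(json: Json, schema: Schema): Any = (json, schema) match {
    case (JNum(v), IntT) => v
    case (o: JObj, s: StructT) => convertObject(o, s)
    case (a: JArr, t: ArrayT) => convertArray(a, t)
    case _ => throw new IllegalArgumentException(s"cannot read $json as $schema")
  }

  // A field that fails to parse becomes null; sibling fields are still parsed.
  def convertObject(obj: JObj, schema: StructT): Seq[Any] = {
    var firstError: Option[Throwable] = None
    val row = schema.fields.map { case (name, fieldType) =>
      try obj.fields.get(name).map(convert(_, fieldType)).orNull
      catch { case NonFatal(e) => firstError = firstError.orElse(Some(e)); null }
    }
    firstError match {
      case Some(e) => throw PartialResult(row, e)
      case None => row
    }
  }

  // Keep the partial row of a failed element instead of losing the whole array.
  def convertArray(arr: JArr, schema: ArrayT): Seq[Any] = {
    var firstError: Option[Throwable] = None
    val values = arr.items.map { item =>
      try convert(item, schema.element)
      catch {
        case PartialResult(partial, cause) =>
          firstError = firstError.orElse(Some(cause))
          partial
      }
    }
    firstError match {
      case Some(e) => throw PartialResult(values, e)
      case None => values
    }
  }

  // Top level: return whatever could be parsed.
  def parse(json: Json, schema: Schema): Any =
    try convert(json, schema)
    catch { case PartialResult(partial, _) => partial }
}

object ToyDemo {
  // Assumed schema: array<struct<c1: int, c2: array<struct<c3: int>>>>
  val arraySchema: Schema =
    ArrayT(StructT(Seq("c1" -> IntT, "c2" -> ArrayT(StructT(Seq("c3" -> IntT))))))
  // Models [{"c2": [19], "c1": 123456}]
  val arrayJson: Json =
    JArr(Seq(JObj(Map("c2" -> JArr(Seq(JNum(19))), "c1" -> JNum(123456)))))
  val arrayResult: Any = ToyParser.parse(arrayJson, arraySchema)

  // Models {"a": {"x": 1, "y": true}, "b": {"x": 1}}
  val recordSchema: Schema = StructT(Seq(
    "a" -> StructT(Seq("x" -> IntT, "y" -> StructT(Seq("x" -> IntT)))),
    "b" -> StructT(Seq("x" -> IntT))))
  val recordJson: Json = JObj(Map(
    "a" -> JObj(Map("x" -> JNum(1), "y" -> JBool(true))),
    "b" -> JObj(Map("x" -> JNum(1)))))
  val recordResult: Any = ToyParser.parse(recordJson, recordSchema)
}
```

In this toy, `ToyDemo.arrayResult` is `Seq(Seq(123456, null))`: the bad `c2` value is nulled while the row stays wrapped in its array. `ToyDemo.recordResult` keeps `b` as `Seq(1)` even though parsing `a` failed. What the toy puts in `a` is a simplification and should not be read as a statement about Spark's exact output for that column.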
I updated the code to handle both cases. With these changes, we can correctly parse this case:
```scala
val df3 = Seq("""[{"c2": [19], "c1": 123456}]""").toDF("c0")
checkAnswer(df3.select(from_json($"c0", ArrayType(st))), Row(Array(Row(123456, null))))
```
which was previously returning `null` for the root row.
### Why are the changes needed?
Fixes a long-standing issue where a single incorrect field in a JSON record would break parsing of the entire record.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
I added unit tests for SPARK-40646 as well as SPARK-33134.
Closes apache#38090 from sadikovi/SPARK-40646.
Authored-by: Ivan Sadikov <ivan.sadikov@databricks.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
1 parent 0a84082 commit 795f381
File tree
3 files changed: +130 -8 lines changed
- sql
  - catalyst/src/main/scala/org/apache/spark/sql/catalyst/json
  - core/src/test/scala/org/apache/spark/sql
    - execution/datasources/json
Lines changed: 38 additions & 7 deletions
Lines changed: 71 additions & 1 deletion
Lines changed: 21 additions & 0 deletions