Commit 433ae90
[SPARK-33566][CORE][SQL][SS][PYTHON] Make unescapedQuoteHandling option configurable when read CSV
### What changes were proposed in this pull request?
There are some differences between Spark CSV, opencsv and commons-csv, the typical case are described in SPARK-33566, When there are both unescaped quotes and unescaped qualifier in value, the results of parsing are different.
The reason for the difference is Spark use `STOP_AT_DELIMITER` as default `UnescapedQuoteHandling` to build `CsvParser` and it not configurable.
On the other hand, opencsv and commons-csv use the parsing mechanism similar to `STOP_AT_CLOSING_QUOTE ` by default.
So this pr make `unescapedQuoteHandling` option configurable to get the same parsing result as opencsv and commons-csv.
### Why are the changes needed?
Make unescapedQuoteHandling option configurable when read CSV to make parsing more flexible。
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
- Pass the Jenkins or GitHub Action
- Add a new case similar to that described in SPARK-33566
Closes #30518 from LuciferYang/SPARK-33566.
Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>1 parent d082ad0 commit 433ae90
File tree
8 files changed
+122
-5
lines changed- python/pyspark/sql
- sql
- catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv
- core/src
- main/scala/org/apache/spark/sql
- streaming
- test/scala/org/apache/spark/sql/execution/datasources/csv
8 files changed
+122
-5
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
522 | 522 | | |
523 | 523 | | |
524 | 524 | | |
525 | | - | |
| 525 | + | |
| 526 | + | |
526 | 527 | | |
527 | 528 | | |
528 | 529 | | |
| |||
685 | 686 | | |
686 | 687 | | |
687 | 688 | | |
| 689 | + | |
| 690 | + | |
| 691 | + | |
| 692 | + | |
| 693 | + | |
| 694 | + | |
| 695 | + | |
| 696 | + | |
| 697 | + | |
| 698 | + | |
| 699 | + | |
| 700 | + | |
| 701 | + | |
| 702 | + | |
| 703 | + | |
| 704 | + | |
| 705 | + | |
| 706 | + | |
| 707 | + | |
| 708 | + | |
688 | 709 | | |
689 | 710 | | |
690 | 711 | | |
| |||
708 | 729 | | |
709 | 730 | | |
710 | 731 | | |
711 | | - | |
| 732 | + | |
| 733 | + | |
712 | 734 | | |
713 | 735 | | |
714 | 736 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
113 | 113 | | |
114 | 114 | | |
115 | 115 | | |
| 116 | + | |
116 | 117 | | |
117 | 118 | | |
118 | 119 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
761 | 761 | | |
762 | 762 | | |
763 | 763 | | |
764 | | - | |
| 764 | + | |
765 | 765 | | |
766 | 766 | | |
767 | 767 | | |
| |||
900 | 900 | | |
901 | 901 | | |
902 | 902 | | |
| 903 | + | |
| 904 | + | |
| 905 | + | |
| 906 | + | |
| 907 | + | |
| 908 | + | |
| 909 | + | |
| 910 | + | |
| 911 | + | |
| 912 | + | |
| 913 | + | |
| 914 | + | |
| 915 | + | |
| 916 | + | |
| 917 | + | |
| 918 | + | |
| 919 | + | |
| 920 | + | |
| 921 | + | |
| 922 | + | |
903 | 923 | | |
904 | 924 | | |
905 | 925 | | |
| |||
926 | 946 | | |
927 | 947 | | |
928 | 948 | | |
929 | | - | |
| 949 | + | |
| 950 | + | |
930 | 951 | | |
931 | 952 | | |
932 | 953 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
149 | 149 | | |
150 | 150 | | |
151 | 151 | | |
| 152 | + | |
152 | 153 | | |
153 | 154 | | |
154 | 155 | | |
| |||
Lines changed: 7 additions & 1 deletion
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
213 | 213 | | |
214 | 214 | | |
215 | 215 | | |
| 216 | + | |
| 217 | + | |
| 218 | + | |
| 219 | + | |
| 220 | + | |
| 221 | + | |
216 | 222 | | |
217 | 223 | | |
218 | 224 | | |
| |||
258 | 264 | | |
259 | 265 | | |
260 | 266 | | |
261 | | - | |
| 267 | + | |
262 | 268 | | |
263 | 269 | | |
264 | 270 | | |
| |||
Lines changed: 21 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
727 | 727 | | |
728 | 728 | | |
729 | 729 | | |
| 730 | + | |
| 731 | + | |
| 732 | + | |
| 733 | + | |
| 734 | + | |
| 735 | + | |
| 736 | + | |
| 737 | + | |
| 738 | + | |
| 739 | + | |
| 740 | + | |
| 741 | + | |
| 742 | + | |
| 743 | + | |
| 744 | + | |
| 745 | + | |
| 746 | + | |
| 747 | + | |
| 748 | + | |
| 749 | + | |
| 750 | + | |
730 | 751 | | |
731 | 752 | | |
732 | 753 | | |
| |||
Lines changed: 21 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
396 | 396 | | |
397 | 397 | | |
398 | 398 | | |
| 399 | + | |
| 400 | + | |
| 401 | + | |
| 402 | + | |
| 403 | + | |
| 404 | + | |
| 405 | + | |
| 406 | + | |
| 407 | + | |
| 408 | + | |
| 409 | + | |
| 410 | + | |
| 411 | + | |
| 412 | + | |
| 413 | + | |
| 414 | + | |
| 415 | + | |
| 416 | + | |
| 417 | + | |
| 418 | + | |
| 419 | + | |
399 | 420 | | |
400 | 421 | | |
401 | 422 | | |
| |||
Lines changed: 24 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
2428 | 2428 | | |
2429 | 2429 | | |
2430 | 2430 | | |
| 2431 | + | |
| 2432 | + | |
| 2433 | + | |
| 2434 | + | |
| 2435 | + | |
| 2436 | + | |
| 2437 | + | |
| 2438 | + | |
| 2439 | + | |
| 2440 | + | |
| 2441 | + | |
| 2442 | + | |
| 2443 | + | |
| 2444 | + | |
| 2445 | + | |
| 2446 | + | |
| 2447 | + | |
| 2448 | + | |
| 2449 | + | |
| 2450 | + | |
| 2451 | + | |
| 2452 | + | |
| 2453 | + | |
| 2454 | + | |
2431 | 2455 | | |
2432 | 2456 | | |
2433 | 2457 | | |
| |||
0 commit comments