Commit 1e6c1d8
[SPARK-25493][SQL] Use auto-detection for CRLF in CSV datasource multiline mode
## What changes were proposed in this pull request?
CSVs with windows style crlf ('\r\n') don't work in multiline mode. They work fine in single line mode because the line separation is done by Hadoop, which can handle all the different types of line separators. This PR fixes it by enabling Univocity's line separator detection in multiline mode, which will detect '\r\n', '\r', or '\n' automatically as it is done by hadoop in single line mode.
## How was this patch tested?
Unit test with a file with crlf line endings.
Closes #22503 from justinuang/fix-clrf-multiline.
Authored-by: Justin Uang <juang@palantir.com>
Signed-off-by: hyukjinkwon <gurwls223@apache.org>1 parent d0ecff2 commit 1e6c1d8
File tree
3 files changed
+21
-0
lines changed- sql
- catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv
- core/src/test
- resources/test-data
- scala/org/apache/spark/sql/execution/datasources/csv
3 files changed
+21
-0
lines changedLines changed: 2 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
212 | 212 | | |
213 | 213 | | |
214 | 214 | | |
| 215 | + | |
| 216 | + | |
215 | 217 | | |
216 | 218 | | |
217 | 219 | | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
| 1 | + | |
| 2 | + | |
| 3 | + | |
| 4 | + | |
| 5 | + | |
| 6 | + | |
| 7 | + | |
Lines changed: 12 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
52 | 52 | | |
53 | 53 | | |
54 | 54 | | |
| 55 | + | |
55 | 56 | | |
56 | 57 | | |
57 | 58 | | |
| |||
220 | 221 | | |
221 | 222 | | |
222 | 223 | | |
| 224 | + | |
| 225 | + | |
| 226 | + | |
| 227 | + | |
| 228 | + | |
| 229 | + | |
| 230 | + | |
| 231 | + | |
| 232 | + | |
| 233 | + | |
| 234 | + | |
223 | 235 | | |
224 | 236 | | |
225 | 237 | | |
| |||
0 commit comments