Commit 624eda5
[SPARK-49444][SQL] Modified UnivocityParser to throw runtime exceptions caused by ArrayIndexOutOfBounds with more user-oriented messages
### What changes were proposed in this pull request?
I propose to catch runtime `ArrayIndexOutOfBounds` exceptions in the `parse` method of the `UnivocityParser` class and rethrow them with more user-oriented messages. Instead of surfacing the exception in its original form, the new message tells users which CSV record caused the error.
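The catch-and-rethrow step can be sketched in plain Scala so that it runs without Spark. The sketch assumes the tokenizer lets the `ArrayIndexOutOfBoundsException` escape directly. The underlying univocity library may instead wrap it in its own exception type, in which case the cause would need inspecting. `MalformedCsvRecordException` is a hypothetical stand-in for `SparkRuntimeException`. `parseWithRecordContext` and the message text are illustrative, not the code from this commit.

```scala
// Hypothetical stand-in for SparkRuntimeException: carries the offending CSV line
// and keeps the original index error as its cause.
final class MalformedCsvRecordException(val badRecord: String, cause: Throwable)
  extends RuntimeException(s"Malformed CSV record: $badRecord", cause)

object RecordContext {
  /** Hypothetical helper: runs `tokenize` on `line` and, if tokenizing fails with
    * an ArrayIndexOutOfBoundsException, rethrows it with the input line attached. */
  def parseWithRecordContext[T](line: String)(tokenize: String => T): T =
    try tokenize(line)
    catch {
      case e: ArrayIndexOutOfBoundsException =>
        throw new MalformedCsvRecordException(line, e)
    }
}
```

A line whose tokenizer overflows now fails with `Malformed CSV record: <line>` instead of a bare index error. Passing the original exception as the cause keeps its stack trace available for debugging.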
### Why are the changes needed?
Reporting errors properly to users improves the user experience. Instead of throwing an `ArrayIndexOutOfBounds` exception that gives no clear reason for the failure, the proposed change throws a `SparkRuntimeException` whose message includes the original CSV line that caused the error.
### Does this PR introduce _any_ user-facing change?
This PR introduces a user-facing change that occurs when `UnivocityParser` parses a malformed CSV line from the input. More specifically, the change is reproduced by the test case in `UnivocityParserSuite` where the user specifies `maxColumns` in the parser options and the parsed CSV record has more columns than that limit. Instead of failing with `ArrayIndexOutOfBounds`, as described in the HMR ticket, users now get a `SparkRuntimeException` whose message contains the input line that caused the error.
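The following toy model shows why a record with more fields than `maxColumns` can surface as an `ArrayIndexOutOfBoundsException`. It is an assumption, not univocity's actual code: the tokenizer writes each field into a buffer preallocated with `maxColumns` slots. `ToyCsvTokenizer` is hypothetical, and its naive comma split ignores quoting.

```scala
// Hypothetical toy model of a tokenizer configured with `maxColumns`:
// the fields of one record are written into a buffer preallocated with that many slots.
final class ToyCsvTokenizer(maxColumns: Int) {
  def tokenize(line: String): Array[String] = {
    val buffer = new Array[String](maxColumns)
    var count = 0
    // Limit -1 keeps trailing empty fields, so "a,b," yields three fields
    for (field <- line.split(",", -1)) {
      // Once count reaches maxColumns this write throws
      // ArrayIndexOutOfBoundsException: the raw error users saw before the change
      buffer(count) = field
      count += 1
    }
    buffer.take(count)
  }
}
```

With `maxColumns = 2`, `tokenize("a,b")` returns both fields, while `tokenize("a,b,c")` fails on the third write. This change rewraps that kind of failure into a `SparkRuntimeException` that names the offending line.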
### How was this patch tested?
This patch was tested in `UnivocityParserSuite`; the test named "Array index out of bounds when parsing CSV with more columns than expected" covers it. Additionally, a test for bad records in `UnivocityParser`'s `PERMISSIVE` mode was added to confirm that `BadRecordException` is thrown properly.
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #47906 from vladanvasi-db/vladanvasi-db/univocity-parser-index-out-of-bounds-handling.
Authored-by: Vladan Vasić <vladan.vasic@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
1 parent 87b5ffb
File tree: 5 files changed, 92 additions and 6 deletions
- sql
  - catalyst/src
    - main/scala/org/apache/spark/sql/catalyst/csv
    - test/scala/org/apache/spark/sql/catalyst/csv
  - core/src/test
    - resources/test-data
    - scala/org/apache/spark/sql
      - execution/datasources/csv

Lines changed: 17 additions & 2 deletions
Changed hunks (new-file lines): 21–30, 295–314, 321–327
Lines changed: 37 additions & 2 deletions
Changed hunks (new-file lines): 23–34, 323–363
Lines changed: 1 addition & 0 deletions
Changed hunks (new-file lines): 1
Lines changed: 3 additions & 2 deletions
Changed hunks (new-file lines): 24–31, 235–241
Lines changed: 34 additions & 0 deletions
Changed hunks (new-file lines): 85–91, 3440–3478