
Commit dd518a1

MaxGekk authored and cloud-fan committed
[SPARK-26151][SQL][FOLLOWUP] Return partial results for bad CSV records
## What changes were proposed in this pull request?

Updated SQL migration guide according to changes in #23120

Closes #23235 from MaxGekk/failuresafe-partial-result-followup.

Lead-authored-by: Maxim Gekk <maxim.gekk@databricks.com>
Co-authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
1 parent 7bb1dab commit dd518a1

File tree

1 file changed: +2 -0 lines changed


docs/sql-migration-guide-upgrade.md

Lines changed: 2 additions & 0 deletions
@@ -35,6 +35,8 @@ displayTitle: Spark SQL Upgrading Guide
 
 - Since Spark 3.0, the CSV datasource uses the java.time API for parsing and generating CSV content. The new formatting implementation supports date/timestamp patterns conforming to ISO 8601. To switch back to the implementation used in Spark 2.4 and earlier, set `spark.sql.legacy.timeParser.enabled` to `true`.
 
+- In Spark version 2.4 and earlier, the CSV datasource converts a malformed CSV string to a row with all `null`s in the PERMISSIVE mode. Since Spark 3.0, the returned row can contain non-`null` fields if some of the CSV column values were parsed and converted to the desired types successfully.
+
 ## Upgrading From Spark SQL 2.3 to 2.4
 
 - In Spark version 2.3 and earlier, the second parameter to the `array_contains` function is implicitly promoted to the element type of the first, array-type parameter. This type promotion can be lossy and may cause `array_contains` to return a wrong result. This problem has been addressed in 2.4 by employing a safer type promotion mechanism. This can cause some changes in behavior, which are illustrated in the table below.
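To make the first bullet in the diff concrete, here is a minimal sketch of opting back into the legacy parser from spark-shell (where `spark` is predefined). The config key is quoted from the guide itself; the file path, schema, and date pattern are hypothetical:

```scala
// Revert CSV date/timestamp parsing to the pre-3.0 implementation.
// The key `spark.sql.legacy.timeParser.enabled` is taken from the guide above.
spark.conf.set("spark.sql.legacy.timeParser.enabled", "true")

// Subsequent CSV reads now go through the Spark 2.4-era formatter.
// (Illustrative read; the path and schema are hypothetical.)
val dates = spark.read
  .schema("d DATE")
  .option("dateFormat", "yyyy-MM-dd")
  .csv("/tmp/dates.csv")
```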
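The two added lines describe the behavior change this follow-up documents. A minimal sketch of what they mean in practice, again for spark-shell; the schema and sample records are illustrative rather than taken from the commit:

```scala
import spark.implicits._

// One well-formed record and one whose second field is not a valid TIMESTAMP.
val input = Seq("0,2013-111-11 12:13:14", "1,1983-08-04 00:00:00").toDS()

val df = spark.read
  .schema("a INT, b TIMESTAMP")
  .option("mode", "PERMISSIVE") // PERMISSIVE is also the default mode
  .csv(input)

// Spark 2.4 and earlier: the malformed record becomes [null, null].
// Since Spark 3.0: the column that did parse is kept, yielding [0, null].
df.show()
```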
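Finally, for the 2.3-to-2.4 bullet on `array_contains`, a short spark-shell sketch of the lossy promotion; the queries are illustrative, and the 2.4 outcome (an analysis error instead of a silently wrong answer) follows the guide's comparison table, which is not part of this diff:

```scala
// Spark 2.3: 1.34 is implicitly cast down to the array's element type (int),
// truncating it to 1, so the query returns true even though 1.34 is absent.
// Spark 2.4: the safer type promotion rejects the comparison, so this query
// fails analysis and an explicit cast is required instead.
spark.sql("SELECT array_contains(array(1), 1.34D)").show()

// With an explicit cast, the intent is unambiguous under either version.
spark.sql("SELECT array_contains(array(CAST(1 AS DOUBLE)), 1.34D)").show()
```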
