
Commit e9d1f24

improve migration guide
1 parent d248d4c commit e9d1f24


docs/sql-programming-guide.md

Lines changed: 1 addition & 1 deletion
@@ -1811,7 +1811,7 @@ working with timestamps in `pandas_udf`s to get the best performance, see
- Since Spark 2.4, Spark compares a DATE type with a TIMESTAMP type after promoting both sides to TIMESTAMP. Setting `spark.sql.hive.compareDateTimestampInTimestamp` to `false` restores the previous behavior. This option will be removed in Spark 3.0.
- Since Spark 2.4, creating a managed table with a nonempty location is not allowed. An exception is thrown when attempting to create a managed table with a nonempty location. Setting `spark.sql.allowCreatingManagedTableUsingNonemptyLocation` to `true` restores the previous behavior. This option will be removed in Spark 3.0.
- Since Spark 2.4, the type coercion rules can automatically promote the argument types of variadic SQL functions (e.g., IN/COALESCE) to the widest common type, regardless of the order of the input arguments. In prior Spark versions, the promotion could fail for some specific orders (e.g., TimestampType, IntegerType and StringType) and throw an exception.
- - Since Spark 2.4, `to_utc_timestamp` and `from_utc_timestamp` return null if the input timestamp string has a timezone part, e.g. `2000-10-10 00:00:00+00:00`. Setting `spark.sql.function.rejectTimezoneInString` to `false` restores the previous behavior. This option will be removed in Spark 3.0.
+ - In version 2.3 and earlier, `to_utc_timestamp` and `from_utc_timestamp` respect the timezone in the input timestamp string, which breaks the assumption that the input timestamp is in a specific timezone and can produce confusing results. In version 2.4 and later, this problem is fixed: `to_utc_timestamp` and `from_utc_timestamp` return null if the input timestamp string contains a timezone. As an example, `from_utc_timestamp('2000-10-10 00:00:00', 'GMT+1')` returns `2000-10-10 01:00:00`, while `from_utc_timestamp('2000-10-10 00:00:00+00:00', 'GMT+1')` returns `2000-10-10 09:00:00` in Spark 2.3 (with a local timezone of GMT+8) and null in Spark 2.4. Users who do not care about this problem and want to keep their queries unchanged can set `spark.sql.function.rejectTimezoneInString` to `false` to retain the previous behavior (see the sketch after this list). This option will be removed in Spark 3.0 and should only be used as a temporary workaround.
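
Not part of the commit itself: a minimal sketch of the Spark 2.4 behavior described in the added note above, using the DataFrame API. Only `from_utc_timestamp` and the `spark.sql.function.rejectTimezoneInString` conf come from the guide; the session setup and object name are boilerplate assumptions.

```scala
// Sketch only: illustrates the Spark 2.4 `from_utc_timestamp` change above.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{from_utc_timestamp, lit}

object RejectTimezoneInStringExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("RejectTimezoneInStringExample")
      .getOrCreate()

    // No timezone part in the input string: same result in 2.3 and 2.4,
    // namely 2000-10-10 01:00:00.
    spark.range(1)
      .select(from_utc_timestamp(lit("2000-10-10 00:00:00"), "GMT+1"))
      .show(false)

    // Timezone part present: null in Spark 2.4 (Spark 2.3 returned a
    // session-dependent value such as 2000-10-10 09:00:00 under GMT+8).
    spark.range(1)
      .select(from_utc_timestamp(lit("2000-10-10 00:00:00+00:00"), "GMT+1"))
      .show(false)

    // Temporary workaround named in the guide; removed in Spark 3.0.
    spark.conf.set("spark.sql.function.rejectTimezoneInString", "false")

    spark.stop()
  }
}
```

The conf must be set before the affected query is analyzed; since it disappears in Spark 3.0, treat it strictly as a stopgap while queries are migrated.
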
## Upgrading From Spark SQL 2.2 to 2.3

- Since Spark 2.3, queries over raw JSON/CSV files are disallowed when the referenced columns only include the internal corrupt record column (named `_corrupt_record` by default). For example, `spark.read.schema(schema).json(file).filter($"_corrupt_record".isNotNull).count()` and `spark.read.schema(schema).json(file).select("_corrupt_record").show()` are disallowed. Instead, you can cache or save the parsed results and then run the same query. For example, `val df = spark.read.schema(schema).json(file).cache()` and then `df.filter($"_corrupt_record".isNotNull).count()` (a runnable sketch follows below).
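
Not from the diff: a short runnable sketch of the cache-then-query workaround described in the note above. The schema, the `/tmp/people.json` path, and the object name are hypothetical; only the `_corrupt_record` column name and the cache pattern come from the guide.

```scala
// Sketch only: cache parsed JSON before querying only the corrupt record
// column, as required since Spark 2.3.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object CorruptRecordExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("CorruptRecordExample")
      .getOrCreate()
    import spark.implicits._

    // Illustrative schema: one data column plus the corrupt record column.
    val schema = StructType(Seq(
      StructField("name", StringType),
      StructField("_corrupt_record", StringType)))

    val file = "/tmp/people.json" // hypothetical input path

    // Disallowed since Spark 2.3 (references only _corrupt_record):
    // spark.read.schema(schema).json(file)
    //   .filter($"_corrupt_record".isNotNull).count()

    // Allowed: cache the parsed result first, then run the same query.
    val df = spark.read.schema(schema).json(file).cache()
    val corrupt = df.filter($"_corrupt_record".isNotNull).count()
    println(s"corrupt records: $corrupt")

    spark.stop()
  }
}
```
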
