[SPARK-37291][PYTHON][SQL] PySpark init SparkSession should copy conf to sharedState
### What changes were proposed in this pull request?

When a PySpark script is written like

```
conf = SparkConf().setAppName("test")
sc = SparkContext(conf=conf)
session = SparkSession.builder.enableHiveSupport().getOrCreate()
```

it builds a session without Hive support, because we reuse the existing SparkContext and create the SparkSession with `SparkSession(sc)`. This causes us to lose any configuration added through `config()`, such as the catalog implementation.

In the Scala `SparkSession`, we create the `SparkSession` from a `SparkContext` together with the option configurations, pass those options on to `SharedState`, and then use `SharedState`'s conf to create the `SessionState`. In PySpark, however, the option configurations are not passed to `SharedState`; they are passed only to `SessionState`, and by that point `SessionState` has already been initialized, so Hive support is not enabled.

In this PR, I pass the option configurations to `SharedState` when the `SparkSession` is first initialized; when `SessionState` is initialized later, those options are passed to it as well.

### Why are the changes needed?

Avoid losing configuration when building a SparkSession in PySpark.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manually tested & added UT.

Closes #34559 from AngersZhuuuu/SPARK-37291.

Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
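As a quick illustration (a minimal sketch, not the PR's actual unit test), the behavior can be checked by reading `spark.sql.catalogImplementation` back from the session after building it against a pre-existing SparkContext:

```
from pyspark import SparkConf, SparkContext
from pyspark.sql import SparkSession

# Build the SparkContext first, so the builder has to reuse it.
conf = SparkConf().setAppName("test")
sc = SparkContext(conf=conf)

# enableHiveSupport() sets spark.sql.catalogImplementation=hive via config();
# before this fix, that option was dropped on the existing-SparkContext path
# because it was never forwarded to SharedState.
session = SparkSession.builder.enableHiveSupport().getOrCreate()

# With the fix applied this prints "hive"; without it, "in-memory".
print(session.conf.get("spark.sql.catalogImplementation"))
```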