
Commit 449d3cb

Revert "[SPARK-51276][PYTHON] Enable spark.sql.execution.arrow.pyspark.enabled by default"

This reverts commit 9ac566d.

1 parent 31cf3c3

4 files changed (+3 −10 lines)


python/docs/source/migration_guide/pyspark_upgrade.rst

Lines changed: 0 additions & 1 deletion
@@ -76,7 +76,6 @@ Upgrading from PySpark 3.5 to 4.0
 * In Spark 4.0, the data type ``YearMonthIntervalType`` in ``DataFrame.collect`` no longer returns the underlying integers. To restore the previous behavior, set ``PYSPARK_YM_INTERVAL_LEGACY`` environment variable to ``1``.
 * In Spark 4.0, items other than functions (e.g. ``DataFrame``, ``Column``, ``StructType``) have been removed from the wildcard import ``from pyspark.sql.functions import *``, you should import these items from proper modules (e.g. ``from pyspark.sql import DataFrame, Column``, ``from pyspark.sql.types import StructType``).
 * In Spark 4.0, ``spark.sql.execution.pythonUDF.arrow.enabled`` is enabled by default. If users have PyArrow and pandas installed in their local and Spark Cluster, it automatically optimizes the regular Python UDFs with Arrow. To turn off the Arrow optimization, set ``spark.sql.execution.pythonUDF.arrow.enabled`` to ``false``.
-* In Spark 4.0, ``spark.sql.execution.arrow.pyspark.enabled`` is enabled by default. If users have PyArrow and pandas installed in their local and Spark Cluster, it automatically makes use of Apache Arrow for columnar data transfers in PySpark. This optimization applies to ``pyspark.sql.DataFrame.toPandas`` and ``pyspark.sql.SparkSession.createDataFrame`` when its input is a Pandas DataFrame or a NumPy ndarray. To turn off the Arrow optimization, set ``spark.sql.execution.arrow.pyspark.enabled`` to ``false``.
 
 
 Upgrading from PySpark 3.3 to 3.4
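
The migration-guide entry removed above describes what this commit reverts: Arrow-backed conversion in ``DataFrame.toPandas`` and ``SparkSession.createDataFrame`` is no longer on by default. As a minimal sketch (not part of the commit), opting in explicitly after this revert looks roughly like the following, assuming a running SparkSession and that PyArrow and pandas are installed:

    import pandas as pd
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # With this revert the conf defaults to "false" again, so Arrow-based
    # columnar transfer has to be requested explicitly.
    spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")

    df = spark.createDataFrame(pd.DataFrame({"id": [1, 2, 3]}))  # pandas -> Spark via Arrow
    pdf = df.toPandas()  # Spark -> pandas via Arrow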

python/pyspark/pandas/base.py

Lines changed: 2 additions & 3 deletions
@@ -1191,7 +1191,6 @@ def _shift(
         return self._with_new_scol(col, field=self._internal.data_fields[0].copy(nullable=True))
 
     # TODO: Update Documentation for Bins Parameter when its supported
-    # TODO(SPARK-51287): Enable s.index.value_counts() tests
     def value_counts(
         self,
         normalize: bool = False,
@@ -1324,15 +1323,15 @@ def value_counts(
                     ('falcon', 'length')],
                    )
 
-        >>> s.index.value_counts().sort_index()  # doctest: +SKIP
+        >>> s.index.value_counts().sort_index()
         (cow, length) 1
         (cow, weight) 2
         (falcon, length) 2
         (falcon, weight) 1
         (lama, weight) 3
         Name: count, dtype: int64
 
-        >>> s.index.value_counts(normalize=True).sort_index()  # doctest: +SKIP
+        >>> s.index.value_counts(normalize=True).sort_index()
         (cow, length) 0.111111
         (cow, weight) 0.222222
         (falcon, length) 0.222222
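
The hunk above drops the ``# doctest: +SKIP`` markers that were added when Arrow conversion became the default, so the ``Index.value_counts`` doctests run again under the restored default. A minimal sketch of the API those doctests exercise (an illustrative toy Series, not the docstring's data), assuming pyspark.pandas is importable:

    import pyspark.pandas as ps

    # value_counts on an index with repeated labels.
    s = ps.Series([1, 2, 3, 4], index=["a", "a", "b", "c"])

    print(s.index.value_counts().sort_index())                # counts per label
    print(s.index.value_counts(normalize=True).sort_index())  # fractions per label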

python/pyspark/sql/tests/connect/test_connect_creation.py

Lines changed: 0 additions & 5 deletions
@@ -219,11 +219,6 @@ def test_with_atom_type(self):
         self.assert_eq(sdf.toPandas(), cdf.toPandas())
 
     def test_with_none_and_nan(self):
-        # TODO(SPARK-51286): Fix test_with_none_and_nan to to pass with Arrow enabled
-        with self.sql_conf({"spark.sql.execution.arrow.pyspark.enabled": False}):
-            self.check_with_none_and_nan()
-
-    def check_with_none_and_nan(self):
         # SPARK-41855: make createDataFrame support None and NaN
         # SPARK-41814: test with eqNullSafe
         data1 = [Row(id=1, value=float("NaN")), Row(id=2, value=42.0), Row(id=3, value=None)]
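
The wrapper deleted above had pinned ``spark.sql.execution.arrow.pyspark.enabled`` to ``False`` for this test while Arrow was the default; after the revert the body runs under the default configuration again. A minimal sketch of the None/NaN behavior it covers, reusing the test's row values and assuming a running SparkSession:

    from pyspark.sql import Row, SparkSession

    spark = SparkSession.builder.getOrCreate()

    # SPARK-41855: NaN and None are distinct values for float columns.
    data = [Row(id=1, value=float("NaN")), Row(id=2, value=42.0), Row(id=3, value=None)]
    df = spark.createDataFrame(data)

    df.show()                                    # NaN stays NaN, None becomes NULL
    df.filter(df.value.eqNullSafe(None)).show()  # SPARK-41814: eqNullSafe matches the NULL row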

sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala

Lines changed: 1 addition & 1 deletion
@@ -3271,7 +3271,7 @@ object SQLConf {
       .doc("(Deprecated since Spark 3.0, please set 'spark.sql.execution.arrow.pyspark.enabled'.)")
       .version("2.3.0")
       .booleanConf
-      .createWithDefault(true)
+      .createWithDefault(false)
 
   val ARROW_PYSPARK_EXECUTION_ENABLED =
     buildConf("spark.sql.execution.arrow.pyspark.enabled")
