From ab7aea144da40f7b2ef57ff1d289a9b3bea2d19f Mon Sep 17 00:00:00 2001
From: Ruifeng Zheng
Date: Tue, 10 Sep 2024 16:37:10 +0800
Subject: [PATCH] [MINOR][DOCS] Fix scaladoc for `FlatMapGroupsInArrowExec` and
 `FlatMapCoGroupsInArrowExec`

### What changes were proposed in this pull request?

Fix the scaladoc for `FlatMapGroupsInArrowExec` and `FlatMapCoGroupsInArrowExec`.

### Why are the changes needed?

The existing scaladoc was copy-pasted from the pandas variants and still referred to `pandas.DataFrame` where these nodes actually use `pyarrow.Table`.

### Does this PR introduce _any_ user-facing change?

Documentation change only.

### How was this patch tested?

CI.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #48052 from zhengruifeng/py_type_applyinxxx.

Authored-by: Ruifeng Zheng
Signed-off-by: Ruifeng Zheng
---
 .../sql/execution/python/FlatMapCoGroupsInArrowExec.scala | 8 ++++----
 .../sql/execution/python/FlatMapGroupsInArrowExec.scala   | 8 ++++----
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/python/FlatMapCoGroupsInArrowExec.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/python/FlatMapCoGroupsInArrowExec.scala
index e91140414732b..a2d200dc86e18 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/python/FlatMapCoGroupsInArrowExec.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/python/FlatMapCoGroupsInArrowExec.scala
@@ -23,13 +23,13 @@ import org.apache.spark.sql.execution.SparkPlan
 
 /**
- * Physical node for [[org.apache.spark.sql.catalyst.plans.logical.FlatMapCoGroupsInPandas]]
+ * Physical node for [[org.apache.spark.sql.catalyst.plans.logical.FlatMapCoGroupsInArrow]]
  *
  * The input dataframes are first Cogrouped. Rows from each side of the cogroup are passed to the
  * Python worker via Arrow. As each side of the cogroup may have a different schema we send every
  * group in its own Arrow stream.
- * The Python worker turns the resulting record batches to `pandas.DataFrame`s, invokes the
- * user-defined function, and passes the resulting `pandas.DataFrame`
+ * The Python worker turns the resulting record batches to `pyarrow.Table`s, invokes the
+ * user-defined function, and passes the resulting `pyarrow.Table`
  * as an Arrow record batch. Finally, each record batch is turned to
  * Iterator[InternalRow] using ColumnarBatch.
  *
@@ -37,7 +37,7 @@ import org.apache.spark.sql.execution.SparkPlan
  * Both the Python worker and the Java executor need to have enough memory to
  * hold the largest cogroup. The memory on the Java side is used to construct the
  * record batches (off heap memory). The memory on the Python side is used for
- * holding the `pandas.DataFrame`. It's possible to further split one group into
+ * holding the `pyarrow.Table`. It's possible to further split one group into
  * multiple record batches to reduce the memory footprint on the Java side, this
  * is left as future work.
  */
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/python/FlatMapGroupsInArrowExec.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/python/FlatMapGroupsInArrowExec.scala
index 942aaf6e44c17..6569b29f3954f 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/python/FlatMapGroupsInArrowExec.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/python/FlatMapGroupsInArrowExec.scala
@@ -25,11 +25,11 @@ import org.apache.spark.sql.types.{StructField, StructType}
 
 /**
- * Physical node for [[org.apache.spark.sql.catalyst.plans.logical.FlatMapGroupsInPandas]]
+ * Physical node for [[org.apache.spark.sql.catalyst.plans.logical.FlatMapGroupsInArrow]]
  *
  * Rows in each group are passed to the Python worker as an Arrow record batch.
- * The Python worker turns the record batch to a `pandas.DataFrame`, invoke the
- * user-defined function, and passes the resulting `pandas.DataFrame`
+ * The Python worker turns the record batch to a `pyarrow.Table`, invokes the
+ * user-defined function, and passes the resulting `pyarrow.Table`
  * as an Arrow record batch. Finally, each record batch is turned to
  * Iterator[InternalRow] using ColumnarBatch.
  *
@@ -37,7 +37,7 @@ import org.apache.spark.sql.types.{StructField, StructType}
  * Both the Python worker and the Java executor need to have enough memory to
  * hold the largest group. The memory on the Java side is used to construct the
  * record batch (off heap memory). The memory on the Python side is used for
- * holding the `pandas.DataFrame`. It's possible to further split one group into
+ * holding the `pyarrow.Table`. It's possible to further split one group into
  * multiple record batches to reduce the memory footprint on the Java side, this
  * is left as future work.
  */
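For readers less familiar with the API these nodes execute: `FlatMapCoGroupsInArrowExec` sits behind PySpark's cogrouped `applyInArrow`. Below is a minimal sketch of that flow, assuming the Spark 4.0 `applyInArrow` API; the data, column names, and join logic are illustrative only and are not part of this patch.

```python
import pyarrow as pa
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

left = spark.createDataFrame([(1, 1.0), (2, 2.0)], ("id", "v1"))
right = spark.createDataFrame([(1, 10.0), (2, 20.0)], ("id", "v2"))

def merge(lt: pa.Table, rt: pa.Table) -> pa.Table:
    # Each side of the cogroup arrives as its own pyarrow.Table; the plan
    # sends every group in its own Arrow stream because the two sides may
    # have different schemas.
    return lt.join(rt, keys="id")

left.groupBy("id").cogroup(right.groupBy("id")).applyInArrow(
    merge, schema="id long, v1 double, v2 double"
).show()
```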
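The single-input path through `FlatMapGroupsInArrowExec` looks similar from Python. A sketch assuming `GroupedData.applyInArrow`; the mean-centering function is hypothetical:

```python
import pyarrow as pa
import pyarrow.compute as pc
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([(1, 1.0), (1, 2.0), (2, 3.0), (2, 5.0)], ("id", "v"))

def center(table: pa.Table) -> pa.Table:
    # The whole group is materialized as one pyarrow.Table, so the Python
    # worker must hold the largest group in memory -- the constraint the
    # scaladoc's memory note describes.
    v = table.column("v")
    return table.set_column(
        table.schema.get_field_index("v"), "v", pc.subtract(v, pc.mean(v)))

df.groupBy("id").applyInArrow(center, schema="id long, v double").show()
```

In both cases the returned `pyarrow.Table` travels back to the executor as Arrow record batches, which are then turned into `Iterator[InternalRow]` via `ColumnarBatch`, as the updated scaladoc describes.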