Commit 4fcbc6d

[SPARK-31735][CORE] Include all columns in the summary report
For example, dates are missing from the export:

    from datetime import datetime, timedelta, timezone
    from pyspark.sql import types as T
    from pyspark.sql import Row
    from pyspark.sql import functions as F

    START = datetime(2014, 1, 1, tzinfo=timezone.utc)
    n_days = 22
    date_range = [Row(date=(START + timedelta(days=n))) for n in range(0, n_days)]
    schema = T.StructType([T.StructField(name="date", dataType=T.DateType(), nullable=False)])
    rdd = spark.sparkContext.parallelize(date_range)
    df = spark.createDataFrame(data=rdd, schema=schema)
    df.agg(F.max("date")).show()
    df.summary().show()

    +-------+
    |summary|
    +-------+
    |  count|
    |   mean|
    | stddev|
    |    min|
    |    25%|
    |    50%|
    |    75%|
    |    max|
    +-------+

Signed-off-by: Fokko Driesprong <fokko@apache.org>
1 parent 2012d58 commit 4fcbc6d

File tree: 1 file changed, +0 −1 lines changed

sql/core/src/main/scala/org/apache/spark/sql/execution/stat/StatFunctions.scala

Lines changed: 0 additions & 1 deletion
@@ -264,7 +264,6 @@ object StatFunctions extends Logging {
     }

     val selectedCols = ds.logicalPlan.output
-      .filter(a => a.dataType.isInstanceOf[NumericType] || a.dataType.isInstanceOf[StringType])

     val aggExprs = statisticFns.flatMap { func =>
       selectedCols.map(c => Column(Cast(func(c), StringType)).as(c.name))
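The deleted `.filter(...)` line is what excluded every column that was neither numeric nor string (such as the `DateType` column in the repro above) from the summary. A minimal sketch of that behavior in plain Python — not Spark code, and the column names and values are purely illustrative — contrasting the old type-filtered selection with the post-fix selection of all columns:

```python
from datetime import date

# Hypothetical columns of a DataFrame, keyed by name, with a sample value
# standing in for each column's data type.
columns = {
    "price": 12.5,             # numeric -> kept by the old filter
    "name": "widget",          # string  -> kept by the old filter
    "date": date(2014, 1, 1),  # date    -> silently dropped before the fix
}

# Pre-fix: keep only numeric and string columns, mirroring the removed
# .filter(a => a.dataType.isInstanceOf[NumericType] || ... StringType) line.
old_selected = [c for c, v in columns.items() if isinstance(v, (int, float, str))]

# Post-fix: every output column participates in the summary.
new_selected = list(columns)

print(old_selected)  # ['price', 'name'] -- 'date' is missing, as in the bug report
print(new_selected)  # ['price', 'name', 'date']
```

The fix works because every aggregate is cast to `StringType` before being collected (see the `Cast(func(c), StringType)` in the unchanged line above), so non-numeric columns can safely appear in the summary: statistics that do not apply to them simply come back null.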
