
Commit 3a71cf9

x1- authored and Davies Liu committed
[SPARK-8535] [PYSPARK] PySpark : Can't create DataFrame from Pandas dataframe with no explicit column name
The implicit column names of a pandas DataFrame (`pandas.columns`) are integers, but the `StructField` JSON expects a `String`, so the `pandas.columns` values should be converted to `String`.

### issue

* [SPARK-8535 PySpark : Can't create DataFrame from Pandas dataframe with no explicit column name](https://issues.apache.org/jira/browse/SPARK-8535)

Author: x1- <viva008@gmail.com>

Closes #7124 from x1-/SPARK-8535 and squashes the following commits:

d68fd38 [x1-] modify unit-test using pandas.
ea1897d [x1-] For implicit name of pandas.columns are Int, so should be convert to String.

(cherry picked from commit b6e76ed)
Signed-off-by: Davies Liu <davies@databricks.com>
1 parent d720426 commit 3a71cf9

1 file changed: +3 -1 lines changed


python/pyspark/sql/context.py

Lines changed: 3 additions & 1 deletion
@@ -262,13 +262,15 @@ def createDataFrame(self, data, schema=None, samplingRatio=None):
 
         >>> sqlContext.createDataFrame(df.toPandas()).collect()  # doctest: +SKIP
         [Row(name=u'Alice', age=1)]
+        >>> sqlContext.createDataFrame(pandas.DataFrame([[1, 2]]).collect())  # doctest: +SKIP
+        [Row(0=1, 1=2)]
         """
         if isinstance(data, DataFrame):
             raise TypeError("data is already a DataFrame")
 
         if has_pandas and isinstance(data, pandas.DataFrame):
             if schema is None:
-                schema = list(data.columns)
+                schema = [str(x) for x in data.columns]
             data = [r.tolist() for r in data.to_records(index=False)]
 
         if not isinstance(data, RDD):
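The effect of the one-line change can be sketched without Spark or pandas. The snippet below is only an illustration under stated assumptions: `normalize_column_names` and `field_json` are hypothetical helpers (neither exists in PySpark), `range(2)` stands in for the implicit integer labels of a pandas DataFrame built without column names, and the dictionary passed to `json.dumps` is a simplified stand-in for a schema field, not the real `StructField` JSON layout.

```python
import json


def normalize_column_names(columns):
    """Return column labels as strings, mirroring `[str(x) for x in data.columns]`."""
    return [str(label) for label in columns]


def field_json(name):
    """Hypothetical, simplified stand-in for the JSON form of a schema field."""
    return json.dumps({"name": name, "type": "long"})


# Assumption: range(2) plays the role of the implicit integer labels (0, 1)
# that pandas assigns when no explicit column names are given.
implicit_labels = list(range(2))

# With an integer label, the field name is serialized as a JSON number.
print(field_json(implicit_labels[0]))

# After conversion, every label is a str and is serialized as a JSON string.
names = normalize_column_names(implicit_labels)
print(names)
print(field_json(names[0]))
```

Labels that are already strings pass through `str()` unchanged, so DataFrames with explicit column names behave as before.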
