[SQL] Python JsonRDD UTF8 Encoding Fix
Only encode unicode objects to UTF-8; pass byte strings (str) through unchanged.

Author: Ahir Reddy <ahirreddy@gmail.com>

Closes apache#1914 from ahirreddy/json-rdd-unicode-fix1 and squashes the following commits:

ca4e9ba [Ahir Reddy] Encoding Fix
ahirreddy authored and marmbrus committed Aug 14, 2014
1 parent add75d4 commit fde692b
Showing 1 changed file with 3 additions and 1 deletion.
python/pyspark/sql.py: 3 additions, 1 deletion (4 changes)
@@ -1267,7 +1267,9 @@ def func(iterator):
             for x in iterator:
                 if not isinstance(x, basestring):
                     x = unicode(x)
-                yield x.encode("utf-8")
+                if isinstance(x, unicode):
+                    x = x.encode("utf-8")
+                yield x
         keyed = rdd.mapPartitions(func)
         keyed._bypass_serializer = True
         jrdd = keyed._jrdd.map(self._jvm.BytesToString())
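For context: in Python 2, calling .encode("utf-8") on a byte str first decodes it implicitly with the default ASCII codec. That means a record that was already UTF-8 encoded and contained non-ASCII bytes would presumably raise UnicodeDecodeError under the old line. The sketch below restates the patched generator in Python 3 terms, where Python 2's unicode corresponds to str and its byte-oriented str corresponds to bytes. The function name encode_records is hypothetical, and this is an illustration of the logic, not the PySpark code itself.

```python
from typing import Iterable, Iterator


def encode_records(iterator: Iterable[object]) -> Iterator[bytes]:
    """Hypothetical Python 3 analogue of the patched generator in jsonRDD.

    Mapping assumed here: Python 2 `unicode` -> Python 3 `str`,
    Python 2 `str` (bytes) -> Python 3 `bytes`,
    Python 2 `basestring` -> (str, bytes).
    """
    for x in iterator:
        # Non-string records are converted to text first (Python 2: unicode(x)).
        if not isinstance(x, (str, bytes)):
            x = str(x)
        # Only text is encoded; byte strings are assumed to already be
        # UTF-8 and are yielded untouched.
        if isinstance(x, str):
            x = x.encode("utf-8")
        yield x


records = ['{"name": "caf\u00e9"}', b'{"name": "caf\xc3\xa9"}', 42]
encoded = list(encode_records(records))
```

With this input, the text record and the pre-encoded byte record produce identical UTF-8 bytes, and the integer becomes b"42". One consequence of the design: byte strings are trusted to already be UTF-8, and nothing on the Python side validates them before they are handed to the JVM.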
