[SPARK-13390][SQL][branch-1.6]Fix the issue that Iterator.map().toSeq is not Serializable #11334

Closed
wants to merge 1 commit into from

zsxwing
Member

zsxwing commented Feb 23, 2016

What changes were proposed in this pull request?

`scala.collection.Iterator`'s methods (e.g., `map`, `filter`) return an `AbstractIterator`, which is not `Serializable`. For example:

scala> val iter = Array(1, 2, 3).iterator.map(_ + 1)
iter: Iterator[Int] = non-empty iterator

scala> println(iter.isInstanceOf[Serializable])
false

Calling something like `Iterator.map(...).toSeq` creates a lazily evaluated `Stream` that still holds a reference to the non-serializable `AbstractIterator`, so the `Stream` itself cannot be serialized.
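To check this beyond the `isInstanceOf[Serializable]` marker test, one can attempt actual Java serialization with `ObjectOutputStream`, which walks the whole object graph rather than only the top-level class. The sketch below is illustrative; `IteratorSerializationCheck` and `serializes` are hypothetical names, not part of Spark. The class returned by `toSeq` depends on the Scala version (a `Stream` on 2.10/2.11, which this branch targets; 2.13, for example, returns a strict collection), so the sketch only prints that result instead of asserting it.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

object IteratorSerializationCheck {
  // Attempts plain Java serialization of `obj`. Returns false if any object
  // reachable from `obj` is not Serializable. This is stricter than
  // `isInstanceOf[Serializable]`, which only inspects the top-level class.
  def serializes(obj: AnyRef): Boolean =
    try {
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      try out.writeObject(obj) finally out.close()
      true
    } catch {
      case _: NotSerializableException => false
    }

  def main(args: Array[String]): Unit = {
    // Build a fresh mapped iterator on each call, since iterators are single-use.
    def mapped: Iterator[Int] = Array(1, 2, 3).iterator.map(_ + 1)

    // The anonymous AbstractIterator produced by `map` is not Serializable.
    println(s"mapped iterator serializable: ${serializes(mapped)}")

    // Version-dependent: on Scala 2.10/2.11 this is a Stream whose unevaluated
    // tail still references the mapped iterator.
    val asSeq = mapped.toSeq
    println(s"toSeq result ${asSeq.getClass.getName} serializable: ${serializes(asSeq)}")

    // toArray copies every element eagerly, leaving no iterator reference behind.
    println(s"toArray result serializable: ${serializes(mapped.toArray)}")
  }
}
```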

This PR fixes the issue in `def createDataFrame(data: java.util.List[_], beanClass: Class[_]): DataFrame` by using `toArray` instead of `toSeq`, which materializes all elements eagerly so that no reference to the iterator is retained.
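The shape of the fix can be sketched outside Spark as follows. `MaterializeRows`, `PersonRow`, and `toRows` are hypothetical stand-ins (the real method derives rows from JavaBean properties via `beanClass`); only the `iterator.map(...).toArray` pattern mirrors the change described above.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}
import scala.collection.JavaConverters._

object MaterializeRows {
  // Hypothetical stand-in for a row built from a JavaBean; not Spark's Row type.
  final case class PersonRow(name: String, age: Int)

  // Returns true if plain Java serialization of `obj` succeeds.
  def serializes(obj: AnyRef): Boolean =
    try {
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      try out.writeObject(obj) finally out.close()
      true
    } catch {
      case _: NotSerializableException => false
    }

  // Converts a java.util.List of (name, age) pairs into rows. `.toArray` forces
  // every element immediately, so the result does not keep a reference to the
  // intermediate iterator created by `map` (unlike a lazy Stream from `toSeq`
  // on Scala 2.10/2.11).
  def toRows(data: java.util.List[(String, Int)]): Array[PersonRow] =
    data.asScala.iterator.map { case (n, a) => PersonRow(n, a) }.toArray
}
```

`toList` or `toVector` would likewise force evaluation and drop the iterator reference; the PR chose `toArray`.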

How was this patch tested?

Jenkins tests.

@zsxwing
Member Author

zsxwing commented Feb 23, 2016

retest this please

@SparkQA

SparkQA commented Feb 24, 2016

Test build #51816 has finished for PR 11334 at commit 40e6b2f.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zsxwing
Member Author

zsxwing commented Feb 24, 2016

retest this please

@SparkQA

SparkQA commented Feb 24, 2016

Test build #51824 has finished for PR 11334 at commit 40e6b2f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Member

srowen commented Feb 24, 2016

LGTM

asfgit pushed a commit that referenced this pull request Feb 24, 2016
…q is not Serializable

Author: Shixiong Zhu <shixiong@databricks.com>

Closes #11334 from zsxwing/SPARK-13390.
@srowen
Member

srowen commented Feb 24, 2016

Merged to 1.6

zsxwing closed this Feb 24, 2016
zsxwing deleted the SPARK-13390 branch February 24, 2016 18:47

3 participants