[SPARK-3500] [SQL] use JavaSchemaRDD as SchemaRDD._jschema_rdd #2369

davies · 2014-09-12T06:27:09Z

Currently, SchemaRDD._jschema_rdd is a Scala SchemaRDD, so the Scala API methods (coalesce(), repartition()) cannot easily be called from Python: there is no way to specify the implicit parameter ord. _jrdd is a JavaRDD, so _jschema_rdd should likewise be a JavaSchemaRDD.

This patch changes _jschema_rdd to a JavaSchemaRDD and adds an assert for it. If a method is missing from JavaSchemaRDD, it is called via _jschema_rdd.baseSchemaRDD().xxx().
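To illustrate the shape of this change, here is a minimal pure-Python sketch. It is not the PySpark code from this patch: FakeScalaSchemaRDD, FakeJavaSchemaRDD and PySchemaRDD are hypothetical stand-ins for the JVM objects, and the toJavaSchemaRDD(), limit() and num_partitions() helpers are assumptions made only so the example runs on its own. It shows the Python wrapper asserting that it holds the Java-friendly object, calling coalesce() through it without any implicit ord, and falling back to baseSchemaRDD() for a method the wrapper lacks.

```python
# Illustrative stand-ins only: these classes mimic the shape of the JVM
# objects discussed above; they are not PySpark or Spark classes.


class FakeScalaSchemaRDD:
    """Stands in for the Scala SchemaRDD, whose coalesce() needs `ord`."""

    def __init__(self, rows, num_partitions=1):
        self.rows = list(rows)
        self.num_partitions = num_partitions

    def coalesce(self, num_partitions, shuffle, ord):
        # A Python caller has no convenient way to supply `ord`.
        return FakeScalaSchemaRDD(self.rows, num_partitions)

    def limit(self, n):
        # Pretend this method exists only on the Scala-side class.
        return FakeScalaSchemaRDD(self.rows[:n], self.num_partitions)

    def toJavaSchemaRDD(self):
        return FakeJavaSchemaRDD(self)


class FakeJavaSchemaRDD:
    """Stands in for JavaSchemaRDD: a wrapper with no implicit parameters."""

    def __init__(self, base):
        self._base = base

    def baseSchemaRDD(self):
        return self._base

    def coalesce(self, num_partitions, shuffle=False):
        # The wrapper fills in the ordering so callers do not have to.
        result = self._base.coalesce(num_partitions, shuffle, ord=None)
        return result.toJavaSchemaRDD()


class PySchemaRDD:
    """Python-side wrapper that insists on holding the Java-friendly object."""

    def __init__(self, jschema_rdd):
        assert isinstance(jschema_rdd, FakeJavaSchemaRDD), \
            "expected a JavaSchemaRDD-like object"
        self._jschema_rdd = jschema_rdd

    def coalesce(self, num_partitions, shuffle=False):
        # Available on the wrapper, so call it directly.
        return PySchemaRDD(self._jschema_rdd.coalesce(num_partitions, shuffle))

    def limit(self, n):
        # Missing from the wrapper: go through the base object, then re-wrap.
        base = self._jschema_rdd.baseSchemaRDD().limit(n)
        return PySchemaRDD(base.toJavaSchemaRDD())

    def num_partitions(self):
        return self._jschema_rdd.baseSchemaRDD().num_partitions

    def collect(self):
        return list(self._jschema_rdd.baseSchemaRDD().rows)
```

Under these assumptions, constructing PySchemaRDD from a bare FakeScalaSchemaRDD trips the assert, which is the kind of early failure the added assert is meant to give.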

BTW, do we need JavaSQLContext?

SparkQA · 2014-09-12T06:34:19Z

QA tests have started for PR 2369 at commit abee159.

This patch merges cleanly.

SparkQA · 2014-09-12T07:34:30Z

QA tests have finished for PR 2369 at commit abee159.

This patch passes unit tests.
This patch merges cleanly.
This patch adds no public classes.

JoshRosen · 2014-09-13T00:59:56Z

This looks good to me. There's some ongoing discussion on the JIRA over whether this should be included in 1.1.1.

JoshRosen · 2014-09-13T01:03:39Z

I think this is clearly a bug, not a missing feature, since SchemaRDD instances expose a public method that always throws an exception when called. I'd like to merge this into master and branch-1.1.

JoshRosen · 2014-09-13T02:29:49Z

Backported into branch-1.1 (a couple of minor merge conflicts, but only in tests.py; I fixed them by hand).

nchammas · 2014-09-13T03:52:37Z

python/pyspark/tests.py

+
+        srdd = srdd.coalesce(2, True)
+        srdd = srdd.repartition(3)
+        srdd = srdd.distinct()


@davies Shouldn't we also test srdd.distinct(n) since that was the missing functionality documented in SPARK-3500?

Fair point, although if srdd.distinct() works then srdd.distinct(n) should also work due to how distinct() and this fix were implemented.

OK, I'll take your word for it but just point out that one of the reported issues in SPARK-3500 was specifically that distinct() worked but distinct(n) didn't. Since that is a possible failure mode, it probably makes sense to have a test for each.

Actually, it looks like we don't support distinct(n) in PySpark (the original ticket dealt with distinct() and coalesce() simply not working). Let's open a separate JIRA for that.

Yeah, a separate issue makes sense (I actually suggested that in the JIRA ticket). But to clarify, the ticket was originally about coalesce() not working, then repartition() and distinct(n) were added on.

distinct() with no parameters was always working. There was no question about that.

distinct(n) is a missing API; we could fix it in a separate issue or defer it until later.

use JavaSchemaRDD as SchemaRDD._jschema_rdd

abee159

asfgit closed this in 885d162 Sep 13, 2014

nchammas reviewed Sep 13, 2014

davies deleted the fix_schemardd branch September 15, 2014 22:16
