[SPARK-13366] Support Cartesian join for Datasets #11244

Closed
wants to merge 1 commit into from

xguo27
Contributor

@xguo27 xguo27 commented Feb 17, 2016

No description provided.

@marmbrus
Contributor

ok to test

*
* @since 2.0.0
*/
def joinWith[U](other: Dataset[U]): Dataset[(T, U)] = joinWith(other, lit(true), "inner")
Contributor

You don't actually need to use a true literal here, just construct a join without a condition.

Contributor Author

Thanks for your feedback, @marmbrus. The only join API in Dataset that I can find is:

https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L644

which expects a Column. Do you mean adding another method like the one in DataFrame:

https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala#L383-L385

If so, should we factor out the code that handles the encoders?

Contributor

I guess I would just make a protected version of joinWith above where the condition is an Option and have all the other methods delegate to that.
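
The delegation pattern suggested here can be sketched outside Spark. The following is a minimal illustration only, not the code from this pull request: `MiniDataset` and `joinWithImpl` are hypothetical names, and an in-memory `Seq` stands in for a real Dataset. It shows a single protected method taking an `Option` condition, with the public overloads delegating to it, so the Cartesian case needs no always-true literal.

```scala
// Hypothetical stand-in for Dataset, backed by an in-memory Seq.
// This is an illustrative sketch, not Spark code.
class MiniDataset[T](val rows: Seq[T]) {

  // Single implementation. A condition of None means "no condition",
  // which yields the Cartesian product of the two inputs.
  protected def joinWithImpl[U](
      other: MiniDataset[U],
      condition: Option[(T, U) => Boolean]): MiniDataset[(T, U)] = {
    val pairs = for {
      l <- rows
      r <- other.rows
      if condition.forall(p => p(l, r)) // forall on None is true, so every pair is kept
    } yield (l, r)
    new MiniDataset(pairs)
  }

  // Conditional join: delegates with Some(condition).
  def joinWith[U](other: MiniDataset[U], condition: (T, U) => Boolean): MiniDataset[(T, U)] =
    joinWithImpl(other, Some(condition))

  // Cartesian join: delegates with None instead of passing a true predicate.
  def joinWith[U](other: MiniDataset[U]): MiniDataset[(T, U)] =
    joinWithImpl(other, None)
}
```

With `nums = new MiniDataset(Seq(1, 2))` and `strs = new MiniDataset(Seq("a", "bb"))`, calling `nums.joinWith(strs)` returns all four pairs, while `nums.joinWith(strs, (n: Int, s: String) => n == s.length)` keeps only the pairs that satisfy the predicate.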

@SparkQA

SparkQA commented Feb 18, 2016

Test build #51450 has finished for PR 11244 at commit 27a58df.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@xguo27
Contributor Author

xguo27 commented Feb 21, 2016

Thanks, @marmbrus! I have updated the change following your suggestion.

@SparkQA

SparkQA commented Feb 21, 2016

Test build #51609 has finished for PR 11244 at commit c79ec51.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@marmbrus
Contributor

@liancheng is this going to be done by your unification? Should we merge or hold off?

@rxin
Contributor

rxin commented Jun 15, 2016

Thanks for the pull request. I'm going through a list of pull requests to cut them down since the sheer number is breaking some of the tooling we have. Due to lack of activity on this pull request, I'm going to push a commit to close it. Feel free to reopen it or create a new one. We can also continue the discussion on the JIRA ticket.

@asfgit asfgit closed this in 1a33f2e Jun 15, 2016