SPARK-1380: Add sort-merge based cogroup/joins. #283
Merged build triggered.
Merged build started.
Merged build finished. All automated tests passed.
Is there a specific use case you are trying to address that cannot be handled by the hash join?
I have not done a detailed review, but this looks pretty expensive in terms of memory.
@rxin Thank you for your reply. There are some cases where a merge join can be used for optimization:
I think it is useful for users to be able to choose how to optimize their processing.
@mridulm Thank you for your reply. There are two points I have to mention about memory:
I'd suggest we close this issue for now and go to the JIRA to discuss whether the feature is needed and how high of a priority it is.
I've written cogroup/joins based on the 'Sort-Merge' algorithm.
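For readers unfamiliar with the technique, here is a minimal sketch of a sort-merge cogroup and an inner join built on top of it. It is plain Python over in-memory lists of `(key, value)` pairs, not the code from this patch and not Spark's API; the function names `sort_merge_cogroup` and `sort_merge_join` are hypothetical and chosen only for illustration.

```python
from itertools import groupby
from operator import itemgetter


def sort_merge_cogroup(left, right):
    """Yield (key, left_values, right_values) for every key in either input.

    Illustrative only: both inputs are sorted in memory, then merged in
    a single pass over the two sorted streams.
    """
    by_key = itemgetter(0)
    # Sort each side by key, then group consecutive pairs sharing a key.
    left_groups = groupby(sorted(left, key=by_key), key=by_key)
    right_groups = groupby(sorted(right, key=by_key), key=by_key)

    l = next(left_groups, None)
    r = next(right_groups, None)
    while l is not None or r is not None:
        if r is None or (l is not None and l[0] < r[0]):
            # Key appears only on the left side.
            yield l[0], [v for _, v in l[1]], []
            l = next(left_groups, None)
        elif l is None or r[0] < l[0]:
            # Key appears only on the right side.
            yield r[0], [], [v for _, v in r[1]]
            r = next(right_groups, None)
        else:
            # Same key on both sides: emit both value lists together.
            yield l[0], [v for _, v in l[1]], [v for _, v in r[1]]
            l = next(left_groups, None)
            r = next(right_groups, None)


def sort_merge_join(left, right):
    """Inner join: pair every left value with every right value per key."""
    for key, lvals, rvals in sort_merge_cogroup(left, right):
        for lv in lvals:
            for rv in rvals:
                yield key, (lv, rv)


left = [("b", 2), ("a", 1), ("b", 3)]
right = [("b", "x"), ("c", "y")]
cogrouped = list(sort_merge_cogroup(left, right))
joined = list(sort_merge_join(left, right))
```

Because both streams are consumed in key order, the merge step only needs to hold the values for the current key, whereas a hash-based join typically builds a table over one whole side. The sketch still sorts each input fully in memory, so it says nothing about how the sort itself would be done at scale.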