Skip to content

[SPARK-14804][Spark][Graphx] Fix Graph vertexRDD/EdgeRDD checkpoint results ClassCastException #12576

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

suyanNone
Copy link
Contributor

@suyanNone suyanNone commented Apr 21, 2016

What changes were proposed in this pull request?

The PR fixed compute chain from CheckpointRDD<-vertexRDDImp to CheckpointRDD<-partitionRDD<- vertexRDDImpl

How was this patch tested?

unit test

@suyanNone
Copy link
Contributor Author

mistake to open, close first

@suyanNone suyanNone closed this Apr 21, 2016
@SparkQA
Copy link

SparkQA commented Apr 21, 2016

Test build #56543 has finished for PR 12576 at commit f2ced74.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@suyanNone suyanNone reopened this Apr 22, 2016
@suyanNone suyanNone changed the title [Spark][Graphx] Fix Graph vertexRDD/EdgeRDD checkpoint results ClassCastException [SPARK-14804][Spark][Graphx] Fix Graph vertexRDD/EdgeRDD checkpoint results ClassCastException Apr 22, 2016
@SparkQA
Copy link

SparkQA commented Apr 22, 2016

Test build #56635 has finished for PR 12576 at commit a106758.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tdas
Copy link
Contributor

tdas commented Sep 27, 2016

This fix is roughly in the right direction. However there are two major concerns.

  • It changes the behavior of VertexRDD.isCheckpointed() which is a public method.
  • There must be new tests added to make sure this does not happen again. I dont see any new tests that checkpoints, forces the checkpoint by count/collect, and then collects again.

Ping @jkbradley

@tdas
Copy link
Contributor

tdas commented Sep 27, 2016

also, ping @rxin

@tdas
Copy link
Contributor

tdas commented Sep 27, 2016

Actually, correction. Here is a better fix. The key problem here is that the RDD internal logic for computing from checkpoints, depends on a public, override-able API. This is wrong. Either the isCheckpointed should have been marked final, or the internal isCheckpointedAndMaterialized should not have depended on isCheckpointed (see here).

So the simplest, and correct solution is probably (not tested) to change isCheckpointedAndMaterialzed.

  private[spark] def isCheckpointedAndMaterialized: Boolean = checkpointData.exists(_.isCheckpointed)

This preserves the behavior of VertexRDD.isCheckpointed() and prevents such issues related to overridden RDD.isCheckpointed() from happening ever again.

@jkbradley
Copy link
Member

CC: @ankurdave just making you aware of this

@suyanNone
Copy link
Contributor Author

suyanNone commented Sep 28, 2016

@tdas agree, isCheckpointed better be final, in current code, isCheckpointed exposed as public is for testing?

@tdas
Copy link
Contributor

tdas commented Oct 3, 2016

Okay so can you update this PR according to the discussion, and add the necessary tests as well?

@neggert
Copy link
Contributor

neggert commented Oct 11, 2016

Not sure what's going on regarding the multiple PRs for this issue, but I cherry-picked this PR on top of 1.6.2 and it fixed the problem for me.

@maropu maropu mentioned this pull request Apr 23, 2017
maropu added a commit to maropu/spark that referenced this pull request Apr 23, 2017
@asfgit asfgit closed this in e9f9715 Apr 24, 2017
peter-toth pushed a commit to peter-toth/spark that referenced this pull request Oct 6, 2018
This pr proposed to close stale PRs. Currently, we have 400+ open PRs and there are some stale PRs whose JIRA tickets have been already closed and whose JIRA tickets does not exist (also, they seem not to be minor issues).

// Open PRs whose JIRA tickets have been already closed
Closes apache#11785
Closes apache#13027
Closes apache#13614
Closes apache#13761
Closes apache#15197
Closes apache#14006
Closes apache#12576
Closes apache#15447
Closes apache#13259
Closes apache#15616
Closes apache#14473
Closes apache#16638
Closes apache#16146
Closes apache#17269
Closes apache#17313
Closes apache#17418
Closes apache#17485
Closes apache#17551
Closes apache#17463
Closes apache#17625

// Open PRs whose JIRA tickets does not exist and they are not minor issues
Closes apache#10739
Closes apache#15193
Closes apache#15344
Closes apache#14804
Closes apache#16993
Closes apache#17040
Closes apache#15180
Closes apache#17238

N/A

Author: Takeshi Yamamuro <yamamuro@apache.org>

Closes apache#17734 from maropu/resolved_pr.

Change-Id: Id2e590aa7283fe5ac01424d30a40df06da6098b5
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants