
SPARK-1209 [CORE] SparkHadoop{MapRed,MapReduce}Util should not use package org.apache.hadoop #2814


Closed
wants to merge 2 commits

Conversation

srowen
Member

@srowen srowen commented Oct 15, 2014

(This is just a look at what completely moving the classes would look like. I know Patrick flagged that as maybe not OK, although, it's private?)

@SparkQA

SparkQA commented Oct 15, 2014

QA tests have started for PR 2814 at commit ec1b84a.

  • This patch merges cleanly.

@srowen srowen changed the title SPARK-1209 [CORE] SparkHadoopUtil should not use package org.apache.hadoop SPARK-1209 [CORE] SparkHadoop{MapRed,MapReduce}Util should not use package org.apache.hadoop Oct 15, 2014
@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21767/
Test FAILed.

@shaneknapp
Contributor

jenkins, test this please.

@srowen
Member Author

srowen commented Oct 15, 2014

Jenkins, test this please.

@SparkQA

SparkQA commented Oct 15, 2014

QA tests have finished for PR 2814 at commit ec1b84a.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21765/
Test FAILed.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21768/
Test FAILed.

@srowen
Member Author

srowen commented Oct 16, 2014

I addressed the MIMA warnings, two of which were just caused by the class being renamed.

I'm less sure about the warning around PairRDDFunctions, since SparkHadoopMapReduceUtil was mixed in as a superclass (trait). Unfortunately, it really isn't a case for inheritance, so I am not 100% clear what the implications of the rename are for binary compatibility.
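(For illustration only, not code from this PR: a compiled class records the names of its supertypes, which is why moving or renaming a trait mixed into PairRDDFunctions is a binary-compatibility concern even though clients never name the trait directly. A minimal Java sketch with hypothetical stand-in names, where `OldUtil` plays the role of the old trait and `Functions` the role of PairRDDFunctions:)

```java
public class Main {
    // Hypothetical stand-in for the old mixed-in trait
    // (e.g. SparkHadoopMapReduceUtil in its old package).
    interface OldUtil {}

    // Hypothetical stand-in for PairRDDFunctions, which listed
    // the trait among its supertypes.
    static class Functions implements OldUtil {}

    public static void main(String[] args) {
        // The supertype's binary name is baked into Functions.class;
        // code compiled against the old hierarchy would fail with
        // NoClassDefFoundError if OldUtil were moved or renamed.
        for (Class<?> iface : Functions.class.getInterfaces()) {
            System.out.println(iface.getName());
        }
    }
}
```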

@SparkQA

SparkQA commented Oct 16, 2014

QA tests have started for PR 2814 at commit 821d12a.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Oct 16, 2014

QA tests have finished for PR 2814 at commit 821d12a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21806/
Test PASSed.

@SparkQA

SparkQA commented Oct 19, 2014

QA tests have started for PR 2814 at commit 11e7d5d.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Oct 20, 2014

QA tests have finished for PR 2814 at commit 11e7d5d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21895/
Test PASSed.

@SparkQA

SparkQA commented Oct 26, 2014

Test build #22248 has started for PR 2814 at commit 514cb38.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Oct 26, 2014

Test build #22248 has finished for PR 2814 at commit 514cb38.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22248/
Test FAILed.

@andrewor14
Contributor

retest this please

@SparkQA

SparkQA commented Oct 28, 2014

Test build #22373 has started for PR 2814 at commit 514cb38.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Oct 28, 2014

Test build #22373 timed out for PR 2814 at commit 514cb38 after a configured wait of 120m.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22373/
Test FAILed.

@andrewor14
Contributor

retest this please

@SparkQA

SparkQA commented Oct 28, 2014

Test build #22388 has started for PR 2814 at commit 514cb38.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Oct 29, 2014

Test build #22388 has finished for PR 2814 at commit 514cb38.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22388/
Test FAILed.

@andrewor14
Contributor

retest this please

@SparkQA

SparkQA commented Oct 29, 2014

Test build #22456 has started for PR 2814 at commit 514cb38.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Oct 29, 2014

Test build #22456 has finished for PR 2814 at commit 514cb38.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22456/
Test PASSed.

@andrewor14
Contributor

LGTM. Looks like there are now merge conflicts.

@srowen
Member Author

srowen commented Oct 30, 2014

@andrewor14 Yep, rebased it.

@SparkQA

SparkQA commented Oct 30, 2014

Test build #22563 has started for PR 2814 at commit ead1115.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Oct 30, 2014

Test build #22563 has finished for PR 2814 at commit ead1115.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22563/
Test PASSed.

@andrewor14
Contributor

Ok I'm merging this. Thanks.

@asfgit asfgit closed this in 68cb69d Oct 30, 2014
@andrewor14
Contributor

Hey @srowen I reverted this because it was failing tests for Spark built with Hadoop 1.0.4. I haven't dug into it deeply but it seems to be failing at run time due to some reflection issue. Do you know what's going on?

[info] FileSuite:
2014-10-30 16:34:12.769 java[30516:5903] Unable to load realm info from SCDynamicStore
[info] - text files *** FAILED *** (1 second, 112 milliseconds)
[info]   java.lang.IllegalAccessException: Class org.apache.spark.mapred.SparkHadoopMapRedUtil$class can not access a member of class org.apache.hadoop.mapred.JobContext with modifiers ""
[info]   at sun.reflect.Reflection.ensureMemberAccess(Reflection.java:109)
[info]   at java.lang.reflect.AccessibleObject.slowCheckMemberAccess(AccessibleObject.java:261)
[info]   at java.lang.reflect.AccessibleObject.checkAccess(AccessibleObject.java:253)
[info]   at java.lang.reflect.Constructor.newInstance(Constructor.java:517)
[info]   at org.apache.spark.mapred.SparkHadoopMapRedUtil$class.newJobContext(SparkHadoopMapRedUtil.scala:29)
[info]   at org.apache.spark.SparkHadoopWriter.newJobContext(SparkHadoopWriter.scala:39)
[info]   at org.apache.spark.SparkHadoopWriter.getJobContext(SparkHadoopWriter.scala:149)
[info]   at org.apache.spark.SparkHadoopWriter.preSetup(SparkHadoopWriter.scala:63)
[info]   at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopDataset(PairRDDFunctions.scala:1033)
[info]   at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:939)
[info]   at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:848)
[info]   at org.apache.spark.rdd.RDD.saveAsTextFile(RDD.scala:1162)
[info]   at org.apache.spark.FileSuite$$anonfun$1.apply$mcV$sp(FileSuite.scala:53)
[info]   at org.apache.spark.FileSuite$$anonfun$1.apply(FileSuite.scala:49)
[info]   at org.apache.spark.FileSuite$$anonfun$1.apply(FileSuite.scala:49)

@srowen
Member Author

srowen commented Oct 31, 2014

Hm, that's weird, since I thought I ran the tests against the default Hadoop, and that's 1.0.4. Which test failed? (I'll go look around Jenkins too.)

I can't find 1.0.4 source at the moment, but it looks like in ancient times, the JobContextImpl constructor was package-private:

http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hadoop/hadoop-mapred/0.22.0/org/apache/hadoop/mapred/JobContextImpl.java

So JobContext is an interface and JobContextImpl has a non-public constructor. That's probably why this mechanism was created in the first place. Another solution is to change the constructor's accessibility at runtime. I can try that and verify the test again.

All of this would go away once Hadoop 1.x is no longer supported, along with some other acrobatics inside SparkHadoopWriter.
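(The runtime-accessibility workaround described above can be sketched as follows. This is a minimal stand-alone illustration, not the actual Spark code; `LegacyContext` is a hypothetical stand-in for Hadoop 1.0.x's `mapred.JobContext`, whose constructor was not public.)

```java
import java.lang.reflect.Constructor;

public class Main {
    // Hypothetical stand-in for a Hadoop 1.0.x class with a
    // non-public constructor, like mapred.JobContext.
    static class LegacyContext {
        private LegacyContext() {}
    }

    public static void main(String[] args) throws Exception {
        Constructor<LegacyContext> ctor =
            LegacyContext.class.getDeclaredConstructor();
        // Force the constructor accessible regardless of its modifier.
        // A SecurityManager could forbid this, as noted in the commit
        // message, but it would forbid much else Spark does too.
        ctor.setAccessible(true);
        LegacyContext ctx = ctor.newInstance();
        System.out.println(ctx.getClass().getSimpleName());
    }
}
```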

asfgit pushed a commit that referenced this pull request Nov 10, 2014
…ot use package org.apache.hadoop

andrewor14 Another try at SPARK-1209, to address #2814 (comment)

I successfully tested with `mvn -Dhadoop.version=1.0.4 -DskipTests clean package; mvn -Dhadoop.version=1.0.4 test`. I assume that is what failed Jenkins last time. I also tried `-Dhadoop.version=1.2.1` and `-Phadoop-2.4 -Pyarn -Phive` for more coverage.

So this is why the class was put in `org.apache.hadoop` to begin with, I assume. One option is to leave this as-is for now and move it only when Hadoop 1.0.x support goes away.

This is the other option, which adds a call to force the constructor to be public at run-time. It's probably less surprising than putting Spark code in `org.apache.hadoop`, but, does involve reflection. A `SecurityManager` might forbid this, but it would forbid a lot of stuff Spark does. This would also only affect Hadoop 1.0.x it seems.

Author: Sean Owen <sowen@cloudera.com>

Closes #3048 from srowen/SPARK-1209 and squashes the following commits:

0d48f4b [Sean Owen] For Hadoop 1.0.x, make certain constructors public, which were public in later versions
466e179 [Sean Owen] Disable MIMA warnings resulting from moving the class -- this was also part of the PairRDDFunctions type hierarchy though?
eb61820 [Sean Owen] Move SparkHadoopMapRedUtil / SparkHadoopMapReduceUtil from org.apache.hadoop to org.apache.spark

(cherry picked from commit f8e5732)
Signed-off-by: Patrick Wendell <pwendell@gmail.com>