
Hadoop agnostic builds #838


Merged: 31 commits merged into mesos:master on Aug 20, 2013

Conversation

@jey
Contributor

jey commented Aug 15, 2013

This PR allows one Spark binary to target multiple Hadoop versions. It also moves YARN support into a separate artifact. This is the follow-up to PR #803.
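
To make the shape of the change concrete: instead of branching at compile time on HADOOP_MAJOR_VERSION, the build is parameterized on a single Hadoop version string. The sketch below shows the rough SparkBuild.scala wiring this PR moves toward. The names hadoopVersion and isYarnMode appear in the diff discussed later in this thread, but the environment-variable names (SPARK_HADOOP_VERSION, SPARK_WITH_YARN) and the 1.2.1 default are illustrative assumptions, not necessarily the exact merged values:

    import sbt._
    import Keys._

    // Illustrative sketch, not the exact merged code. One source tree builds
    // against whichever Hadoop release is requested at build time.
    val hadoopVersion = sys.env.getOrElse("SPARK_HADOOP_VERSION", "1.2.1")  // assumed env var name
    val isYarnMode = sys.env.get("SPARK_WITH_YARN").exists(_.equalsIgnoreCase("true"))  // assumed env var name

    val excludeJackson = ExclusionRule(organization = "org.codehaus.jackson")
    val excludeNetty   = ExclusionRule(organization = "org.jboss.netty")
    val excludeAsm     = ExclusionRule(organization = "asm")

    libraryDependencies ++= Seq(
      "org.apache.hadoop" % "hadoop-client" % hadoopVersion excludeAll(excludeJackson, excludeNetty, excludeAsm)
    ) ++ (if (isYarnMode) Seq(
      // YARN support lives in a separate artifact; these come along only in YARN builds.
      "org.apache.hadoop" % "hadoop-yarn-api"    % hadoopVersion excludeAll(excludeJackson, excludeNetty, excludeAsm),
      "org.apache.hadoop" % "hadoop-yarn-common" % hadoopVersion excludeAll(excludeJackson, excludeNetty, excludeAsm),
      "org.apache.hadoop" % "hadoop-yarn-client" % hadoopVersion excludeAll(excludeJackson, excludeNetty, excludeAsm)
    ) else Nil)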

CC: @mateiz, @mridulm, @tgravescs

@AmplabJenkins

Thank you for submitting this pull request.

All automated tests for this request have passed.

Refer to this link for build results: http://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/609/

@AmplabJenkins

Thank you for submitting this pull request.

Unfortunately, the automated tests for this request have failed.

Refer to this link for build results: http://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/613/

@jey
Contributor Author

jey commented Aug 16, 2013

Jenkins, retest this please.

@AmplabJenkins

Thank you for submitting this pull request.

All automated tests for this request have passed.

Refer to this link for build results: http://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/623/

@AmplabJenkins

Thank you for submitting this pull request.

All automated tests for this request have passed.

Refer to this link for build results: http://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/627/

@jey
Contributor Author

jey commented Aug 17, 2013

(I meant: the Maven build is having problems with 0.23.x)

@AmplabJenkins

Thank you for submitting this pull request.

All automated tests for this request have passed.

Refer to this link for build results: http://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/643/

@AmplabJenkins

Thank you for submitting this pull request.

Unfortunately, the automated tests for this request have failed.

Refer to this link for build results: http://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/645/

@jey
Contributor Author

jey commented Aug 19, 2013

Jenkins, retest this please.

@AmplabJenkins

Thank you for submitting this pull request.

All automated tests for this request have passed.

Refer to this link for build results: http://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/653/

In project/SparkBuild.scala:

    Seq(
      "org.apache.hadoop" % "hadoop-yarn-api" % hadoopVersion excludeAll(excludeJackson, excludeNetty, excludeAsm),
      "org.apache.hadoop" % "hadoop-yarn-common" % hadoopVersion excludeAll(excludeJackson, excludeNetty, excludeAsm),
      "org.apache.hadoop" % "hadoop-yarn-client" % hadoopVersion excludeAll(excludeJackson, excludeNetty, excludeAsm)
    )
@mateiz
Member

Hey Jey, just to understand: this means that users who link to us when running on Hadoop 0.23.x have to also add these to their project, in addition to hadoop-client version 0.23.x?

@jey
Contributor Author

Nah, because the issue of explicitly linking against the Hadoop libs only applies to non-YARN builds. That does bring up another issue, though: right now the spark-core artifact will by default be built with a dependency on hadoop >= 1.2.1. I'll look into figuring out how to specify a more accurate set of constraints to the POM dependency mechanism.

@mateiz
Member

That dependency is fine for spark-core. The main thing is to document what else users should add to use a newer Hadoop (e.g., they'd add a newer hadoop-client, but they may also have to add this YARN stuff).

Matei
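
To make the documentation point above concrete: a downstream sbt project targeting Hadoop 0.23.x might end up with dependency declarations like the following. This is only a sketch of what the thread implies, not official guidance; the 0.23.9 version string is illustrative, and whether the YARN artifacts are truly required on 0.23.x was still being clarified above:

    libraryDependencies ++= Seq(
      // The Spark artifact itself (coordinates taken from the Maven output later in this thread):
      "org.spark-project" % "spark-core" % "0.8.0-SNAPSHOT",
      // Override Spark's default Hadoop dependency with the cluster's version:
      "org.apache.hadoop" % "hadoop-client" % "0.23.9",
      // Possibly also needed on 0.23.x, per the discussion above:
      "org.apache.hadoop" % "hadoop-yarn-api"    % "0.23.9",
      "org.apache.hadoop" % "hadoop-yarn-common" % "0.23.9",
      "org.apache.hadoop" % "hadoop-yarn-client" % "0.23.9"
    )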


@mateiz
Member

mateiz commented Aug 19, 2013

Hey Jey, I tested this and it looks good, though I had that question above.

@AmplabJenkins

Thank you for submitting this pull request.

All automated tests for this request have passed.

Refer to this link for build results: http://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/658/

@mateiz merged commit 6f6944c into mesos:master on Aug 20, 2013
@mateiz
Member

mateiz commented Aug 20, 2013

Thanks for putting this together, Jey. I've merged it manually due to a small conflict.

@pwendell
Contributor

@mateiz @jey - It's very unfortunate that this got merged without any documentation or notification to developers. This will affect many downstream things (tests, anyone running off of master or building things on top of master, the ec2 scripts, etc.). Also, some of the existing documentation, such as docs/building-with-maven, is now invalid and tells users to do the wrong thing. Could one of you send an e-mail to the dev list explaining what this means for people who consume Spark master? Also, please fix the existing docs ASAP and ideally add new docs explaining how to use this.

@rxin
Member

rxin commented Aug 21, 2013

Have you tried running mvn package?

I am getting the following error:

*** RUN ABORTED ***
java.lang.NoSuchMethodError: spark.scheduler.cluster.ClusterTaskSetManager.&lt;init&gt;(Lspark/scheduler/cluster/ClusterScheduler;Lspark/scheduler/TaskSet;)V
at spark.scheduler.DummyTaskSetManager.&lt;init&gt;(ClusterSchedulerSuite.scala:30)
at spark.scheduler.ClusterSchedulerSuite.createDummyTaskSetManager(ClusterSchedulerSuite.scala:111)
at spark.scheduler.ClusterSchedulerSuite$$anonfun$1.apply$mcV$sp(ClusterSchedulerSuite.scala:146)
at spark.scheduler.ClusterSchedulerSuite$$anonfun$1.apply(ClusterSchedulerSuite.scala:134)
at spark.scheduler.ClusterSchedulerSuite$$anonfun$1.apply(ClusterSchedulerSuite.scala:134)
at org.scalatest.FunSuite$$anon$1.apply(FunSuite.scala:1265)
at org.scalatest.Suite$class.withFixture(Suite.scala:1974)
at spark.scheduler.ClusterSchedulerSuite.withFixture(ClusterSchedulerSuite.scala:108)
at org.scalatest.FunSuite$class.invokeWithFixture$1(FunSuite.scala:1262)
at org.scalatest.FunSuite$$anonfun$runTest$1.apply(FunSuite.scala:1271)
...
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Spark Project Parent POM .......................... SUCCESS [2.028s]
[INFO] Spark Project Core ................................ FAILURE [7:24.910s]
[INFO] Spark Project Bagel ............................... SKIPPED
[INFO] Spark Project Streaming ........................... SKIPPED
[INFO] Spark Project ML Library .......................... SKIPPED
[INFO] Spark Project Examples ............................ SKIPPED
[INFO] Spark Project Tools ............................... SKIPPED
[INFO] Spark Project REPL ................................ SKIPPED
[INFO] Spark Project REPL binary packaging ............... SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 7:27.541s

@mateiz
Member

mateiz commented Aug 21, 2013

Did you do mvn clean and sbt clean? Sounds like an old build issue.

@mateiz
Member

mateiz commented Aug 21, 2013

But yes I agree with Patrick on the docs -- I shouldn't have merged this without looking at that and trying Shark as well, so we know what will break there. Sorry about that.

@jey
Contributor Author

jey commented Aug 21, 2013

@pwendell: Agreed, I'll send an email to the list ASAP and submit a patch for the docs shortly.

@rxin: As Matei said, that sounds like your classpath is contaminated with old build artifacts.

@rxin
Member

rxin commented Aug 21, 2013

I tried running mvn dependency:tree after I removed .m2 and .ivy2 and ran sbt clean and mvn clean. I got the following error:

[INFO] Reactor Summary:
[INFO]
[INFO] Spark Project Parent POM .......................... SUCCESS [0.706s]
[INFO] Spark Project Core ................................ SUCCESS [1.026s]
[INFO] Spark Project Bagel ............................... FAILURE [0.038s]
[INFO] Spark Project Streaming ........................... SKIPPED
[INFO] Spark Project ML Library .......................... SKIPPED
[INFO] Spark Project Examples ............................ SKIPPED
[INFO] Spark Project Tools ............................... SKIPPED
[INFO] Spark Project REPL ................................ SKIPPED
[INFO] Spark Project REPL binary packaging ............... SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 2.574s
[INFO] Finished at: Tue Aug 20 19:03:59 PDT 2013
[INFO] Final Memory: 10M/81M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal on project spark-bagel: Could not resolve dependencies for project org.spark-project:spark-bagel:jar:0.8.0-SNAPSHOT: Could not find artifact org.spark-project:spark-core:jar:0.8.0-SNAPSHOT -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]

@jey
Contributor Author

jey commented Aug 21, 2013

@rxin, it's my understanding that this is "normal" for running dependency:tree on the Maven build before packaging. I think after performing mvn -DskipTests package you'll be able to run mvn dependency:tree, mvn package, mvn test, etc.

@jey
Contributor Author

jey commented Aug 21, 2013

@rxin: actually, apparently Maven in its infinite wisdom requires mvn install before mvn dependency:tree will work: http://stackoverflow.com/a/1905927
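
For anyone else hitting the resolution error above, the working sequence from a clean checkout is roughly the following, per Jey's suggestion and the linked Stack Overflow answer (-DskipTests is optional but faster):

    mvn -DskipTests install   # puts the 0.8.0-SNAPSHOT artifacts in the local repo
    mvn dependency:tree       # now inter-module dependencies (e.g. spark-bagel -> spark-core) resolve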

@rxin
Member

rxin commented Aug 21, 2013

Alright, thanks @jey. That worked (although it's a little bit convoluted...).

zhuguangbin pushed a commit to zhuguangbin/shark that referenced this pull request Oct 31, 2013
xiajunluan pushed a commit to xiajunluan/spark that referenced this pull request May 30, 2014
`lateral_view_outer` query sometimes returns a different set of 10 rows.

Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes mesos#838 from tdas/hive-test-fix2 and squashes the following commits:

9128a0d [Tathagata Das] Blacklisted flaky HiveCompatibility test.