Conversation

@sryza (Contributor) commented Nov 13, 2014

It seems like the winds might have shifted away from this approach, but I wanted to post the PR anyway because I got it working and to show what it would look like.

@SparkQA commented Nov 13, 2014

Test build #23302 has started for PR 3239 at commit ea056ce.

  • This patch merges cleanly.

@ScrapCodes (Member)

@sryza I was assuming that we would create a profile for Kafka only, and not for the other submodules.

@sryza (Contributor Author) commented Nov 13, 2014

Right, this was before we came to that decision. Will update this to just do Kafka.

@SparkQA commented Nov 13, 2014

Test build #23302 has finished for PR 3239 at commit ea056ce.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public class JavaSimpleTextClassificationPipeline
    • case class LabeledDocument(id: Long, text: String, label: Double)
    • case class Document(id: Long, text: String)
    • abstract class EdgeRDD[ED, VD](
    • abstract class VertexRDD[VD](
    • abstract class Estimator[M <: Model[M]] extends PipelineStage with Params
    • abstract class Evaluator extends Identifiable
    • abstract class Model[M <: Model[M]] extends Transformer
    • abstract class PipelineStage extends Serializable with Logging
    • class Pipeline extends Estimator[PipelineModel]
    • abstract class Transformer extends PipelineStage with Params
    • class LogisticRegression extends Estimator[LogisticRegressionModel] with LogisticRegressionParams
    • class HashingTF extends UnaryTransformer[Iterable[_], Vector, HashingTF]
    • class StandardScaler extends Estimator[StandardScalerModel] with StandardScalerParams
    • class Tokenizer extends UnaryTransformer[String, Seq[String], Tokenizer]
    • class Param[T] (
    • class DoubleParam(parent: Params, name: String, doc: String, defaultValue: Option[Double] = None)
    • class IntParam(parent: Params, name: String, doc: String, defaultValue: Option[Int] = None)
    • class FloatParam(parent: Params, name: String, doc: String, defaultValue: Option[Float] = None)
    • class LongParam(parent: Params, name: String, doc: String, defaultValue: Option[Long] = None)
    • class BooleanParam(parent: Params, name: String, doc: String, defaultValue: Option[Boolean] = None)
    • case class ParamPair[T](param: Param[T], value: T)
    • trait Params extends Identifiable with Serializable
    • class CrossValidator extends Estimator[CrossValidatorModel] with CrossValidatorParams with Logging
    • class ParamGridBuilder

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23302/

@SparkQA commented Nov 13, 2014

Test build #23307 has started for PR 3239 at commit fc9c305.

  • This patch merges cleanly.

@SparkQA commented Nov 13, 2014

Test build #23308 has started for PR 3239 at commit 53c4745.

  • This patch merges cleanly.

@sryza (Contributor Author) commented Nov 13, 2014

I had some additional conversation with @pwendell and we agreed that SPARK-4376 (putting external modules behind maven profiles) is worthwhile, so this PR implements both that and SPARK-4375. It's not 100% polished - it needs documentation and I'm still hitting random errors with scala-2.11 under certain conditions.

To build assembly and not build any of the external modules or examples:
mvn package

To build assembly and one of the external modules:
mvn package -Pflume

To build assembly and all external modules and examples:
mvn package -Pexamples -Pkafka

To build assembly with 2.11:
mvn package -Pscala-2.11

To build assembly and one of the external modules with 2.11:
mvn package -Pscala-2.11 -Pflume

To build assembly and all external modules and examples with 2.11 (except for Kafka, which doesn't work with 2.11):
mvn package -Pscala-2.11 -Pexamples

This patch removes the Algebird examples entirely, because they're fairly obscure, show off more Algebird functionality than Spark functionality, don't work with Scala 2.11, and would require extra profiles to support.

To get the Scala 2.11 build to work, I needed to set all the artifact IDs to include scala.binary.version instead of hardcoding 2.10, which seems right to me unless I'm missing something. Not sure how this was working before.
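
For readers not deep in the build, the per-module flags above rely on Maven profiles that wrap the optional modules. A minimal sketch of that pattern in a parent pom.xml follows; the profile id and module path are assumed from this discussion, not copied from the patch:

  <!-- Illustrative only: external/kafka is added to the build
       only when -Pkafka is passed on the command line. -->
  <profile>
    <id>kafka</id>
    <modules>
      <module>external/kafka</module>
    </modules>
  </profile>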

@srowen (Member) commented Nov 13, 2014

You mean you need the declared artifact IDs to include a property value? I thought that wasn't possible in Maven, which is what created a lot of the complication in the first place. Dependencies' artifact IDs can include property values.
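
For context, the distinction being drawn here looks roughly like the following; this is a hedged sketch of general Maven behavior, not an excerpt from this PR's pom.xml:

  <!-- Fine: a dependency's artifactId may reference a property. -->
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_${scala.binary.version}</artifactId>
    <version>${project.version}</version>
  </dependency>

  <!-- Problematic: Maven expects a module's own artifactId to be a literal
       constant, so leaving a property expression here triggers warnings and
       causes trouble for published artifacts. -->
  <artifactId>spark-streaming-kafka_2.10</artifactId>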

@sryza (Contributor Author) commented Nov 13, 2014

I think Sean is right. I have half a fix for this, but am going to go to bed now.

@SparkQA commented Nov 13, 2014

Test build #23307 has finished for PR 3239 at commit fc9c305.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23307/

@SparkQA commented Nov 13, 2014

Test build #23308 has finished for PR 3239 at commit 53c4745.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23308/

@sryza (Contributor Author) commented Nov 13, 2014

@ScrapCodes @pwendell I just tried "mvn package -Pscala-2.11" without my patch and still got errors:
The following artifacts could not be resolved: org.apache.spark:spark-network-common_2.11:jar:1.2.0-SNAPSHOT

Any idea what's going on there?

@vanzin (Contributor) commented Nov 13, 2014

@sryza Could you change the PR title/description to be more informative? This change is big enough that it should have a good commit message. Your last comment would be a good candidate for that commit message.

I personally feel that the examples profile should be enabled by default, to keep the default build the same (and avoid people inadvertently not compiling that code).

@pwendell (Contributor)

@sryza I think the artifact ID needs to be hard-coded to work when we publish artifacts. If you want to change to Scala 2.11, we have a script in ./dev that will rewrite these. This is only really relevant if people are publishing artifacts.

@SparkQA commented Nov 13, 2014

Test build #23319 has started for PR 3239 at commit ad7b7be.

  • This patch merges cleanly.

@sryza changed the title from "SPARK-4375. maven rejiggering" to "SPARK-4375. no longer require -Pscala-2.10 and place external modules behind profiles" on Nov 13, 2014
@SparkQA commented Nov 13, 2014

Test build #23321 has started for PR 3239 at commit 88a2390.

  • This patch merges cleanly.

@SparkQA commented Nov 13, 2014

Test build #23319 has finished for PR 3239 at commit ad7b7be.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23319/

@SparkQA commented Nov 13, 2014

Test build #23321 has finished for PR 3239 at commit 88a2390.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23321/

Contributor (review comment)

Could you add this to the binary releases also (below in this script)?

@sryza (Contributor Author) commented Nov 14, 2014

Here's a patch with a simpler approach that relies on @vanzin's suggestion of a -Dscala-2.11 property. I still like the idea of putting the external projects and examples behind Maven profiles, but we can do that in a separate PR.

@pwendell if you think this is a good approach, I'll add some documentation.
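
For reference, the -Dscala-2.11 approach relies on Maven's property-based profile activation. A minimal sketch of that mechanism follows, with the id, property name, and version values assumed from this thread rather than taken from the patch:

  <!-- Illustrative only: activated by passing -Dscala-2.11 on the command line. -->
  <profile>
    <id>scala-2.11</id>
    <activation>
      <property>
        <name>scala-2.11</name>
      </property>
    </activation>
    <properties>
      <!-- Versions are placeholders for whichever Scala 2.11 release the build targets. -->
      <scala.version>2.11.2</scala.version>
      <scala.binary.version>2.11</scala.binary.version>
    </properties>
  </profile>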

@SparkQA commented Nov 14, 2014

Test build #23339 has started for PR 3239 at commit 587f671.

  • This patch merges cleanly.

@vanzin (Contributor) commented Nov 14, 2014

LGTM if Jenkins is happy. If you don't plan to address the "examples profile" thing, probably a good idea to update the PR's title.

@SparkQA commented Nov 14, 2014

Test build #23339 has finished for PR 3239 at commit 587f671.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23339/

@pwendell (Contributor)

Hey Sandy - I think this looks good. I wasn't able to get it to succeed locally, but I think something is messed up with my local environment since even the master build isn't working. Could you add the relevant documentation?

@sryza changed the title from "SPARK-4375. no longer require -Pscala-2.10 and place external modules behind profiles" to "SPARK-4375. no longer require -Pscala-2.10" on Nov 14, 2014
@sryza (Contributor Author) commented Nov 14, 2014

Updated the doc - it seems like there's actually not a ton more to say, but let me know if I missed anything.

@SparkQA commented Nov 14, 2014

Test build #23355 has started for PR 3239 at commit dfdb3d9.

  • This patch merges cleanly.

Member (review comment)

I am not 100% sure whether it will work with sbt as sbt/sbt -Dscala-2.11. Did you happen to try?

Contributor Author (review comment)

Yeah, that instruction only applies to Maven. Do we try to make all the same options have the same effect? I could add a line under the sbt section that explains how to use sbt with Scala 2.11 (-Pscala-2.11).

Member (review comment)

IMO, it is okay to add an exception. We sort of decided on the same flags for both builds to simplify things for sbt users (and maybe Maven users too). It's okay for the things where we don't have a choice, as long as it doesn't defeat the purpose. I will let Patrick comment too.

Contributor Author (review comment)

I also don't think it would be hard to make the sbt build support that property.

Contributor (review comment)

The SBT build could easily support this, I think.

@SparkQA commented Nov 14, 2014

Test build #23355 has finished for PR 3239 at commit dfdb3d9.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23355/

@pwendell (Contributor)

Jenkins, test this please.

@SparkQA commented Nov 14, 2014

Test build #23377 has started for PR 3239 at commit 0ffbe95.

  • This patch merges cleanly.

@SparkQA commented Nov 14, 2014

Test build #23377 has finished for PR 3239 at commit 0ffbe95.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23377/

@pwendell (Contributor)

Looks good - I played with this locally.

asfgit pushed a commit that referenced this pull request Nov 14, 2014
It seems like the winds might have moved away from this approach, but wanted to post the PR anyway because I got it working and to show what it would look like.

Author: Sandy Ryza <sandy@cloudera.com>

Closes #3239 from sryza/sandy-spark-4375 and squashes the following commits:

0ffbe95 [Sandy Ryza] Enable -Dscala-2.11 in sbt
cd42d94 [Sandy Ryza] Update doc
f6644c3 [Sandy Ryza] SPARK-4375 take 2

(cherry picked from commit f5f757e)
Signed-off-by: Patrick Wendell <pwendell@gmail.com>
@asfgit closed this in f5f757e on Nov 14, 2014