SPARK-3069 [DOCS] Build instructions in README are outdated #2014

Closed
wants to merge 8 commits

Conversation

srowen
Member

@srowen srowen commented Aug 18, 2014

Here's my crack at Bertrand's suggestion. The GitHub README.md contains build info that's outdated. It should just point to the current online docs, and reflect that Maven is now the primary build.

(Incidentally, the stanza at the end about contributions of original work should go in https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark too. It won't hurt to be crystal clear about the agreement to license, given that ICLAs are not required of anyone here.)

@bbossy
Contributor

bbossy commented Aug 20, 2014

Yes, LGTM

@SparkQA

SparkQA commented Aug 21, 2014

QA tests have started for PR 2014 at commit 3a2bcad.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Aug 21, 2014

QA tests have finished for PR 2014 at commit 3a2bcad.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • In multiclass classification, all$2^`
    • public final class JavaDecisionTree

@srowen
Member Author

srowen commented Aug 21, 2014

Ha, false positive. It picked up a line of text in README.md that contains a class declaration. Maybe the checker can skip known non-source extensions. The unit test failure can't be related, as this is just a doc change.

@SparkQA

SparkQA commented Aug 23, 2014

QA tests have started for PR 2014 at commit 7aa045e.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Aug 23, 2014

QA tests have started for PR 2014 at commit 5c6b814.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Aug 23, 2014

QA tests have finished for PR 2014 at commit 7aa045e.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 23, 2014

QA tests have finished for PR 2014 at commit 5c6b814.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • In multiclass classification, all$2^`
    • public final class JavaDecisionTree

@SparkQA

SparkQA commented Aug 23, 2014

QA tests have started for PR 2014 at commit 9b56494.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Aug 23, 2014

QA tests have finished for PR 2014 at commit 9b56494.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

srowen referenced this pull request Aug 27, 2014
The Contributing to Spark guide [recommends](https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-AutomatedTesting) running tests by calling `./dev/run-tests`. The README should, too.

`./sbt/sbt test` does not cover Python tests or style tests.

Author: nchammas <nicholas.chammas@gmail.com>

Closes #2149 from nchammas/patch-2 and squashes the following commits:

2b3b132 [nchammas] [Docs] Run tests like in contributing guide
@SparkQA

SparkQA commented Aug 27, 2014

QA tests have started for PR 2014 at commit 3e4a303.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Aug 27, 2014

QA tests have finished for PR 2014 at commit 3e4a303.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


-./sbt/sbt assembly
+mvn -DskipTests clean package
Contributor

Per the related discussion here, I don't think we want to change this. There are definitely two ways on offer to build Spark, but I think sbt is the default way to go and Maven is supported alongside it.

Perhaps we should just clarify this here.

Contributor

Actually, the officially documented way of building Spark is through Maven: http://spark.apache.org/docs/latest/building-with-maven.html. We should keep this consistent with the docs. (The discussion you linked to refers to tests.)

Contributor

Yeah, we cleared this up in the main discussion thread. I stand corrected.

@vanzin
Contributor

vanzin commented Aug 27, 2014

Just wanted to point out that make-distribution.sh uses Maven.

Also, we have features available in the Maven build (e.g. Guava shading) that were intentionally left out of the sbt build, since, according to people, the sbt builds are not meant for external consumption.
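
(For context, a distribution build through that script looks roughly like the following; this is only a sketch, since the exact flags vary by version, and the profile names here are just the ones discussed in this thread. Arguments beyond the script's own flags are handed through to Maven, which is why Maven-only features like the Guava shading apply.)

# Sketch: build a distribution tarball; profile flags are passed through to Maven.
./make-distribution.sh --tgz -Pyarn -Phadoop-2.3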

@nchammas
Contributor

> make-distribution.sh uses Maven.

Oh, good point. Erm... I guess we need one of the project maintainers to step in then and clarify the place of sbt relative to Maven.

> according to people, the sbt builds are not meant for external consumption.

Makes more sense now that I see distributions are created using Maven.

@pwendell
Contributor

Yeah, so our position on the builds is that we officially recommend Maven for packaging Spark, but we support sbt for day-to-day development since it can provide much faster iteration. So I would be inclined to point towards Maven in the docs and allow more advanced developers to use sbt. The reason it's hard to "officially" support both is that they use different policies for dependency resolution; this is just a fundamental difference in the way that Maven and sbt/ivy work. It isn't typically an issue when doing normal development, but in deployments it can cause a huge dependency headache, and supporting both in that regard would be extremely difficult.
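
(As a concrete illustration of that resolution difference: Maven's "nearest wins" mediation can pick a different version than Ivy's "latest revision" default for the same declared dependencies. One way to see what Maven actually resolves is the standard dependency-plugin goal; the module and artifact below are only examples.)

# Show the versions Maven mediates for Guava in the core module (example names).
mvn -pl core dependency:tree -Dincludes=com.google.guava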

@pwendell
Contributor

@nchammas
Contributor

Ah, thanks for that. Clears things up for me about sbt vs. Maven.

So if we want to stress Maven as the standard, where would we move the sbt-related documentation to? I imagine we want to keep that around as an "advanced" feature. Or is that pretty much why it's here on GitHub?

@srowen
Member Author

srowen commented Aug 28, 2014

@nchammas @pwendell Is the net conclusion that README.md should use Maven, if anything?
I'd be happy to move the wiki content into CONTRIBUTING.md, but then I can't remove the wiki page, and it ends up duplicated again. Maybe it's fine as is, and the important change is getting the file in place to trigger the prompt on the PR screen. If so, I think this is still ready for review/merge as you all see fit.

@nchammas
Contributor

> Is the net conclusion that README.md should use Maven if anything?

Not sure. It sounds like Maven is indeed the official standard for building Spark, but we do want to document the sbt instructions somewhere. Dunno if the README is that place.

> but then I can't remove the wiki page and it ends up duplicated again

I think if Patrick / Reynold / Spark committers agree that the CONTRIBUTING file would be better placed on GitHub, they can have the wiki page removed in short order.

asfgit pushed a commit that referenced this pull request Aug 31, 2014
Look only at code files (`.py`, `.java`, and `.scala`) for new classes.

Should get rid of false alarms like [the one reported here](#2014 (comment)).

Author: Nicholas Chammas <nicholas.chammas@gmail.com>

Closes #2184 from nchammas/jenkins-ignore-noncode and squashes the following commits:

33786ac [Nicholas Chammas] break up long line
3f91a14 [Nicholas Chammas] rename array of source files
8b82a26 [Nicholas Chammas] [Spark QA] only check code files for new classes
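
(For illustration, the filter that commit describes amounts to something like the following sketch; the actual script details differ, so treat the names here as hypothetical.)

# Hypothetical sketch: limit the new-class scan to code files so that
# prose like README.md can't trigger false positives.
git diff master --name-only | grep -E '\.(py|java|scala)$'
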
can be run using:

./dev/run-tests
Please see the guidance on how to
Contributor

I think it's nice that we have the tl;dr here on how to run tests. The wiki link might not be stable (we are linking to an anchor tag), and the content there is only one or two sentences... not sure it adds much.

@SparkQA

SparkQA commented Sep 12, 2014

QA tests have finished for PR 2014 at commit be82027.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Member Author

srowen commented Sep 15, 2014

@andrewor14 @nchammas @pwendell Humble ping on this one, I think it's good to go, and probably helps head off some build questions going forward.

The SBT build is derived from the Maven POM files, and so the same Maven profiles and variables
can be set to control the SBT build. For example:

sbt -Pyarn -Phadoop-2.3 compile
Contributor

Do we need to add a bit more color here about how to use sbt, to match what used to be in the GitHub README? Or is this sufficient?

Member Author

I think the goal here is just a taste, assuming the advanced developer will understand and figure out the rest if needed. Happy to make further edits though, like, should we still suggest ./sbt/sbt instead of a local sbt?

Contributor

Hmm, I don't know enough to make a recommendation; I'll leave that to others. Just wanted to call out the fact that we'd have less info on using sbt than before. Maybe that's a good thing.

Contributor

Yeah I'd give sbt/sbt but other than that, seems fine to have just a small example.
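
(So, with the wrapper form, the example above would read as follows; a sketch, using the same profiles the Maven build defines.)

# Invoke the bundled script instead of a locally installed sbt.
sbt/sbt -Pyarn -Phadoop-2.3 compile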

@markhamstra
Contributor

There really should be at least some mention of zinc (https://github.com/typesafehub/zinc) in our maven build instructions, since using zinc greatly improves the maven + scala experience.

@srowen
Member Author

srowen commented Sep 15, 2014

Yes, the build already warns if zinc is not being used.
To keep this scoped, that could be handled separately if more docs about zinc are desired.

@markhamstra
Contributor

Yes, I know that the scala-maven-plugin will throw warnings if zinc isn't being used. I also know that many users are either confused by those warnings or ignore them completely, in large measure because our build docs never mention zinc, even though zinc usage should be commonplace and arguably belongs in the default "how to build" instructions.

@srowen
Member Author

srowen commented Sep 15, 2014

@pwendell I changed to sbt/sbt, and @markhamstra I took the liberty of adding a note on zinc while we're at it.

@SparkQA

SparkQA commented Sep 15, 2014

QA tests have started for PR 2014 at commit db2bd97.

  • This patch merges cleanly.


[Zinc](https://github.com/typesafehub/zinc) is a long-running server version of SBT's incremental
compiler. When run locally as a background process, it speeds up builds of Scala-based projects
like Spark. Developers who regularly recompile Spark will be most interested in Zinc. The project
Contributor

Should probably be something like "...who regularly recompile Spark with Maven will be..."; else some may think that Zinc is something to be used with SBT -- especially given that this comment is placed immediately after "Building with SBT".
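
(For anyone following along, running the Zinc server locally is roughly this; a sketch, assuming the standalone zinc distribution from the linked repo is on the PATH.)

# Start the long-running Zinc server so Maven's scala-maven-plugin can use it.
zinc -start
# Shut it down when finished.
zinc -shutdown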

@srowen
Member Author

srowen commented Sep 15, 2014

@markhamstra Nice one, change coming up...

@SparkQA

SparkQA commented Sep 15, 2014

QA tests have started for PR 2014 at commit 501507e.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Sep 15, 2014

QA tests have finished for PR 2014 at commit db2bd97.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 15, 2014

QA tests have finished for PR 2014 at commit 501507e.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

markdown: kramdown
gems:
- jekyll-redirect-from
Contributor

Does this config mean that users don't need to install the gem manually?

@pwendell
Contributor

LGTM pending one minor comment.

@pwendell
Contributor

@srowen are you planning to add more to this or is it GTG from your perspective?

@srowen
Member Author

srowen commented Sep 16, 2014

@pwendell No, I believe the user still has to install the gem; I did, at least. Yes, this is GTG from my end.
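
(The manual step in question is just the usual gem install, i.e.:)

# Install the plugin named in the _config.yml snippet above.
gem install jekyll-redirect-from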

@pwendell
Contributor

Okay, I can merge this. One thing though: we've typically had less-than-smooth experiences with Jekyll and its dependencies. So if this feature causes issues for users, I'd propose we just maintain a tombstone and write one line of JavaScript to redirect. Google, Yahoo, Bing, etc. can correctly index this behavior (I've done this a few times in the past to migrate from one site to another) and move to the new page.

@asfgit asfgit closed this in 61e21fe Sep 16, 2014
@nchammas
Contributor

FYI: This page is 404-ing: http://spark.apache.org/docs/latest/building-spark.html

Is that temporary?

@srowen srowen deleted the SPARK-3069 branch September 21, 2014 14:32
@pwendell
Contributor

pwendell commented Oct 6, 2014

@nchammas That page won't appear until we actually update the live docs (something that happens for each release rather than when we push a PR).
