SPARK-3069 [DOCS] Build instructions in README are outdated #2014
Conversation
Yes, LGTM.
QA tests have started for PR 2014 at commit
QA tests have finished for PR 2014 at commit
Ha, false positive. It picked up a line of text in
QA tests have started for PR 2014 at commit
QA tests have started for PR 2014 at commit
QA tests have finished for PR 2014 at commit
QA tests have finished for PR 2014 at commit
QA tests have started for PR 2014 at commit
QA tests have finished for PR 2014 at commit
The Contributing to Spark guide [recommends](https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-AutomatedTesting) running tests by calling `./dev/run-tests`. The README should, too. `./sbt/sbt test` does not cover Python tests or style tests.

Author: nchammas <nicholas.chammas@gmail.com>

Closes #2149 from nchammas/patch-2 and squashes the following commits:

2b3b132 [nchammas] [Docs] Run tests like in contributing guide
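For readers skimming this thread, here is a minimal sketch of the two invocations being compared; both commands appear in the discussion above, and exact behavior depends on the Spark checkout and version:

```bash
# Scala/Java unit tests only, via the bundled sbt launcher
./sbt/sbt test

# Full suite recommended by the contributing guide: style checks,
# Python tests, and the Scala/Java tests
./dev/run-tests
```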
QA tests have started for PR 2014 at commit
QA tests have finished for PR 2014 at commit
    ./sbt/sbt assembly
    mvn -DskipTests clean package
Per the related discussion here, I don't think we want to change this. There are definitely two ways on offer to build Spark, but I think `sbt` is the default way to go and Maven is supported alongside it. Perhaps we should just clarify this here.
Actually, the officially documented way of building Spark is through Maven: http://spark.apache.org/docs/latest/building-with-maven.html. We should keep this consistent with the docs. (The discussion you linked to refers to tests.)
Yeah, we cleared this up in the main discussion thread. I stand corrected.
Just wanted to point out that we also have features available in the Maven build (e.g. Guava shading) that were intentionally left out of the sbt build, since, according to people, the sbt builds are not meant for external consumption.
Oh, good point. Erm... I guess we need one of the project maintainers to step in then and clarify the place of `sbt`.
Makes more sense now that I see distributions are created using Maven.
Yeah, so our position on the builds is that we officially recommend Maven for packaging Spark, but we support sbt for day-to-day development since it can provide much faster iteration. So I would be inclined to point towards Maven in the docs and allow more advanced developers to use SBT. The reason why it's hard to "officially" support both is that they use different policies for dependency resolution. This is just a fundamental difference in the way that Maven and sbt/ivy work; it isn't typically an issue when doing normal development, but in deployments it can cause a huge dependency headache, and supporting both in this regard would be extremely difficult.
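To make the two workflows concrete, here is a hedged sketch using the commands quoted elsewhere in this thread (profiles and flags vary by release):

```bash
# Packaging / release builds: the officially recommended Maven route
mvn -DskipTests clean package

# Day-to-day development: the sbt route, which gives faster iteration
./sbt/sbt assembly
```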
@nchammas there is a bit more color in this thread:
Ah, thanks for that. Clears things up for me. So if we want to stress Maven as the standard, where would we move the `sbt` instructions?
@nchammas @pwendell Is the net conclusion that
Not sure. It sounds like Maven is indeed the official standard for building Spark, but we do want to document the `sbt` build somewhere.
I think if Patrick / Reynold / Spark committers agree that the CONTRIBUTING file would be better placed on GitHub, they can have the wiki page removed in short order.
Look only at code files (`.py`, `.java`, and `.scala`) for new classes. Should get rid of false alarms like [the one reported here](#2014 (comment)).

Author: Nicholas Chammas <nicholas.chammas@gmail.com>

Closes #2184 from nchammas/jenkins-ignore-noncode and squashes the following commits:

33786ac [Nicholas Chammas] break up long line
3f91a14 [Nicholas Chammas] rename array of source files
8b82a26 [Nicholas Chammas] [Spark QA] only check code files for new classes
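As a rough, hypothetical illustration of the filtering described in that commit (the real logic lives in Spark's dev/ Jenkins scripts; the command below is an assumption, not the actual implementation):

```bash
# Keep only source files from the PR's changed files, so that prose in
# docs or the README can no longer trigger the "new public classes" check
git diff --name-only master... | grep -E '\.(py|java|scala)$'
```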
can be run using:

    ./dev/run-tests

Please see the guidance on how to
I think it's nice that we have the tl;dr here on how to run tests. The wiki link might not be stable (we are linking to an anchor tag), and the content there is only one or two sentences... not sure it adds much.
QA tests have finished for PR 2014 at commit
@andrewor14 @nchammas @pwendell Humble ping on this one; I think it's good to go, and it probably helps head off some build questions going forward.
The SBT build is derived from the Maven POM files, and so the same Maven profiles and variables
can be set to control the SBT build. For example:

    sbt -Pyarn -Phadoop-2.3 compile
Do we need to add a bit more color here about how to use `sbt`, to match what used to be in the GitHub README? Or is this sufficient?
I think the goal here is just a taste, assuming the advanced developer will understand and figure out the rest if needed. Happy to make further edits, though. For example, should we still suggest `./sbt/sbt` instead of a local `sbt`?
Hmm, I don't know enough to make a recommendation; I'll leave that to others. Just wanted to call out the fact that we'd have less info on using `sbt` than before. Maybe that's a good thing.
Yeah, I'd give `sbt/sbt`, but other than that, it seems fine to have just a small example.
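Putting the two points above together, a hedged example of what the README snippet might look like if it keeps the bundled launcher script (the profile names are taken from the diff under review and may differ across versions):

```bash
# Use the launcher script shipped with Spark rather than a locally
# installed sbt; the same Maven profiles control the sbt build
./sbt/sbt -Pyarn -Phadoop-2.3 compile
```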
There really should be at least some mention of Zinc (https://github.com/typesafehub/zinc) in our Maven build instructions, since using Zinc greatly improves the Maven + Scala experience.
Yes, the build already warns if Zinc is not being used.
Yes, I know that the scala-maven-plugin will throw warnings if Zinc isn't being used. I also know that many users are either confused by those warnings or ignore them completely, in large measure because our build docs never mention Zinc, even though Zinc usage should be commonplace and arguably belongs in the default "how to build" instructions.
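For illustration, a hedged sketch of the kind of instruction being requested here; the install method and flags are assumptions, so consult the Zinc project page linked above for the authoritative steps:

```bash
# Install Zinc (install method is an assumption; e.g. via Homebrew on OS X,
# or download a release from the project page)
brew install zinc

# Start the long-running Zinc server so the scala-maven-plugin can reuse
# the incremental compiler across builds
zinc -start

# Subsequent Maven builds then compile Scala sources through Zinc
mvn -DskipTests clean package
```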
@pwendell I changed to
QA tests have started for PR 2014 at commit
[Zinc](https://github.com/typesafehub/zinc) is a long-running server version of SBT's incremental
compiler. When run locally as a background process, it speeds up builds of Scala-based projects
like Spark. Developers who regularly recompile Spark will be most interested in Zinc. The project
Should probably be something like "...who regularly recompile Spark with Maven will be..."; else some may think that Zinc is something to be used with SBT -- especially given that this comment is placed immediately after "Building with SBT".
@markhamstra Nice one, change coming up...
QA tests have started for PR 2014 at commit
QA tests have finished for PR 2014 at commit
QA tests have finished for PR 2014 at commit
    markdown: kramdown
    gems:
    - jekyll-redirect-from
Does this config mean that users don't need to install the gem manually?
LGTM pending one minor comment.
@srowen are you planning to add more to this, or is it GTG from your perspective?
@pwendell no, I believe that the user still has to install the gem; I did, at least. Yes, this is GTG from my end.
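For anyone following along, a hedged sketch of the manual steps implied here (the gem name comes from the config snippet above; versions and exact commands may vary):

```bash
# The redirect plugin is not pulled in automatically by the config change;
# install it (and jekyll itself) manually
gem install jekyll jekyll-redirect-from

# Then build the docs as usual from the docs/ directory
cd docs
jekyll build
```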
Okay, I can merge this. One thing though: we've typically had less-than-smooth experiences with Jekyll and its dependencies, so if this feature causes issues for users, I'd propose we just maintain a tombstone page and write one line of JavaScript to redirect. Google, Yahoo, Bing, etc. can correctly index this behavior and move to the new page (I've done this a few times in the past to migrate from one site to another).
FYI: this page is 404-ing: http://spark.apache.org/docs/latest/building-spark.html. Is that temporary?
@nchammas that page won't appear until we actually update the live docs (something that happens for each release rather than when we push a PR).
Here's my crack at Bertrand's suggestion. The GitHub `README.md` contains build info that's outdated. It should just point to the current online docs, and reflect that Maven is the primary build now. (Incidentally, the stanza at the end about contributions of original work should go in https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark too. It won't hurt to be crystal clear about the agreement to license, given that ICLAs are not required of anyone here.)