Remove hardcoded Spark version string to use SBT defined version instead #600

Closed
wants to merge 6 commits

Conversation

pierre-borckmans

This PR attempts to remove the hardcoded references to the Spark version, which must be updated for each release.

This PR uses the version defined in the main sbt definition file (project/SparkBuild.scala - SPARK_VERSION). When an sbt compile/package/assembly is triggered, this string is used as the version in the JAR manifest.

We then use getClass.getPackage.getImplementationVersion to refer to this string.

As a demonstration of potential use cases, the master UI page is modified to display the Spark version.
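
For reference, the lookup this PR relies on boils down to something like the following sketch (the fallback to "unknown" is illustrative only and not part of the actual change):

// getImplementationVersion reads Implementation-Version from the
// META-INF/MANIFEST.MF of the jar that contains this class; it returns
// null when the class is not loaded from a jar, hence the illustrative fallback.
val sparkVersion: String =
  Option(getClass.getPackage.getImplementationVersion).getOrElse("unknown")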

@AmplabJenkins

Can one of the admins verify this patch?

@@ -1253,7 +1253,7 @@ class SparkContext(config: SparkConf) extends Logging {
*/
object SparkContext extends Logging {

- private[spark] val SPARK_VERSION = "1.0.0"
+ private[spark] val SPARK_VERSION = getClass.getPackage.getImplementationVersion
Author

This should be:

private[spark] val SPARK_VERSION = Utils.getSparkVersion

instead; I forgot to update it.

@aarondav
Contributor

Jenkins, test this please.

@@ -1037,6 +1037,11 @@ private[spark] object Utils extends Logging {
obj.getClass.getSimpleName.replace("$", "")
}

+ /** Return the current version of Spark */
+ def getSparkVersion(): String = {
Contributor

If you intend it to be called without parentheses, it should also be defined without them.
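
For context, the usual Scala convention is that a parameterless, side-effect-free accessor is declared without parentheses, so call sites can also omit them. A minimal sketch:

// Declared without parentheses: treated as a pure accessor by convention.
def getSparkVersion: String = getClass.getPackage.getImplementationVersion

// Call sites then omit parentheses as well:
val version = Utils.getSparkVersion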

Author

Thanks, I'll fix this.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@@ -1037,6 +1037,11 @@ private[spark] object Utils extends Logging {
obj.getClass.getSimpleName.replace("$", "")
}

+ /** Return the current version of Spark */
+ def getSparkVersion(): String = {
+   getClass.getPackage.getImplementationVersion
Contributor

Does this also work in the Maven build?

Author

I did not check, as I was not really aware of the Maven build.
Apparently I had always overlooked this page:
http://spark.apache.org/docs/latest/building-with-maven.html

I'll check this now.

Author

It seems to work as well.
In fact, the build system does not matter:
http://docs.oracle.com/javase/7/docs/api/java/lang/Package.html
The version string is retrieved from the JAR manifest.

Contributor

Note that this manifest property must be set correctly by the build system, so it's good to know whether both build systems actually do so.

Author

Indeed.
I confirm that both sbt assembly and mvn package produce a fat jar in which the manifest contains the version.

unzip -p assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop2.0.0-mr1-cdh4.4.0.jar META-INF/MANIFEST.MF

outputs:

Implementation-Vendor: The Apache Software Foundation
Implementation-Title: Spark Project Core
Implementation-Version: 1.0.0-SNAPSHOT
Implementation-Vendor-Id: org.apache.spark
Built-By: ria
Build-Jdk: 1.6.0_27
Specification-Vendor: The Apache Software Foundation
Specification-Title: Spark Project Core
Created-By: Apache Maven 3.0.4
Specification-Version: 1.0.0-SNAPSHOT
Archiver-Version: Plexus Archiver

in both cases.

@AmplabJenkins

Merged build finished.

@AmplabJenkins

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14588/

@aarondav
Contributor

Unrelated test error, jenkins retest this.

@pwendell
Contributor

@pierre-borckmans @aarondav What happens if someone is just running compiled Spark classes (i.e., they aren't running Spark in an environment where it's been bundled as a jar)? Also, what if someone else has embedded Spark code inside their own assembly jar, similar to how we do it with our dependencies?

Will this still work?

@pwendell
Contributor

Basically my question is whether this is embedded somewhere in the jar packaging (e.g. META-INF) or whether it's actually in the bytecode of each Spark class. If it's the former, I don't see how it will work in the two cases I mentioned before.

@pwendell
Contributor

(Sorry for the comment bomb.) Okay, I looked a bit more... it seems like this relies on the jar manifest. While it's definitely nice not to hard-code the version, I see some potential concerns:

  1. Someone is running Spark without a jar file (e.g. our own unit tests, or running Spark locally with assemble-deps, etc.).
  2. Someone has embedded Spark in their own assembly jar, in which case the manifest is not going to inherit from Spark.

In case 2 I think this might silently give the wrong version, i.e. it could just pick up the version from whoever built the assembly.

So it might be worth it to just keep this hard-coded for now... we have dependencies that rely on META-INF configurations (e.g. datanucleus) and it's pretty annoying.

@pierre-borckmans
Author

@pwendell AFAIK it's only made available in the META-INF of the fat jar, so you are right.
Sorry about that, I overlooked these two scenarios.

Another (maybe not so elegant) option is to have the version defined in a resource file.
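
To make that alternative concrete, a resource-file lookup could look roughly like the sketch below (the file name spark-version.properties and the spark.version key are hypothetical, not anything that exists in the build today):

import java.util.Properties

object SparkVersionInfo {
  // Reads the version from a properties file bundled on the classpath,
  // e.g. generated into src/main/resources by the build.
  // The file name and key below are hypothetical.
  def version: String = {
    val props = new Properties()
    val in = getClass.getResourceAsStream("/spark-version.properties")
    try {
      if (in != null) props.load(in)
      props.getProperty("spark.version", "unknown")
    } finally {
      if (in != null) in.close()
    }
  }
}

Because the file would be packaged with Spark's own classes rather than read from the enclosing jar's manifest, it would sidestep the embedded-assembly concern above.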

@pwendell
Contributor

@pierre-borckmans ah okay makes sense. I think in this case hard coding might be the best of (admittedly not great) alternatives. No problem on overlooking those cases, they are pretty non-obvious!

@pierre-borckmans
Author

One last idea:
How about this sbt plugin?
https://github.com/sbt/sbt-buildinfo

It could definitely do the trick.

We would still need something equivalent for Maven.
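
As a rough idea of what that could look like on the sbt side (a sketch only; the plugin version, project layout, and buildInfoPackage value are assumptions):

// project/plugins.sbt (plugin version is an assumption)
addSbtPlugin("com.eed3si9n" % "sbt-buildinfo" % "0.11.0")

// build.sbt: bake the version into generated source at compile time,
// so no manifest or resource lookup is needed at runtime.
lazy val core = (project in file("core"))
  .enablePlugins(BuildInfoPlugin)
  .settings(
    buildInfoKeys := Seq[BuildInfoKey](name, version, scalaVersion),
    buildInfoPackage := "org.apache.spark"
  )

// The generated object can then be referenced as org.apache.spark.BuildInfo.version.

The trade-off is that the version ends up in compiled source rather than in META-INF, which also avoids the embedded-assembly problem, but at the cost of an extra build plugin.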


@mridulm
Contributor

mridulm commented May 3, 2014

Please ensure it works in both the sbt and Maven cases.

@pwendell
Contributor

pwendell commented May 3, 2014

@pierre-borckmans Appreciate you looking into this, but I really don't think it's worth doing anything complex here - we can just leave it hard-coded.

We already have to change a few things each time we release Spark... this really isn't a big concern for us. And bringing in build plugins etc. is just an over-engineered solution here.

@pwendell
Contributor

I think we should close this issue; it's gone stale.

@asfgit closed this in 87738bf on Aug 2, 2014
bzhaoopenstack pushed a commit to bzhaoopenstack/spark that referenced this pull request Sep 11, 2019
There is a case where a network contains more than one subnet; in such a case, we should select one from the list.

Related-Bug: #theopenlab/openlab#321
HyukjinKwon pushed a commit that referenced this pull request Jun 8, 2022
### What changes were proposed in this pull request?
This PR aims to upgrade scala-maven-plugin to 4.6.2.

### Why are the changes needed?
This version brings some bug fixes related to `Incremental compile`, although it seems that Spark has not encountered these issues:

- [Fix incremental compiler not being able to find JDK classes when compiler macros with Java 11, close #502](davidB/scala-maven-plugin#608)
- [Fix incremental compilation on Java 11+, close #600](davidB/scala-maven-plugin#609)

All changes are as follows:

- davidB/scala-maven-plugin@4.6.1...4.6.2

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

Closes #36800 from LuciferYang/scala-maven-plugin-462.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>