Skip to content

SPARK-2149. [MLLIB] Univariate kernel density estimation #1093

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

sryza
Copy link
Contributor

@sryza sryza commented Jun 16, 2014

No description provided.

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@AmplabJenkins
Copy link

Merged build finished. All automated tests passed.

@AmplabJenkins
Copy link

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15809/

import org.apache.spark.rdd.RDD
import org.apache.commons.math3.util.FastMath

object KernelDensity {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a candidate for Experimental? in the sense that it might evolve into a fuller density estimation something-or-other later?

@sryza
Copy link
Contributor Author

sryza commented Jun 16, 2014

Thanks for the comments Sean. Updated patch checks for positive standard deviation, marks it as experimental, and tries to make the calculations a little more clear.

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

}

// This gets used in each Gaussian PDF computation, so compute it up front
val logStandardDeviationPlusHalfLog2Pi =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If some of this is copied from Commons Math I'd suggest a note about its origin. I like FastMath; I think they show it is faster than Java's version. For consistency in the past I either used all FastMath or all Math. I don't know how much it matters here, using FastMath vs Java Math vs Scala Math from a consistency standpoint?

@AmplabJenkins
Copy link

Merged build finished. All automated tests passed.

@AmplabJenkins
Copy link

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15823/

@pwendell
Copy link
Contributor

pwendell commented Sep 2, 2014

@sryza dunno if this is still something you want to submit, but if so can you tag this with MLLib? otherwise it doesn't get sorted correctly.

@sryza sryza changed the title SPARK-2149. Univariate kernel density estimation SPARK-2149. [MLLIB] Univariate kernel density estimation Sep 2, 2014
@SparkQA
Copy link

SparkQA commented Feb 2, 2015

Test build #26534 has started for PR 1093 at commit 6c91645.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Feb 2, 2015

Test build #26534 has finished for PR 1093 at commit 6c91645.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26534/
Test FAILed.

@srowen
Copy link
Member

srowen commented Feb 7, 2015

@sryza are you interested in getting this in? @mengxr what's your opinion of adding this? My last review comment is that we should just use Math instead of FastMath as the latter isn't used elsewhere in Spark. Unless there's a clear performance reason for doing so. In which case, hey, let's use the fast version everywhere.

@mengxr
Copy link
Contributor

mengxr commented Feb 7, 2015

Does FastMath give significant performance improvement over math? I think this needs some performance testing, given that there are other overheads involved in the computation. If we don't see significant gain, maybe it is not worthing using it. People might have different versions of commons-math3 on the classpath, so we should try to use a minimal subset of its functions. From its doc (http://commons.apache.org/proper/commons-math/apidocs/org/apache/commons/math3/util/FastMath.html), many functions are marked as "since 3.4". Another issue is the optimization used in FastMath. For example, this is the log implementation:

https://github.com/apache/commons-math/blob/master/src/main/java/org/apache/commons/math3/util/FastMath.java#L1141

It might be faster than JVM's implementation. But if anything (accuracy/performance) goes wrong there, it will be extremely hard for us the trace the problem.

About the API, is it okay to put kernelDensity as a method under Statistics and hide the implementation?

@sryza
Copy link
Contributor Author

sryza commented Feb 7, 2015

Sorry for the delay on this @mengxr @srowen. Updated patch to take out FastMath and expose the method in Statistics.

@SparkQA
Copy link

SparkQA commented Feb 7, 2015

Test build #27012 has started for PR 1093 at commit 5f06b33.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Feb 7, 2015

Test build #27012 has finished for PR 1093 at commit 5f06b33.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27012/
Test PASSed.

@srowen
Copy link
Member

srowen commented Feb 8, 2015

Looking OK to me. I'll wait a beat for @mengxr to add any final comments.

@asfgit asfgit closed this in 0793ee1 Feb 9, 2015

// This gets used in each Gaussian PDF computation, so compute it up front
val logStandardDeviationPlusHalfLog2Pi =
Math.log(standardDeviation) + 0.5 * Math.log(2 * Math.PI)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Math is being deprecated. Please replace it with math instead.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yeah I thought of that before I merged, but saw a load of usages of Math in the code. Shall I make a PR to change all of them in one go?

udaynpusa pushed a commit to mapr/spark that referenced this pull request Jan 30, 2024
…S, Spark Master and Spark Workers (apache#1093)

Co-authored-by: Egor Krivokon <>
mapr-devops pushed a commit to mapr/spark that referenced this pull request May 8, 2025
…S, Spark Master and Spark Workers (apache#1093)

Co-authored-by: Egor Krivokon <>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants