SPARK-2149. [MLLIB] Univariate kernel density estimation #1093

sryza · 2014-06-16T01:55:24Z

No description provided.

AmplabJenkins · 2014-06-16T01:59:43Z

Merged build triggered.

AmplabJenkins · 2014-06-16T01:59:50Z

Merged build started.

AmplabJenkins · 2014-06-16T02:42:22Z

Merged build finished. All automated tests passed.

AmplabJenkins · 2014-06-16T02:42:22Z

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15809/

srowen · 2014-06-16T07:55:14Z

mllib/src/main/scala/org/apache/spark/mllib/stat/KernelDensity.scala

+import org.apache.spark.rdd.RDD
+import org.apache.commons.math3.util.FastMath
+
+object KernelDensity {


Is this a candidate for Experimental? in the sense that it might evolve into a fuller density estimation something-or-other later?

sryza · 2014-06-16T20:43:50Z

Thanks for the comments Sean. Updated patch checks for positive standard deviation, marks it as experimental, and tries to make the calculations a little more clear.

AmplabJenkins · 2014-06-16T20:44:45Z

Merged build triggered.

AmplabJenkins · 2014-06-16T20:44:55Z

Merged build started.

srowen · 2014-06-16T20:53:35Z

mllib/src/main/scala/org/apache/spark/mllib/stat/KernelDensity.scala

+    }
+
+    // This gets used in each Gaussian PDF computation, so compute it up front
+    val logStandardDeviationPlusHalfLog2Pi =


If some of this is copied from Commons Math I'd suggest a note about its origin. I like FastMath; I think they show it is faster than Java's version. For consistency in the past I either used all FastMath or all Math. I don't know how much it matters here, using FastMath vs Java Math vs Scala Math from a consistency standpoint?

AmplabJenkins · 2014-06-16T21:29:53Z

Merged build finished. All automated tests passed.

AmplabJenkins · 2014-06-16T21:29:53Z

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15823/

pwendell · 2014-09-02T01:19:44Z

@sryza dunno if this is still something you want to submit, but if so can you tag this with MLLib? otherwise it doesn't get sorted correctly.

SparkQA · 2015-02-02T20:48:04Z

Test build #26534 has started for PR 1093 at commit 6c91645.

This patch merges cleanly.

SparkQA · 2015-02-02T21:29:17Z

Test build #26534 has finished for PR 1093 at commit 6c91645.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2015-02-02T21:29:21Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26534/
Test FAILed.

srowen · 2015-02-07T15:47:41Z

@sryza are you interested in getting this in? @mengxr what's your opinion of adding this? My last review comment is that we should just use Math instead of FastMath as the latter isn't used elsewhere in Spark. Unless there's a clear performance reason for doing so. In which case, hey, let's use the fast version everywhere.

mengxr · 2015-02-07T18:26:22Z

Does FastMath give a significant performance improvement over math? I think this needs some performance testing, given that there are other overheads involved in the computation. If we don't see a significant gain, maybe it is not worth using it. People might have different versions of commons-math3 on the classpath, so we should try to use a minimal subset of its functions. From its doc (http://commons.apache.org/proper/commons-math/apidocs/org/apache/commons/math3/util/FastMath.html), many functions are marked as "since 3.4". Another issue is the optimization used in FastMath. For example, this is the log implementation:

https://github.com/apache/commons-math/blob/master/src/main/java/org/apache/commons/math3/util/FastMath.java#L1141

It might be faster than the JVM's implementation. But if anything (accuracy/performance) goes wrong there, it will be extremely hard for us to trace the problem.

About the API, is it okay to put kernelDensity as a method under Statistics and hide the implementation?

sryza · 2015-02-07T19:39:44Z

Sorry for the delay on this @mengxr @srowen. Updated patch to take out FastMath and expose the method in Statistics.

SparkQA · 2015-02-07T19:42:53Z

Test build #27012 has started for PR 1093 at commit 5f06b33.

This patch merges cleanly.

SparkQA · 2015-02-07T20:58:09Z

Test build #27012 has finished for PR 1093 at commit 5f06b33.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2015-02-07T20:58:13Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27012/
Test PASSed.

srowen · 2015-02-08T11:03:25Z

Looking OK to me. I'll wait a beat for @mengxr to add any final comments.

mengxr · 2015-02-09T18:22:58Z

mllib/src/main/scala/org/apache/spark/mllib/stat/KernelDensity.scala

+
+    // This gets used in each Gaussian PDF computation, so compute it up front
+    val logStandardDeviationPlusHalfLog2Pi =
+      Math.log(standardDeviation) + 0.5 * Math.log(2 * Math.PI)


Math is being deprecated. Please replace it with math instead.

srowen · 2015-02-09

Ah, yeah I thought of that before I merged, but saw a load of usages of Math in the code. Shall I make a PR to change all of them in one go?
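
For readers following the review, here is a minimal sketch of the calculation under discussion. It is not the merged Spark code: it runs locally on a `Seq[Double]` instead of an `RDD[Double]`, and the object name `KernelDensitySketch` and the method `estimate` are hypothetical. It only illustrates the points raised in the thread: checking for a positive standard deviation, computing the constant log term up front, and using `scala.math` rather than `Math` or `FastMath`.

```scala
// Illustrative sketch only: a local, non-distributed Gaussian kernel density
// estimate. The object and method names are hypothetical, not the Spark API.
object KernelDensitySketch {
  def estimate(
      samples: Seq[Double],
      standardDeviation: Double,
      evaluationPoints: Seq[Double]): Seq[Double] = {
    require(standardDeviation > 0.0, "standardDeviation must be positive")
    require(samples.nonEmpty, "samples must be non-empty")

    // This gets used in each Gaussian PDF computation, so compute it up front.
    val logStandardDeviationPlusHalfLog2Pi =
      math.log(standardDeviation) + 0.5 * math.log(2 * math.Pi)

    evaluationPoints.map { point =>
      // Sum the Gaussian PDF centered on each sample, evaluated at `point`.
      val sum = samples.map { sample =>
        val z = (point - sample) / standardDeviation
        math.exp(-0.5 * z * z - logStandardDeviationPlusHalfLog2Pi)
      }.sum
      // Average over the samples so the estimate integrates to 1.
      sum / samples.size
    }
  }
}
```

With a single sample at 0 and a standard deviation of 1, the estimate at 0 reduces to the standard normal density at its mean, 1 / sqrt(2π), which makes a convenient sanity check.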

srowen reviewed Jun 16, 2014

sryza changed the title ~~SPARK-2149. Univariate kernel density estimation~~ SPARK-2149. [MLLIB] Univariate kernel density estimation Sep 2, 2014

sryza added 3 commits February 7, 2015 10:44

SPARK-2149. Univariate kernel density estimation

0dfa005

Respond to Sean's review comments

0f73060

More review comments

5f06b33

sryza force-pushed the sandy-spark-2149 branch from 6c91645 to 5f06b33 Compare February 7, 2015 19:38

asfgit closed this in 0793ee1 Feb 9, 2015

mengxr reviewed Feb 9, 2015

udaynpusa pushed a commit to mapr/spark that referenced this pull request Jan 30, 2024

MapR [SPARK-1188] [Java 17] Need to open java.lang module for Spark H…

ca8d859

…S, Spark Master and Spark Workers (apache#1093) Co-authored-by: Egor Krivokon <>

mapr-devops pushed a commit to mapr/spark that referenced this pull request May 8, 2025

MapR [SPARK-1188] [Java 17] Need to open java.lang module for Spark H…

349ca50

…S, Spark Master and Spark Workers (apache#1093) Co-authored-by: Egor Krivokon <>
