-
Notifications
You must be signed in to change notification settings - Fork 28.6k
SPARK-2149. [MLLIB] Univariate kernel density estimation #1093
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Merged build triggered. |
Merged build started. |
Merged build finished. All automated tests passed. |
All automated tests passed. |
import org.apache.spark.rdd.RDD | ||
import org.apache.commons.math3.util.FastMath | ||
|
||
object KernelDensity { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a candidate for Experimental? in the sense that it might evolve into a fuller density estimation something-or-other later?
Thanks for the comments Sean. Updated patch checks for positive standard deviation, marks it as experimental, and tries to make the calculations a little more clear. |
Merged build triggered. |
Merged build started. |
} | ||
|
||
// This gets used in each Gaussian PDF computation, so compute it up front | ||
val logStandardDeviationPlusHalfLog2Pi = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If some of this is copied from Commons Math I'd suggest a note about its origin. I like FastMath; I think they show it is faster than Java's version. For consistency in the past I either used all FastMath or all Math. I don't know how much it matters here, using FastMath vs Java Math vs Scala Math from a consistency standpoint?
Merged build finished. All automated tests passed. |
All automated tests passed. |
@sryza dunno if this is still something you want to submit, but if so can you tag this with |
Test build #26534 has started for PR 1093 at commit
|
Test build #26534 has finished for PR 1093 at commit
|
Test FAILed. |
@sryza are you interested in getting this in? @mengxr what's your opinion of adding this? My last review comment is that we should just use |
Does It might be faster than JVM's implementation. But if anything (accuracy/performance) goes wrong there, it will be extremely hard for us the trace the problem. About the API, is it okay to put |
6c91645
to
5f06b33
Compare
Test build #27012 has started for PR 1093 at commit
|
Test build #27012 has finished for PR 1093 at commit
|
Test PASSed. |
Looking OK to me. I'll wait a beat for @mengxr to add any final comments. |
|
||
// This gets used in each Gaussian PDF computation, so compute it up front | ||
val logStandardDeviationPlusHalfLog2Pi = | ||
Math.log(standardDeviation) + 0.5 * Math.log(2 * Math.PI) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Math
is being deprecated. Please replace it with math
instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yeah I thought of that before I merged, but saw a load of usages of Math
in the code. Shall I make a PR to change all of them in one go?
…S, Spark Master and Spark Workers (apache#1093) Co-authored-by: Egor Krivokon <>
…S, Spark Master and Spark Workers (apache#1093) Co-authored-by: Egor Krivokon <>
No description provided.