[SPARK-17455][MLlib] Improve PAVA implementation in IsotonicRegression #15018
Conversation
Test build #65117 has finished for PR 15018 at commit
Could someone please take a look at this? @mengxr maybe? Looks like you merged the initial implementation.
Anyone? @srowen?
More than anything it would be great to add a couple simple unit tests of this method to prove it still works in key cases.
    // Find next monotonicity violating sequence, if any.
    while (j < n && input(j)._1 >= input(j + 1)._1) {
      j = j + 1
Nit: `j += 1`, and likewise elsewhere.
    var j = i

    // Find next monotonicity violating sequence, if any.
    while (j < n && input(j)._1 >= input(j + 1)._1) {
The condition changed to `>=` here on purpose, right? You're now looking for a region that's non-increasing, not just strictly decreasing. Just checking.
That should be correct. See Lemma 1 from "A Fast Scaling Algorithm for Minimizing Separable Convex Functions Subject to Chain Constraints".
I think the original one, i.e. `input(j)._1 > input(j + 1)._1`, is correct. Here it is going to select out-of-order blocks.
Quoted from the paper:
> We refer to two blocks [p, q] and [q + 1, r] as consecutive. We refer to two consecutive blocks [p, q] and [q + 1, r] as in-order if theta_{p,q} <= theta_{q+1,r} and out-of-order otherwise.
Lemma 1 is about how a merged block is also a single-valued block.
I hadn't actually seen that paper. I just worked from the scikit-learn implementation (which, incidentally, has changed since I submitted this PR).
I believe that either way will get you correct results. Doing `>=` should be faster in some cases, because it's greedier about pooling, which should lead to fewer iterations of the outer loop.
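A tiny standalone illustration of that difference (hypothetical labels, not code from the PR; only the comparison changes):

```scala
object ScanDemo extends App {
  // Labels with a flat run followed by a drop.
  val labels = Array(3.0, 3.0, 3.0, 1.0)

  // Advance j while consecutive labels satisfy cmp, mirroring the quoted loop.
  def scan(cmp: (Double, Double) => Boolean): Int = {
    var j = 0
    while (j < labels.length - 1 && cmp(labels(j), labels(j + 1))) j += 1
    j
  }

  println(scan(_ > _))  // 0: strict ">" stops at the flat run
  println(scan(_ >= _)) // 3: ">=" also absorbs ties, so a single pass covers more points
}
```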
    // and check if pooling caused monotonicity violation in previously processed points.
    while (i >= 0 && input(i)._1 > input(i + 1)._1) {
      // Pool the violating sequence with the data point preceding it
      if (input(i)._1 != input(j)._1) {
This condition is "if there was any decrease, not just == from i to j", right? And pool makes everything in [i, j] non-violating.
Comments from the previous code note that this could introduce a violation -- is this still possible? It wasn't obvious why it can't happen now.
I believe the logic is:
- If equality holds here, then either (a) i = j or (b) the block [i,j] has equal elements.
- A violation could be introduced but will be caught on the next iteration of the outer loop.
Correct. Any violations that are introduced will be fixed in the next iteration.
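For readers following the exchange, this is roughly what pooling does at this point (a minimal sketch assuming the (label, feature, weight) tuple layout used in this file, not the exact PR code):

```scala
// Replace every label in [start, end] with the weighted average of that range,
// which makes the range trivially non-decreasing. The new value may still undercut
// the point just before `start`; that is the violation the outer loop re-checks.
def pool(input: Array[(Double, Double, Double)], start: Int, end: Int): Unit = {
  val block = input.slice(start, end + 1)
  val totalWeight = block.map(_._3).sum
  val pooled = block.map { case (y, _, w) => y * w }.sum / totalWeight
  (start to end).foreach { i => input(i) = (pooled, input(i)._2, input(i)._3) }
}
```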
Thanks for pinging about this---it looks important! One request: do you know if there is a resource with a clearer statement of this precise algorithm for the L2 norm? That'd be nice to have. The best I found so far was "A Fast Scaling Algorithm for Minimizing Separable Convex Functions Subject to Chain Constraints".
    // Otherwise pool the violating sequence
    // and check if pooling caused monotonicity violation in previously processed points.
    while (i >= 0 && input(i)._1 > input(i + 1)._1) {
      // Pool the violating sequence with the data point preceding it
Is this comment correct? Looks like you pool the violating sequence [i, j] only. The preceding data points are checked for pooling in the next outer loop.
Given that the scikit-learn implementation I based this on has changed (scikit-learn/scikit-learn#7444) since I submitted the PR, and @jkbradley has pointed out a reference I hadn't seen, I'd like to take a little bit of time to see if there's a still better algorithm. This one is an improvement over what we had, but as long as we're changing it, we may as well get it right.
Test build #70134 has finished for PR 15018 at commit
Test build #70136 has finished for PR 15018 at commit
Looks like the new scikit-learn implementation suffers from a similar problem to the one that the original Spark implementation had. I left them a note pointing it out. Right now I'm working on implementing what's in "Minimizing Separable Convex Functions Subject to Simple Chain Constraints", by Best, Chakravarti, and Ubhaya. Running down the reference trail, it looks like the original algorithm is in "Projections onto Order Simplexes" by Grotzinger and Witzgall, and this particular formulation of it is originally in "Active set algorithms for isotonic regression; A unifying framework" by Best and Chakravarti.
If it's correct and faster, I tend to favor merging this. The code looks OK to my understanding and passes existing tests.
Found another input that triggers non-polynomial time with the code in this PR. I'm again borrowing from scikit-learn. I think this is the case they found that led them to re-write their implementation.
I'm now working on implementing what's described in the Best papers. This should give O(n), even in the worst case. Should I close this and open a new PR with the new algorithm, or just add it here and you can squash when you merge?
Just add commits here.
The isotonic regression problem doesn't even have a unique solution when there are 0 weights.
This version closely follows the algorithm in referenced literature and should guarantee O(n) complexity, even in the worst case.
Test build #70214 has finished for PR 15018 at commit
Alright, the PAV algorithm has been completely re-written to follow what's outlined in "Minimizing Separable Convex Functions Subject to Simple Chain Constraints". I've tested it with a bunch of different inputs that caused previous versions of this algorithm to go non-polynomial. It stays linear for all of them. I will note that it's slightly slower on very small datasets (< 5000 points or so), but those still finish in less than a millisecond on my laptop, so I'm not too concerned. The only caveat is that there was one test that I couldn't get passing, so I removed it. It involved passing input data with 0 weights. I'd argue that the isotonic regression problem doesn't even have a unique solution in that case, so we shouldn't support it. Still, it is a slight change in behavior. Looks like Jenkins is pointing out some style issues, so I'll get to work on those.
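For reviewers who want the gist without reading the paper, here is a compact standalone sketch of the block-merging idea (weighted L2 case, strictly positive weights assumed); the PR itself implements the same idea with index-based blockBounds over the input array rather than a buffer of block objects:

```scala
import scala.collection.mutable.ArrayBuffer

def pava(y: Array[Double], w: Array[Double]): Array[Double] = {
  // Each block stores enough to recover its weighted mean in O(1).
  case class Block(var sumWy: Double, var sumW: Double, var size: Int) {
    def mean: Double = sumWy / sumW
  }
  val blocks = ArrayBuffer.empty[Block]
  for (i <- y.indices) {
    blocks += Block(w(i) * y(i), w(i), 1)
    // Merge backwards while the last two blocks are out of order.
    // Each merge deletes a block, so total work is O(n) amortized.
    while (blocks.length > 1 && blocks(blocks.length - 2).mean >= blocks.last.mean) {
      val right = blocks.remove(blocks.length - 1)
      val left = blocks.last
      left.sumWy += right.sumWy
      left.sumW += right.sumW
      left.size += right.size
    }
  }
  // Expand blocks back to one fitted value per input point.
  blocks.iterator.flatMap(b => Iterator.fill(b.size)(b.mean)).toArray
}

// Example: pava(Array(1.0, 3.0, 2.0, 4.0), Array.fill(4)(1.0))
// => Array(1.0, 2.5, 2.5, 4.0)
```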
Test build #70216 has finished for PR 15018 at commit
Modulo a few minor comments, the code at least looks clean. If the tests pass, seems OK.
      gives the values for the block. Entries that are not at the start of a block
      are meaningless.
    */
    val weights: Array[(Double, Double)] = input.map(x => (x._3, x._3 * x._1)) // (weight, weight * y)
Up to you; might be clearer as `input.map { case (y, _, weight) => (weight, weight * y) }`
    }

    /*
      Keeps track of the start and end indices of the blocks. blockBounds(start) gives the
      index of the end of the block and blockBounds(end) gives the index of the start of the
`blockBounds(start)` is the end of a block? This is reflected in your helper functions below, but it seems reversed. Can you double check and elaborate a bit?
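To make the convention concrete (helper names are taken from the diff; the surrounding scaffolding is an illustrative sketch, not the PR code): for a block spanning indices [start, end], blockBounds(start) stores end and blockBounds(end) stores start, so a single array answers lookups in both directions.

```scala
// Initially every point is its own block, so start == end for each index.
val n = 5 // illustrative size
val blockBounds = Array.range(0, n)

def blockEnd(start: Int): Int = blockBounds(start) // start of a block -> its end
def blockStart(end: Int): Int = blockBounds(end)   // end of a block -> its start

// The next block begins one past this block's end; the previous block is reached
// through the end index just before this block's start (callers guard start > 0,
// as the diff's `i > 0` check does).
def nextBlock(start: Int): Int = blockEnd(start) + 1
def prevBlock(start: Int): Int = blockStart(start - 1)
```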
    while (nextBlock(i) < input.length) {
      if (average(i) >= average(nextBlock(i))) {
        merge(i, nextBlock(i))
        while((i > 0) && (average(prevBlock(i)) >= average(i))) {
Super nit: space before `while` here and below. You can probably nix the extra parens around each of the two terms, but no big deal.
Test build #70363 has finished for PR 15018 at commit
Test build #70364 has finished for PR 15018 at commit
@srowen Addressed your comments and fixed some style issues. Some updated timings:

Alternating decreasing input
Before:
After:

Pathological input from scikit-learn benchmarks:
Before:
After:
For the zero-weight values, can we do something similar to scikit-learn and remove zero-weight values, like amueller/scikit-learn@2415100?
Test build #70886 has finished for PR 15018 at commit
    // gives the values for the block. Entries that are not at the start of a block
    // are meaningless.
    val weights: Array[(Double, Double)] = input.map { case (y, _, weight) =>
      require(weight != 0.0)
Are negative weights OK? (You don't need a type on `weight`, but it hardly matters.)
It's a weird thing to do, but I don't think there's any reason we can't support it. The problem still has a unique solution.
OK, seems reasonable to me. Any other thoughts @viirya?
SGTM. See if @jkbradley has more comments?
Thanks for pinging me. The updated implementation looks nice, but I do have comments about weights.
I'd like to disallow negative weights. It's quite safe to assume that weights are non-negative in ML, and I've never seen negative weights used in isotonic regression.
- Q: I saw mention of negative weights above, but the Best and Chakravarti (1990) paper referenced explicitly requires strictly positive weights. Does one of the other papers talk about negative weights?
- Btw, with negative weights, it would be easy to have undefined solutions. Imagine the first element in the input having the smallest value and a negative weight. A "correct" output would be infinite for that value and 0 elsewhere, giving a minimum objective value of NegativeInfinity. (A short worked form of this is sketched after this comment.)
I'd also prefer to allow zero weights, to avoid breaking the API and because the model is fairly well-defined even with zero weights, except for edge cases at extreme feature values. Removing the zero-weight rows should provide reasonable solutions, though we'd need to assert that the whole dataset had non-zero weight.
How does this sound?
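For concreteness, a short worked form of the negative-weight point above (the objective is the standard weighted least-squares form for isotonic regression; this illustration is not quoted from the discussion). The problem is

$$\min_{\hat y_1 \le \hat y_2 \le \cdots \le \hat y_n} \; \sum_{i=1}^{n} w_i \,(\hat y_i - y_i)^2 .$$

If $w_1 < 0$, the chain constraint bounds $\hat y_1$ only from above (by $\hat y_2$), so letting $\hat y_1 \to -\infty$ stays feasible while $w_1(\hat y_1 - y_1)^2 \to -\infty$: the objective is unbounded below and no finite minimizer exists, which is the NegativeInfinity situation described above.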
    // Merge two adjacent blocks, updating blockBounds and weights to reflect the merge
    // Return the start index of the merged block
    def merge(block1: Int, block2: Int): Int = {
      assert(blockEnd(block1) + 1 == block2, "attempting to merge non-consecutive blocks")
Please make the error message more informative:
- Make it clear that this indicates an internal bug within IsotonicRegression
- State the invalid values
    // gives the values for the block. Entries that are not at the start of a block
    // are meaningless.
    val weights: Array[(Double, Double)] = input.map { case (y, _, weight) =>
      require(weight != 0.0)
Please add an informative error message.
Your changes sound fine. I don't have strong feelings one way or another, although I think we should at least throw a warning if we're discarding points. I do want to point out that disallowing negative weights also breaks the API, since they were previously supported (there's even a test for them).
Test build #71087 has finished for PR 15018 at commit
Test build #71092 has finished for PR 15018 at commit
    *
    * [3] Best, Michael J., Nilotpal Chakravarti, and Vasant A. Ubhaya. "Minimizing separable convex
    *     functions subject to simple chain constraints." SIAM Journal on Optimization 10.3 (2000):
    *     658-672.
    *
    * @param input Input data of tuples (label, feature, weight).
Can you update the param doc to indicate that negative weights are now prohibited?
LGTM
Test build #71271 has finished for PR 15018 at commit
Just a few comments, other than checking with @mengxr about negative weights.
    val cleanInput = input.flatMap{ case (y, x, weight) =>
      require(weight >= 0.0, "weights must be non-negative")
      if (weight == 0.0) {
        logWarning(s"Dropping zero-weight point ($y, $x, $weight)")
I disagree about logging for zero-weight points. If an instance has zero weight, then it should be ignored, and that's not a problem.
If we really want to log this, then let's do it once per dataset, not once per row, which could make logs explode. I'd also say we should lower the logging level to logInfo.
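A minimal sketch of that once-per-dataset variant (cleanInput and logInfo follow the names in the surrounding diff, and logInfo assumes the enclosing class mixes in Spark's Logging trait; the exact wiring in the PR may differ):

```scala
// Validate weights, drop zero-weight rows, and report the drop count once.
val cleanInput = input.filter { case (y, x, weight) =>
  require(weight >= 0.0,
    s"Negative weight at point ($y, $x, $weight). Weights must be non-negative")
  weight > 0.0
}
if (cleanInput.length != input.length) {
  logInfo(s"Dropped ${input.length - cleanInput.length} zero-weight points from the input")
}
```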
    if (input.isEmpty) {
      return Array.empty
    val cleanInput = input.flatMap{ case (y, x, weight) =>
      require(weight >= 0.0, "weights must be non-negative")
State invalid y,x,weight.
    * @return Result tuples (label, feature, weight) where labels were updated
    *         to form a monotone sequence as per isotonic regression definition.
    */
    private def poolAdjacentViolators(
        input: Array[(Double, Double, Double)]): Array[(Double, Double, Double)] = {

      if (input.isEmpty) {
        return Array.empty
      val cleanInput = input.flatMap{ case (y, x, weight) =>
`filter` would be simpler than `flatMap`
      assert(
        blockEnd(block1) + 1 == block2,
        s"Attempting to merge non-consecutive blocks [${block1}, ${blockEnd(block1)}]" +
        s" and [${block2}, ${blockEnd(block2)}. This is likely a bug in the isotonic regression" +
missing "]"
@zapletal-martin Pinging since you wrote the original PR: There's discussion here about whether IsotonicRegression should support negative weights. Is there a good reason to? I haven't seen papers which allow negative weights, though I've only looked at a few. Thanks!
My thought was that there's nothing in the computation that prevents it (unlike 0 weights, which cause NaNs). Why not err on the side of maximum flexibility to the end user? Just because there's no use case for it now doesn't mean someone won't find one in the future. Just my 2 cents. I don't feel strongly about it and will defer to committers.
Test build #71329 has started for PR 15018 at commit
Looks like the build completed successfully. Somehow Jenkins didn't post the message.
Thanks for the updates! This LGTM, except for deciding about negative weights. Responding to your comment above, negative weights are just as problematic as 0 weights. See my comment above "Btw, with negative weights, ..." I spoke with @mengxr offline, and he does not recall why the original work supported negative weights. I'll wait a few days before merging this to see if @zapletal-martin can check out the issue.
@jkbradley I'll merge if there are no other thoughts here. Seems OK to me too.
Sounds good. I'll run fresh tests before merging to be safe though.
Test build #3547 has finished for PR 15018 at commit
Merging with master
## What changes were proposed in this pull request?

New implementation of the Pool Adjacent Violators Algorithm (PAVA) in mllib.IsotonicRegression, which is used under the hood by ml.regression.IsotonicRegression. The previous implementation could have factorial complexity in the worst case. This implementation, which closely follows those in scikit-learn and the R `iso` package, runs in quadratic time in the worst case.

## How was this patch tested?

Existing unit tests in both `mllib` and `ml` passed before and after this patch. Scaling properties were tested by running the `poolAdjacentViolators` method in [scala-benchmarking-template](https://github.com/sirthias/scala-benchmarking-template) with the input generated by

```scala
val x = (1 to length).toArray.map(_.toDouble)
val y = x.reverse.zipWithIndex.map{ case (yi, i) => if (i % 2 == 1) yi - 1.5 else yi}
val w = Array.fill(length)(1d)
val input: Array[(Double, Double, Double)] = (y zip x zip w) map{ case ((y, x), w) => (y, x, w)}
```

Before this patch:

| Input Length | Time (us) |
| --: | --: |
| 100 | 1.35 |
| 200 | 3.14 |
| 400 | 116.10 |
| 800 | 2134225.90 |

After this patch:

| Input Length | Time (us) |
| --: | --: |
| 100 | 1.25 |
| 200 | 2.53 |
| 400 | 5.86 |
| 800 | 10.55 |

Benchmarking was also performed with randomly-generated y values, with similar results.

Author: z001qdp <Nicholas.Eggert@target.com>

Closes apache#15018 from neggert/SPARK-17455-isoreg-algo.
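For context, the public entry point the rewrite sits behind is unchanged; a minimal usage sketch of mllib's IsotonicRegression (assumes an existing SparkContext `sc`; the data values are illustrative):

```scala
import org.apache.spark.mllib.regression.IsotonicRegression

// (label, feature, weight) triples; the rewritten poolAdjacentViolators runs inside run().
val data = sc.parallelize(Seq(
  (1.0, 1.0, 1.0), (3.0, 2.0, 1.0), (2.0, 3.0, 1.0), (4.0, 4.0, 1.0)))

val model = new IsotonicRegression().setIsotonic(true).run(data)
model.predict(2.5) // interpolates between the fitted, monotone predictions
```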