Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mutable MomentsState, more efficient + for double added to Moments #1050

Merged
merged 11 commits into from
Jan 13, 2022

Conversation

sritchie
Copy link
Collaborator

This PR:

  • fixes an issue with build.sbt preventing my local jdk17 from compiling
  • adds + methods to Moments allowing addition with other Moments or doubles
  • adds MomentsState for efficient mutable accumulation of a Moments instance from iterables of either doubles or other moments instances

MomentsState pulls the former mutable implementation from sumOption and adds a second one for mutably adding in new doubles.

* Returns a [[Fold]] instance that uses `+` to accumulate deltas into this
* [[Moments]] instance.
*/
def fold: Fold[Double, Moments] =
Copy link
Collaborator Author

@sritchie sritchie Jan 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the style we used in ExpHist, where the fold is an instance method that lets you fold into THIS Moments, implemented by

_ => Moments.MomentsState.fromMoments(this)

Let me know if you want to just have an empty fold on Moments.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh... looking at it now, it seems a bit strange to me, but I can see the benefit of being able to start with a non empty value to resume some updates.

val delta_n = delta / countCombined
val delta_n2 = delta_n * delta_n
val delta_n3 = delta_n2 * delta_n
val count_sq = count * count
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also killed the math.pow calls everywhere.

@@ -111,6 +245,8 @@ object Moments {

val aggregator: MomentsAggregator.type = MomentsAggregator

val fold: Fold[Double, Moments] = momentsMonoid.zero.fold
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here we go, the zero fold.

val iter = items.toIterator

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this implementation lives in MomentsState now.

@@ -100,7 +132,7 @@ class MomentsTest extends AnyWordSpec with Matchers {
* Given a list of doubles, create a Moments object to hold the list's central moments.
*/
def getMoments(xs: List[Double]): Moments =
MomentsAggregator(xs)
Moments.aggregator(xs)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I switched this so we get coverage on the aggregator line :)

@@ -69,7 +69,7 @@ val sharedSettings = Seq(
Nil
}
},
javacOptions ++= Seq("-target", "1.6", "-source", "1.6"),
javacOptions ++= Seq("-target", "1.8", "-source", "1.8"),
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here is the build change.

@codecov-commenter
Copy link

codecov-commenter commented Jan 13, 2022

Codecov Report

Merging #1050 (8232a9a) into develop (d96093b) will increase coverage by 0.00%.
The diff coverage is 87.01%.

Impacted file tree graph

@@           Coverage Diff            @@
##           develop    #1050   +/-   ##
========================================
  Coverage    89.71%   89.71%           
========================================
  Files          124      124           
  Lines         9914     9957   +43     
  Branches       497      501    +4     
========================================
+ Hits          8894     8933   +39     
- Misses        1020     1024    +4     
Impacted Files Coverage Δ
...main/scala/com/twitter/algebird/MomentsGroup.scala 81.33% <87.01%> (+1.89%) ⬆️
.../main/scala/com/twitter/algebird/Applicative.scala 51.72% <0.00%> (-3.45%) ⬇️
...src/main/scala/com/twitter/algebird/Interval.scala 84.34% <0.00%> (-0.87%) ⬇️
.../main/scala/com/twitter/algebird/BloomFilter.scala 95.13% <0.00%> (-0.45%) ⬇️
.../scala/com/twitter/algebird/immutable/BitSet.scala 97.21% <0.00%> (+0.42%) ⬆️
.../main/scala/com/twitter/algebird/Successible.scala 91.66% <0.00%> (+8.33%) ⬆️
...main/scala-2.12-/com/twitter/algebird/compat.scala 83.33% <0.00%> (+16.66%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d96093b...8232a9a. Read the comment docs.

* Returns a [[Fold]] instance that uses `+` to accumulate deltas into this
* [[Moments]] instance.
*/
def fold: Fold[Double, Moments] =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh... looking at it now, it seems a bit strange to me, but I can see the benefit of being able to start with a non empty value to resume some updates.

* but we do it to avoid allocating a new Moments on every item in the
* loop. the Monoid laws test that sum matches looping on plus
*/
val countCombined = count + b.m0D
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we could use a variant of Kahan summation on countCombined, right

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, and if we had the fold version, or KahanState, it would fit nicely here.

@sritchie
Copy link
Collaborator Author

Okay, fixed up @johnynek !

@sritchie
Copy link
Collaborator Author

sritchie commented Jan 13, 2022

Benchmark results... I'm a little surprised at the winner here.

[info] Benchmark                         (numElements)   Mode  Cnt    Score   Error  Units
[info] MomentsBenchmark.timeAggregate            10000  thrpt    3   7968.580 ±  625.872  ops/s
[info] MomentsBenchmark.timeFold                 10000  thrpt    3  12852.096 ±  760.259  ops/s
[info] MomentsBenchmark.timePlusDoubles          10000  thrpt    3  15150.855 ± 743.752  ops/s
[info] MomentsBenchmark.timePlusMoments          10000  thrpt    3  13478.708 ± 675.039  ops/s
[info] MomentsBenchmark.timeSumOption            10000  thrpt    3  13576.123 ± 1111.999  ops/s

Wow, and here are the benchmarks that could run BEFORE this PR and #1049. Optimizing plus really matters!!

[info] Benchmark                         (numElements)   Mode  Cnt    Score   Error  Units
[info] MomentsBenchmark.timeAggregate            10000  thrpt    3  227.477 ± 2.977  ops/s
[info] MomentsBenchmark.timePlusMoments          10000  thrpt    3  223.974 ± 7.554  ops/s
[info] MomentsBenchmark.timeSumOption            10000  thrpt    3  226.067 ± 3.714  ops/s

@sritchie sritchie merged commit a84caa0 into develop Jan 13, 2022
@sritchie sritchie deleted the sritchie/moments_update branch January 13, 2022 23:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants