Skip to content

Breaking changes: increase performance#5

Open
JayKickliter wants to merge 4 commits into
avinashshenoy97:masterfrom
JayKickliter:jsk/breaking/reduce-heap-allocations
Open

Breaking changes: increase performance#5
JayKickliter wants to merge 4 commits into
avinashshenoy97:masterfrom
JayKickliter:jsk/breaking/reduce-heap-allocations

Conversation

@JayKickliter

@JayKickliter JayKickliter commented Oct 14, 2020

Copy link
Copy Markdown
Contributor

This PR significantly increases training and winner-lookup speed. The speedup is primarily achieved by reducing heap allocations. However, the cost is that it introduces API-breaking changes.

The methodology was first to add benchmarking to the current code base and measuring performance deltas after every little tweak to the code. In some cases, removing intermediate heap allocations led to performance regressions. In those cases, I left comments explaining why they're necessary.

Because this PR introduces breaking changes, I added breaking change: the non-default feature serde-1. This helps with build times for people not interested in serialization. Building with serde-1 enables this crate's old [to, from]_json support. I believe that those functions are out of this crate's scope, but as long as they are disabled by default, I see no harm.

Benchmarks

I first ran cargo bench without any library modifications, and the output below is after rerunning it on the tip of this branch.

Training/Random/10      time:   [68.191 us 68.893 us 69.746 us]
                        thr8 Kelem/s 145.15 Kelem/s 146.65 Kelem/s]
                 change:
                        time:   [-4.4626% -3.2623% -2.0623%] (p = 0.00 < 0.05)
                        thrpt:  [+2.1057% +3.3724% +4.6710%]
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) high mild
  5 (5.00%) high severe
Training/Batch/10       time:   [59.171 us 59.415 us 59.682 us]
                        thrpt:  [167.55 Kelem/s 168.31 Kelem/s 169.00 Kelem/s]
                 change:
                        time:   [-19.837% -18.782% -17.789%] (p = 0.00 < 0.05)
                        thrpt:  [+21.639% +23.126% +24.745%]
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe
Training/Random/100     time:   [666.92 us 671.67 us 678.39 us]
                        thrpt:  [147.41 Kelem/s 148.88 Kelem/s 149.94 Kelem/s]
                 change:
                        time:   [-6.2380% -4.6824% -2.8067%] (p = 0.00 < 0.05)
                        thrpt:  [+2.8878% +4.9124% +6.6531%]
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  5 (5.00%) high severe
Training/Batch/100      time:   [605.23 us 607.84 us 611.17 us]
                        thrpt:  [163.62 Kelem/s 164.52 Kelem/s 165.23 Kelem/s]
                 change:
                        time:   [-17.402% -16.414% -15.498%] (p = 0.00 < 0.05)
                        thrpt:  [+18.340% +19.637% +21.068%]
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low mild
  5 (5.00%) high mild
  2 (2.00%) high severe
Training/Random/1000    time:   [6.7615 ms 6.7930 ms 6.8285 ms]
                        thrpt:  [146.45 Kelem/s 147.21 Kelem/s 147.90 Kelem/s]
                 change:
                        time:   [-3.2269% -2.6307% -2.0150%] (p = 0.00 < 0.05)
                        thrpt:  [+2.0564% +2.7017% +3.3345%]
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe
Training/Batch/1000     time:   [6.0276 ms 6.0493 ms 6.0753 ms]
                        thrpt:  [164.60 Kelem/s 165.31 Kelem/s 165.90 Kelem/s]
                 change:
                        time:   [-14.930% -14.487% -14.008%] (p = 0.00 < 0.05)
                        thrpt:  [+16.290% +16.942% +17.550%]
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe

Winner/Plain/4          time:   [1.0489 us 1.0518 us 1.0548 us]
                        change: [-45.352% -44.837% -44.342%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe
Winner/Distance/4       time:   [1.0665 us 1.0722 us 1.0785 us]
                        change: [-48.275% -47.959% -47.654%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) high mild
  4 (4.00%) high severe

Notes

I'm fairly certain that the lackluster speedup of random training is explained in this comment.

@JayKickliter

Copy link
Copy Markdown
Contributor Author

Note: please don't merge this until someone with domain expertise checks that train_random is returning sensible results. I was able to add a test with reference output for train_batch to make sure I didn't break anything, but I don't know how to test random training.

@JayKickliter JayKickliter force-pushed the jsk/breaking/reduce-heap-allocations branch from b7f59a1 to 232f479 Compare October 14, 2020 21:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant