Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Favor SmallVec over Box in pluralrules #285

Closed
wants to merge 1 commit into from

Conversation

gregtatum
Copy link
Member

@gregtatum gregtatum commented Oct 1, 2020

This will help avoid heap allocations as most lists are only a few items
long. The only boxed slice that was retained was the SampleRanges, as
this can be of an indeterminate length.

Resolves #193
Depends on #140
Depends on #107

@@ -17,6 +17,7 @@ include = [
[dependencies]
icu-locale = { path = "../locale" }
icu-data-provider = { path = "../data-provider" }
smallvec = "1.4"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the same version as num-util is using.

@coveralls
Copy link

coveralls commented Oct 1, 2020

Pull Request Test Coverage Report for Build 073a029aff20c87865f0223c207d4a4e19125dac-PR-285

  • 7 of 10 (70.0%) changed or added relevant lines in 2 files are covered.
  • 1 unchanged line in 1 file lost coverage.
  • Overall coverage increased (+11.2%) to 77.341%

Changes Missing Coverage Covered Lines Changed/Added Lines %
components/pluralrules/src/rules/ast.rs 0 3 0.0%
Files with Coverage Reduction New Missed Lines %
components/locale/src/serde/langid.rs 1 88.24%
Totals Coverage Status
Change from base Build c08d490c92c0187ee5504b8505d5fac7387d8675: 11.2%
Covered Lines: 2990
Relevant Lines: 3866

💛 - Coveralls

@gregtatum
Copy link
Member Author

It seems that the linter is not liking what this does to the RulesSelector, as it's now 2480 bytes for the Conditions(PluralRuleList) entry. One mitigation would be to heap allocate the PluralRuleList in a Box<PluralRuleList>. It's a heap allocation, but it might have better memory locality, although... that's just a guess.

@sffc
Copy link
Member

sffc commented Oct 1, 2020

Thanks for the contribution!

Can you provide figures on how "cargo bench" differs before and after this PR?

Boxing the PluralRuleList seems like a good idea. It's one heap allocation, but that's better than the dozens that were there before.

@zbraniecki
Copy link
Member

It does seem to have a nice impact on the microbenchmark:

plurals/parser/parse    time:   [1.1205 us 1.1219 us 1.1233 us]                                  
                        change: [-27.377% -26.828% -26.328%] (p = 0.00 < 0.05)
                        Performance has improved.

@zbraniecki
Copy link
Member

In terms of long term maintenance, there's an effort now to introduce stack based vector - rust-lang/rfcs#2990 which, if successful, is a stepping stone to a hybrid vector like smallvec.

So I'm cautiously optimistic that we can use smallvec in performance sensitive cases where we commonly have small vectors but want to be ready for them to be larger (like this) and eventually we'll be able to remove the dependency on smallvec.

@zbraniecki
Copy link
Member

I rebased @gregtatum patch on top of master and checked it against the total benchmarks:

operands/total          time:   [965.73 ns 971.94 ns 981.35 ns]                            
                        change: [-5.7538% -3.2698% -1.1855%] (p = 0.00 < 0.05)
                        Performance has improved.

parser/total            time:   [1.1470 us 1.1600 us 1.1776 us]                          
                        change: [-37.270% -36.208% -35.143%] (p = 0.00 < 0.05)
                        Performance has improved.

pluralrules/total       time:   [85.550 us 85.883 us 86.292 us]                              
                        change: [+4.7499% +6.1966% +7.6202%] (p = 0.00 < 0.05)
                        Performance has regressed.

I was able to reproduce that change (improvement on parsing by over 30% and regression on full selection by ~5-10%) in 5 test runs.
I do trust the results to be reliable.

I dove deeper:

zbraniecki@zibi-xps13:~/projects/icu4x/components/pluralrules$ cargo criterion pluralrules --features bench-pluralrules
   Compiling icu-num-util v0.0.1 (/home/zbraniecki/projects/icu4x/components/num-util)
   Compiling icu-pluralrules v0.0.1 (/home/zbraniecki/projects/icu4x/components/pluralrules)
    Finished bench [optimized] target(s) in 7.11s

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

Gnuplot not found, using plotters backend
pluralrules/construct/fs                                                                            
                        time:   [143.34 us 143.71 us 144.19 us]
                        change: [+5.2929% +6.2346% +7.1295%] (p = 0.00 < 0.05)
                        Performance has regressed.

pluralrules/select/fs   time:   [1.7583 us 1.7621 us 1.7660 us]                                   
                        change: [-0.1753% +0.9524% +2.0192%] (p = 0.09 > 0.05)
                        No change in performance detected.

Gnuplot not found, using plotters backend
pluralrules/total       time:   [85.585 us 86.826 us 91.162 us]                              
                        change: [+11.223% +10.677% +14.487%] (p = 0.00 < 0.05)
                        Performance has regressed.

I'm not sure what's going on here really. My first hypothesis was that the regression is in selection because accessing AST nodes is slower, but the deep benchmarks show clearly that the regression is on parsing, not selection.

@zbraniecki
Copy link
Member

can someone reproduce?

@zbraniecki
Copy link
Member

One hypothesis that could explain that is that my parsing/total benchmark only tests pl rules and is not representative of what pluralrules/construct and pluralrules/total are testing since they all construct for 10 langs: https://github.com/unicode-org/icu4x/blob/master/components/pluralrules/benches/fixtures/plurals.json#L10

We may want to get rid of the pl rules in benchmark fixtures, and just parse rules for those 10 locales as a more representative case.

@zbraniecki
Copy link
Member

If that hypothesis holds, then this PR helps Polish rules (by 30%) but slows down rules for 10 locales (by 5-10%). Both are syntentic microbenchmarks, but that would make me be against landing this change.

@zbraniecki
Copy link
Member

Ok, I'm fairly certain that this is the case. The path forward is for #227 to be resolved and then all plural rules tests will use the generated data for the 10+/- locales we select for all benchmarks.

For this PR, I suggest @gregtatum that you test against the plural rules subset or, if you want to help clean up benchmarks, take on the PR to move parser/total to parse conditions for 10 locales that are used everywhere else. This will give us more consistent results.

@zbraniecki
Copy link
Member

My PR in #295 is coalescing benchmarks to be more reliable by testing against 10 locales, instead of just one. Shane in #296 introduces "testdata" which will give us a single source of those locales and data to be used in all components tests/examples/benchmarks.

This should help us reason about this PR, so I'd suggest now to wait for those two to land.

@gregtatum
Copy link
Member Author

It seems to reproduce on my machine (macOS).

plurals/parser/parse    time:   [1.5964 us 1.6061 us 1.6164 us]
                        change: [-59.684% -59.233% -58.768%] (p = 0.00 < 0.05)
                        Performance has improved.

plurals/convert+select/fs
                        time:   [1.4706 us 1.4817 us 1.4945 us]
                        change: [+7.0000% +8.2399% +9.8204%] (p = 0.00 < 0.05)
                        Performance has regressed.

plurals/select/fs       time:   [1.4470 us 1.4546 us 1.4636 us]
                        change: [+6.6509% +8.0535% +9.3887%] (p = 0.00 < 0.05)
                        Performance has regressed.

plurals/construct/fs/10 time:   [379.03 us 381.91 us 385.46 us]
                        change: [+1.8176% +2.8942% +4.1765%] (p = 0.00 < 0.05)
                        Performance has regressed.

@gregtatum
Copy link
Member Author

Is PR #295 the one you were suggesting I look at, @zbraniecki ? I'd be happy to follow-up with some more work here, but my cadence will be a bit slower, as I'm doing this on the side. Or were you suggesting a separate issue #?

@zbraniecki
Copy link
Member

Is PR #295 the one you were suggesting I look at, @zbraniecki ? I'd be happy to follow-up with some more work here, but my cadence will be a bit slower, as I'm doing this on the side. Or were you suggesting a separate issue #?

No worries. Yeah, #295 is what I had in mind. Let's wait for it to land and then get back to this PR and decide on it!
In the meantime, if you'll have spare cycles and want to grab any of those https://github.com/unicode-org/icu4x/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22 , let us know!

@zbraniecki
Copy link
Member

Can you rebase this PR on top of master?

@jira-pull-request-webhook
Copy link

Notice: the branch changed across the force-push!

  • components/pluralrules/Cargo.toml is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

@gregtatum
Copy link
Member Author

Can you rebase this PR on top of master?

Sure thing.

@sffc sffc marked this pull request as draft October 9, 2020 17:26
@sffc
Copy link
Member

sffc commented Nov 19, 2020

Depends on #140
Depends on #107

@zbraniecki
Copy link
Member

This may also be a solution once it gets merged? rust-lang/rfcs#2990

Base automatically changed from master to main March 4, 2021 18:40
@sffc sffc added blocked A dependency must be resolved before this is actionable and removed waiting-on-dependency labels Apr 3, 2021
@sffc sffc added waiting-on-author PRs waiting for action from the author for >7 days and removed blocked A dependency must be resolved before this is actionable labels Jul 29, 2021
This will help avoid heap allocations as most lists are only a few items
long. The only boxed slice that was retained was the SampleRanges, as
this can be of an indeterminate length.
@jira-pull-request-webhook
Copy link

Notice: the branch changed across the force-push!

  • Cargo.lock is now changed in the branch
  • components/pluralrules/Cargo.toml is no longer changed in the branch
  • components/pluralrules/src/rules/ast.rs is no longer changed in the branch
  • components/pluralrules/src/rules/mod.rs is no longer changed in the branch
  • components/pluralrules/src/rules/parser.rs is no longer changed in the branch
  • components/plurals/Cargo.toml is now changed in the branch
  • components/plurals/src/rules/ast.rs is now changed in the branch
  • components/plurals/src/rules/mod.rs is now changed in the branch
  • components/plurals/src/rules/parser.rs is now changed in the branch

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

@sffc
Copy link
Member

sffc commented Nov 22, 2021

Now that #1240 has landed, should we close this PR and the corresponding ticket as obsolete?

Copy link
Member

@sffc sffc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(setting review bit)

@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@sffc
Copy link
Member

sffc commented Nov 10, 2022

This is obsolete because now we use VarZeroVec for runtime.

@sffc sffc closed this Nov 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
waiting-on-author PRs waiting for action from the author for >7 days
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use of SmallVec in PluralRules Parser/AST
5 participants