Skip to content

Conversation

@shssoichiro
Copy link
Collaborator

This improves performance at a small cost to efficiency

This improves performance at a small cost to efficiency
@shssoichiro
Copy link
Collaborator Author

@shssoichiro
Copy link
Collaborator Author

@maj160 what improvements did you see in your benchmarking? Mine is only showing 1-2% speedup for this change.

@codecov-commenter
Copy link

Codecov Report

Base: 86.80% // Head: 86.83% // Increases project coverage by +0.03% 🎉

Coverage data is based on head (ad293b1) compared to base (8978707).
Patch coverage: 66.66% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3106      +/-   ##
==========================================
+ Coverage   86.80%   86.83%   +0.03%     
==========================================
  Files          83       83              
  Lines       33330    33331       +1     
==========================================
+ Hits        28931    28943      +12     
+ Misses       4399     4388      -11     
Impacted Files Coverage Δ
src/ec.rs 83.12% <66.66%> (-0.13%) ⬇️
src/context/partition_unit.rs 90.27% <0.00%> (-0.28%) ⬇️
src/partition.rs 75.84% <0.00%> (-0.17%) ⬇️
src/encoder.rs 86.91% <0.00%> (-0.04%) ⬇️
src/asm/x86/predict.rs 43.29% <0.00%> (ø)
src/cpu_features/x86.rs 39.58% <0.00%> (ø)
src/context/block_unit.rs 93.60% <0.00%> (+0.06%) ⬆️
src/rdo.rs 85.62% <0.00%> (+0.46%) ⬆️
src/asm/x86/lrf.rs 92.22% <0.00%> (+0.63%) ⬆️
src/context/mod.rs 92.30% <0.00%> (+1.09%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

src/ec.rs Outdated
* print([x for x in (stats())])
*/
// Units of 1/128 of a bit
const ENTROPY_LOOKUP: [u16; 513] = [
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks a bit like a subtle performance bug that has popped up before.

Suggested change
const ENTROPY_LOOKUP: [u16; 513] = [
const ENTROPY_LOOKUP: &[u16; 513] = &[

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow I had no idea that this could lead to performance issues. I'm really curious if there's any Github issue on Rustc talking about this.

Co-authored-by: David Michael Barr <b@rr-dav.id.au>
@maj160
Copy link
Contributor

maj160 commented Jan 19, 2023

Original commit author here:
I'd probably treat this as WIP or hide it behind a feature flag - It currently seems to increase filesize more than I'm comfortable with.

The lookup table values can probably be improved - I think the correct value is log2 of the probability at higher levels, but at lower levels the EC_MIN_PROB factor comes into play and complicates things. The values I've put in this change are the best compromise I came up with while experimenting, but I wouldn't say they're quite "ready" yet.

In the longer term it feels like this kind of optimisation is probably necessary to streamline RDO (e.g. using a fast approximate bit counter in some kind of pruning stage)

Thoughts?

Copy link
Collaborator

@vibhoothi vibhoothi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks,

I think we could move this under preset >8 or something like that over having this approximation for all the preset. I believe for Speed0 or placebo settings, we should avoid all possible approximations. But I am open to suggestions.

And the AWCY link shared above suggests there is a hit in performance across all the resolution in Obj1-Fast and there is no speedups, is that latest and updated one?

@shssoichiro
Copy link
Collaborator Author

And the AWCY link shared above suggests there is a hit in performance across all the resolution in Obj1-Fast and there is no speedups, is that latest and updated one?

AWCY is generally not trustworthy for its performance calculations, there's too much noise on the servers. My local benchmarks with hyperfine showed a 1-2% overall improvement at speed 2. I can do another run with it at a higher speed like speed 8, it's possible the improvement is larger there.

@vibhoothi
Copy link
Collaborator

vibhoothi commented Jan 22, 2023 via email

@shssoichiro
Copy link
Collaborator Author

shssoichiro commented Jan 22, 2023

Just to be clear, this will give us approx 0.5-1.8% bdrate loss with approx 1-2% speedup?

Yes. As you mentioned, because of that I'd be hesitant about blanket applying it to all speed levels.

@shssoichiro shssoichiro marked this pull request as draft January 22, 2023 21:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants