Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance 10x worse than C implementation #24

Open
danielrh opened this issue Mar 11, 2016 · 4 comments
Open

Performance 10x worse than C implementation #24

danielrh opened this issue Mar 11, 2016 · 4 comments

Comments

@danielrh
Copy link
Contributor

I've been timing the C implementation versus the rust implementation and I generally notice about a factor of 8-10x difference.

Do you know offhand any obvious performance tradeoffs that were made in the design of this version?

Do you have any ideas about various strategies we could employ to bring it within a factor of two, or ideally to the same speed as the C version especially for multi-megabyte files?

I noticed no inline assembly in the C version, so I'm hoping it is possible to bring the rust version to parity.

Have you done any profiling of the existing code or compared it with the C code?

@danielrh
Copy link
Contributor Author

perf record shows

  48.54%  main  main              [.] _ZN21Decompressor$LT$R$GT$10decompress19h833558899976614058E 
  32.45%  main  main              [.] _ZN7huffman4tree4Tree13lookup_symbol20h3239110865486455736E  
   8.84%  main  main              [.] _ZN9bitreader18BitReader$LT$R$GT$10read_exact20h9307349387484
   2.72%  main  libc-2.15.so      [.] __memcpy_ssse3_back                                          
   1.69%  main  main              [.] _ZN4main20h236efd01b33cc363UdaE                              
   1.13%  main  main              [.] _ZN13brotli..State9drop.582217h2d226c43866a0306E             
   1.05%  main  [unknown]         [k] 0xffffffff81391b17                                           
   0.53%  main  main              [.] _ZN3vec12Vec$LT$T$GT$19extend_with_element20h4407735263573563

As compared with the C version

  90.61%  secbrot  secbrot           [.] ProcessCommands
   4.63%  secbrot  [unknown]         [k] 0xffffffff81391b17
   1.24%  secbrot  secbrot           [.] __intel_avx_rep_memcpy
   1.24%  secbrot  [unknown]         [k] 0xffffffff81198b61
   1.23%  secbrot  [unknown]         [k] 0xffffffff81391ef5
   1.02%  secbrot  secbrot           [.] _intel_fast_memcmp

@ende76
Copy link
Owner

ende76 commented Mar 12, 2016

Yes, performance has not been a focus point at all so far.
I've started this project as a a way to get familiar with Rust, and also with the Brotli spec.
For the implementation, I have made it a point to work only from the spec, while avoiding looking at the reference implementation, because another goal was helping to improve the Brotli spec itself.
I do believe that there are a number of places where copies could be avoided, and other places where I used clone() simply because at the time I was unable otherwise to appease the borrow checker.

For a performance-oriented implementation, I would probably take the test cases and experiences so far, and start over with a new implementation.

@danielrh
Copy link
Contributor Author

So I decided to work on a performance-focused version of the decompressor here:
https://github.com/dropbox/rust-brotli

It passes the first few unit tests..but still has a few correctness issues I'm working through.
I decided to do a very apples-to-apples port, avoiding even the rust stdlib, so that we can get a really solid performance comparison between C and Rust down to the same custom memory allocator.

I'd love comments if you have any--though of course it's still early since only the very basic tests pass

@ende76
Copy link
Owner

ende76 commented Apr 17, 2016

Wow, that is a very impressive effort! Was rust-alloc-no-stdlib a direct consequence of the as-close-as-possible port?
No helpful comments yet, other than that it looks like very organized code. It will certainly be a superior implementation once you've ironed out the last few correctness issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants