Integrate cpp_double_fp_backend #648

Conversation
Commits:
- Fixes #92 with good final report
- minor change in header order
- fix silly mistakes
- Gsoc2021 double float chris
- …into gsoc2021_double_float_chris (conflicts: .github/workflows/multiprecision_quad_double_only.yml, .gitignore, performance/performance_test.cpp, test/test_arithmetic.hpp)
- Gsoc2021 double float chris
- Gsoc2021 double float chris
- Gsoc2021 double float chris
- …into cpp_double_fp_backend
I posted the latest YADE benchmark results in BoostGSoC21#190; suddenly it starts to look good with clang.
Note to self: TODO hit the edge cases of the new backend.
Performance of algebraic functions re-affirmed in BoostGSoC21#190
OK, so the bad-performance mystery was solved, and I ran benchmarks with the YADE software. Here are the results:
| type | calculation speed | factor |
| --- | --- | --- |
| cpp_double_double g++ 12.2 | 449.15 iter/sec | 1 |
| float128 g++ 12.2 | 263.15 iter/sec | 1.70 |
| cpp_bin_float<32> g++ 12.2 | 211.81 iter/sec | 2.12 |
| cpp_dec_float<31> g++ 12.2 | 78.15 iter/sec | 5.74 |
| mpfr_float_backend<31> g++ 12.2 | 51.01 iter/sec | 8.80 |
Here we can see that cpp_double_double beats everything else: the nearest competitor, float128, is 1.70× slower, and the remaining backends are slower by more than a factor of two.
cpp_double_long_double
| type | calculation speed | factor |
| --- | --- | --- |
| cpp_bin_float<39> g++ 12.2 | 122.55 iter/sec | 1 |
| cpp_double_long_double clang++ 19.1.4 | 108.79 iter/sec | 1.12 |
| cpp_bin_float<39> clang++ 19.1.4 | 102.19 iter/sec | 1.20 |
| cpp_dec_float<39> g++ 12.2 | 71.42 iter/sec | 1.71 |
| mpfr_float_backend<39> g++ 12.2 | 45.75 iter/sec | 2.67 |
| cpp_double_long_double g++ 12.2 | 14.97 iter/sec | 8.18 |
Here we can see that cpp_double_long_double performs very well. But the compiler developers will have a mystery to solve: cpp_bin_float<39> with g++ 12.2 is only a little faster than cpp_double_long_double with clang++ 19.1.4, which in turn is roughly seven times faster than cpp_double_long_double with g++ 12.2.
cpp_double_float128
| type | calculation speed | factor |
| --- | --- | --- |
| cpp_bin_float<67> g++ 12.2 | 118.43 iter/sec | 1 |
| mpfr_float_backend<67> g++ 12.2 | 43.34 iter/sec | 2.73 |
| cpp_dec_float<67> g++ 12.2 | 40.09 iter/sec | 2.95 |
| cpp_double_float128 g++ 12.2 | 14.99 iter/sec | 7.90 |
Here we can see that cpp_double_float128 has a lot of potential to beat cpp_bin_float<67>, once the g++ developers sort out the same problems that afflict cpp_double_long_double with g++ 12.2. The increase in performance should be roughly a factor of 8 :)
So all is good. I think we can merge this branch once documentation and other small TODOs are complete.
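For reference, a minimal sketch of how such a backend comparison can be reproduced is shown below: a tiny timing loop templated on the number type. The include path, the kernel, and the one-second time budget are assumptions for illustration only; this is not the YADE benchmark code.

```cpp
#include <boost/multiprecision/cpp_double_fp.hpp> // assumed header name for the new backend
#include <chrono>
#include <cmath>
#include <cstddef>
#include <iostream>

// Run a simple floating-point kernel for about one second and report the
// iteration count; higher means faster for the given number type.
template <typename Real>
double iterations_per_second()
{
   using std::sqrt; // built-in types use std::sqrt, multiprecision types are found via ADL
   using clock = std::chrono::steady_clock;

   const auto  start = clock::now();
   std::size_t iters = 0;
   Real        acc   = 0;

   while (std::chrono::duration<double>(clock::now() - start).count() < 1.0)
   {
      acc += sqrt(Real(2) + acc) / (Real(1) + acc * acc);
      ++iters;
   }

   return static_cast<double>(iters);
}

int main()
{
   using boost::multiprecision::cpp_double_double;

   std::cout << "cpp_double_double: " << iterations_per_second<cpp_double_double>() << " iter/sec\n";
   std::cout << "double           : " << iterations_per_second<double>()            << " iter/sec\n";
}
```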
Thank you, Janek (@cosurgi), that was a big effort, and it really provided a lot of information and clarity. Some of the results on the newer i7 processors surprised me: these chips have extremely powerful 64-bit floating-point hardware operations, and it seems like these are being very well supported nowadays in hardware and software. Down the road I will be doing some non-x86_64 measurements on M1 and/or M2 and a few embedded bare-metal controllers like an ARM(R) Cortex(R) M7 with double-precision floating-point FPU support.

All in all, I'm somewhat surprised at how fast this backend is. I'm happy enough with it to make a first release out of this state.

Cc: @sinandredemption and @jzmaddock
There might be one more thing to check: that each of the backend/compiler configurations is doing (roughly) the same amount of work. Something that can happen when a tolerance is set for termination is that you can hit "unfortunate" parameters which cause the code to thrash through many needless iterations that don't actually get you any closer to the end result. I have no idea if this is the case here, but because these types don't behave quite like exactly rounded IEEE types, things like tolerance-based stopping criteria can misbehave.
Indeed. There are several potential dangers. Let's say we use a tolerance-based break condition in an iterative calculation; at the same time, we know that epsilon for this backend does not behave exactly like that of an exactly rounded IEEE type. Even worse, this backend is new, so there might be undiscovered problems in the areas of subnormal/zero handling. So you might iterate until the maximum iteration setting is reached. We actually had several cases like this when John helped me see through the last tricky spots in the specfun tests. Who knows if we really got all the edge cases?

Cc: @jzmaddock and @cosurgi
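To make the hazard concrete, here is a generic sketch (not code from the library or its tests) of a tolerance-terminated Newton iteration with a hard iteration cap; the cap is what prevents the "thrashing" described above if an epsilon-based tolerance turns out to be unreachable for a double-double type.

```cpp
#include <cmath>
#include <limits>

// Newton's method for sqrt(a), terminated by a relative tolerance derived
// from numeric_limits<Real>::epsilon().  The tolerance multiplier and the
// iteration cap are illustrative choices, not values from any real test.
template <typename Real>
Real newton_sqrt(const Real& a)
{
   using std::fabs; // multiprecision overloads are found via ADL

   const Real tol = std::numeric_limits<Real>::epsilon() * 4;

   Real x = a / 2;

   // Without this cap, a tolerance the type can never actually reach
   // (e.g. because epsilon() does not reflect the true worst-case rounding
   // of a double-double) would make the loop spin indefinitely.
   constexpr unsigned max_iter = 100;

   for (unsigned i = 0; i < max_iter; ++i)
   {
      const Real next = (x + a / x) / 2;

      if (fabs(next - x) <= tol * fabs(next))
         return next; // converged within tolerance

      x = next;
   }

   return x; // give up: hit the iteration ceiling
}
```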
…into cpp_double_fp_backend (conflicts: .github/workflows/multiprecision.yml, README.md)
So I have updated this PR to the post-1.88 develop branch of Multiprecision, and it's going green once again (with the updated CI). There were a few trip-ups along the way, but nothing out of the ordinary. It's time to continue working on the known final points in BoostGSoC21/multiprecision/issues/160. I'm not sure if all of this will be ready for 1.89, but there is a chance. Cc: @cosurgi and @jzmaddock and @sinandredemption
…into cpp_double_fp_backend
Just some mild observations while we're waiting for this to land.
Hi @LegalizeAdulthood, thank you for your review points. I will get to these. Your points seem sensible. It's been a while in the making, but I already use this backend locally, and it seriously accelerates perturbative Mandelbrot calculations, by roughly a factor of 3. No other backend that I am aware of kicks it like this one does at double-double precision, about 32 digits. I do not know if/when I'll get this Boost-ready, but I'm still on it. Cc: @jzmaddock
Hi @LegalizeAdulthood, I'll leave all the conversations open for now. I need to get back to these a bit later. Thanks again for contributing.
Nice! My friend integrated the QD library into his ManPWin, and I think he reported a significant speedup as well. I think that currently, at least among open-source fractal renderers, Kalles Fraktaler 3 is one of the fastest out there, if not the fastest. He has SIMD and GPU (OpenCL) paths, though I haven't studied the code extensively enough to know the full details. If I can be of assistance in helping this pull request be accepted, let me know. It will ultimately help me too.
Hi Richard (@LegalizeAdulthood), if you get a chance, you could consider using the new backend. Other than that, I think there are still some edge cases in rounding and round-tripping; those are the only open points left. I deactivated many of the tests in rounding and round-tripping, and I think these should pass. I'll need to talk with John about these sometime down the road.
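For anyone wanting to experiment with the perturbation use-case mentioned above, a rough sketch follows. The include path and the reference_orbit function are illustrative assumptions, not code from this PR: the reference orbit is iterated at double-double precision, while per-pixel deltas would be iterated in plain double against it.

```cpp
#include <boost/multiprecision/cpp_double_fp.hpp> // assumed header name for the new backend
#include <complex>
#include <cstddef>
#include <vector>

using boost::multiprecision::cpp_double_double;

// Iterate z -> z^2 + c at double-double precision and record the orbit
// rounded to double, as used by perturbation-based Mandelbrot renderers.
std::vector<std::complex<double>> reference_orbit(const cpp_double_double& cre,
                                                  const cpp_double_double& cim,
                                                  std::size_t max_iter)
{
   std::vector<std::complex<double>> orbit;
   orbit.reserve(max_iter);

   cpp_double_double zr = 0, zi = 0;

   for (std::size_t i = 0; i < max_iter; ++i)
   {
      orbit.emplace_back(static_cast<double>(zr), static_cast<double>(zi));

      const cpp_double_double zr_next = zr * zr - zi * zi + cre;
      zi = 2 * zr * zi + cim;
      zr = zr_next;

      if (zr * zr + zi * zi > 4)
         break; // orbit escaped
   }

   return orbit;
}
```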
Handled in 748b751
Redundant with #515