Integrate cpp_double_fp_backend #648

Closed
ckormanyos wants to merge 660 commits

Conversation

ckormanyos
Member

No description provided.

sinandredemption and others added 30 commits August 23, 2021 00:02
@cosurgi
Contributor

cosurgi commented Jan 17, 2025

I posted the latest YADE benchmark results in BoostGSoC21#190; suddenly it starts to look good with clang.
(initially I posted this here, but then I moved this post over there)

@ckormanyos
Member Author

Note to self (TODO): hit the edge cases of the new eval_pow method.
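
For reference, a minimal sketch (my own, not code from this PR) of the kind of edge cases meant here, exercised through the front-end pow() on cpp_double_double rather than by calling eval_pow() directly. The include path is an assumption for the integration branch.

```cpp
// Hypothetical sketch only: edge-case checks for pow() on cpp_double_double.
// Assumes the cpp_double_fp_backend_integration branch is on the include path;
// the exact header name is an assumption.
#include <boost/multiprecision/cpp_double_fp.hpp>

#include <iostream>
#include <limits>

int main()
{
   using boost::multiprecision::cpp_double_double;

   const cpp_double_double zero { 0 };
   const cpp_double_double inf  { std::numeric_limits<cpp_double_double>::infinity() };

   // pow(x, 0) should be exactly 1, even for x == 0 and x == inf.
   std::cout << pow(zero, zero) << '\n';
   std::cout << pow(inf, zero)  << '\n';

   // pow(0, y) with y < 0 should go to +inf.
   std::cout << pow(zero, cpp_double_double { -1 }) << '\n';

   // A negative base with a non-integral exponent should yield a NaN.
   std::cout << pow(cpp_double_double { -2 }, cpp_double_double { 0.5 }) << '\n';
}
```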

@ckormanyos
Member Author

Performance of algebraic functions re-affirmed in BoostGSoC21#190

@cosurgi
Contributor

cosurgi commented Jan 18, 2025

OK, so the bad-performance mystery has been solved. I ran the YADE benchmark `yade -n --quickperformance -j 4` on a fairly recent CPU, an Intel i7-14700KF, and the results are good. Some are interesting. We can definitely mark the performance problem of the cpp_double_fp_backend as solved. Now only the compiler developers will have something to talk about :)

Here are the results:

cpp_double_double

| type | compiler | calculation speed | factor |
| --- | --- | --- | --- |
| cpp_double_double | g++ 12.2 | 449.15 iter/sec | 1 |
| float128 | g++ 12.2 | 263.15 iter/sec | 1.70 |
| cpp_bin_float<32> | g++ 12.2 | 211.81 iter/sec | 2.12 |
| cpp_dec_float<31> | g++ 12.2 | 78.15 iter/sec | 5.74 |
| mpfr_float_backend<31> | g++ 12.2 | 51.01 iter/sec | 8.80 |

Here we can see that cpp_double_double beats float128 by a factor of 1.7 and everything else by more than a factor of two.

cpp_double_long_double

| type | compiler | calculation speed | factor |
| --- | --- | --- | --- |
| cpp_bin_float<39> | g++ 12.2 | 122.55 iter/sec | 1 |
| cpp_double_long_double | clang++ 19.1.4 | 108.79 iter/sec | 1.12 |
| cpp_bin_float<39> | clang++ 19.1.4 | 102.19 iter/sec | 1.20 |
| cpp_dec_float<39> | g++ 12.2 | 71.42 iter/sec | 1.71 |
| mpfr_float_backend<39> | g++ 12.2 | 45.75 iter/sec | 2.67 |
| cpp_double_long_double | g++ 12.2 | 14.97 iter/sec | 8.18 |

Here we can see that cpp_double_long_double performs very well. But the compiler developers will have a mystery to solve: cpp_bin_float<39> with g++ 12.2 is faster than cpp_double_long_double with clang++ 19.1.4 by just a little, which in turn is faster than cpp_double_long_double with g++ 12.2 by roughly a factor of 7.

cpp_double_float128

| type | compiler | calculation speed | factor |
| --- | --- | --- | --- |
| cpp_bin_float<67> | g++ 12.2 | 118.43 iter/sec | 1 |
| mpfr_float_backend<67> | g++ 12.2 | 43.34 iter/sec | 2.73 |
| cpp_dec_float<67> | g++ 12.2 | 40.09 iter/sec | 2.95 |
| cpp_double_float128 | g++ 12.2 | 14.99 iter/sec | 7.90 |

Here we can see that cpp_double_float128 has a lot of potential to beat cpp_bin_float<67> once the g++ developers sort out the problems seen with cpp_double_long_double on g++ 12.2. The increase in performance should be by about a factor of 8 :)

So all is good. I think we can merge this branch once documentation and other small TODOs are complete.

@ckormanyos
Member Author

ckormanyos commented Jan 19, 2025

> We can definitely mark the performance problem of the cpp_double_fp_backend as solved.

Thank you, Janek (@cosurgi), that was a big effort, and it really provided a lot of information and clarity.

Some of the results on cpp_double_long_double, where long double is the 80-bit, 10-byte type, are interesting. That 10-byte floating-point representation runs on the legendary (modernized) descendants of the i387 FPU, the hardware that really put 10-byte floating-point on the map.

The newer i7 processors have extremely powerful 64-bit floating-point hardware, and it seems like this is very well supported nowadays in both hardware and software.

Down the road I will be doing some non-x86_64 measurements on M1 and/or M2 and on a few embedded bare-metal controllers, such as an ARM(R) Cortex(R)-M7 with double-precision FPU support.

All in all, I'm somewhat surprised at how fast cpp_double_double ended up being in certain hardware/software configurations. As mentioned in previous posts, this backend (and of course that type specifically) has lots of room for optimization.

I'm happy enough with it to make a first release out of this state.

Cc: @sinandredemption and @jzmaddock

@jzmaddock
Collaborator

There might be one more thing to check: that each of the backend/compiler configurations is doing (roughly) the same amount of work. Something that can happen when a tolerance is set for termination is that you hit "unfortunate" parameters which cause the code to thrash through many needless iterations that don't actually get you any closer to the end result. I have no idea if this is the case here, but because types like double-double don't behave quite like exactly rounded IEEE types, they can easily break assumptions present in the code.

@ckormanyos
Member Author

ckormanyos commented Jan 19, 2025

> There might be one more thing to check: that each of the backend/compiler configurations is doing (roughly) the same amount of work. Something that can happen when a tolerance is set for termination is that you hit "unfortunate" parameters which cause the code to thrash through many needless iterations that don't actually get you any closer to the end result.

Indeed. There are several potential dangers.

Let's say we use cpp_double_double and a particular tolerance on the iteration step $dx$ is set to

$$ |dx| < 10^{-300} $$

At the same time, we know that min_exponent10 for the type is something like $-291$. So the tolerance is either never reached, or only reached after useless iterations.
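
As an illustration only (this is not code from the PR or from Boost.Math), consider a plain Newton iteration with a tolerance guard and an iteration cap; if the tolerance sits below what the type can represent, the convergence test can never fire and every call burns the full iteration budget:

```cpp
#include <cmath>
#include <iostream>
#include <limits>

// Hypothetical sketch: a Newton iteration for sqrt(a) over a generic
// floating-point-like type T (for example cpp_double_double). A tolerance
// below what T can represent means the convergence test never succeeds,
// so the loop always falls through to the iteration cap.
template <typename T>
T newton_sqrt(const T& a, const T& tol, unsigned max_iter = 64U)
{
   using std::fabs;

   T x = a / 2;

   for (unsigned i = 0U; i < max_iter; ++i)
   {
      const T dx = (x * x - a) / (2 * x);

      x -= dx;

      if (fabs(dx) < tol)
      {
         break; // Converged. This never fires if tol underflows the type.
      }
   }

   return x;
}

int main()
{
   // A sensible tolerance, a few ulps above epsilon: converges in a handful of steps.
   std::cout << newton_sqrt(2.0, 8 * std::numeric_limits<double>::epsilon()) << '\n';

   // A tolerance the type cannot represent (for cpp_double_double, think 1e-300
   // against a min_exponent10 of about -291): the loop always runs to max_iter.
   std::cout << newton_sqrt(2.0, 0.0) << '\n';
}
```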

Even worse, this backend is new, so there might be undiscovered problems in the subnormal/zero range. In that case you might iterate until the maximum iteration count is hit.

We actually had several cases like this when John helped me see through the last tricky spots in the specfun tests. Who knows if we really got all the edge cases?

Cc: @jzmaddock and @cosurgi

@ckormanyos
Member Author

ckormanyos commented Apr 19, 2025

So I have updated this PR to the post-1.88 develop branch of Multiprecision. And it's going green once again (with the updated CI). There were a few trip-ups along the way, but nothing out of the ordinary.

It's time to continue working the known final points in BoostGSoC21/multiprecision/issues/160.

I'm not sure if all this will be ready for 1.89, but there is a chance.

Cc: @cosurgi and @jzmaddock and @sinandredemption

Contributor

@LegalizeAdulthood left a comment

Just some mild observations while we're waiting for this to land.

@ckormanyos
Member Author

ckormanyos commented Jun 18, 2025

Hi @LegalizeAdulthood, thank you for your review points. I will get to these. Your points seem sensible.

It's been a while in the making, but I already use this backend locally, and it seriously accelerates perturbative Mandelbrot calculations, by about a factor of 3. No other backend that I am aware of kicks it like this one: double-double at about 32 digits.

I do not know if/when I'll get this Boost-ready, but I'm still on it.

Cc: @jzmaddock

@ckormanyos
Member Author

Hi @LegalizeAdulthood I'll leave all the conversations open for now. I need to get back to these a bit later. Thanks again for contributing.

@LegalizeAdulthood
Contributor

LegalizeAdulthood commented Jun 18, 2025

> I already use this backend locally, and it seriously accelerates perturbative Mandelbrot calculations, by about a factor of 3. No other backend that I am aware of kicks it like this one

Nice! My friend integrated the QD library into his ManPWin and I think he reported a significant speedup as well.

I think that currently, at least for open-source fractal renderers, kalles fraktaler 3 is one of the fastest out there, if not the fastest. He has SIMD and GPU (OpenCL) paths; I haven't studied the code extensively enough to know the full details, though.

If I can be of assistance in helping this pull request be accepted, let me know. It will ultimately help me too :)

@ckormanyos
Member Author

> If I can be of assistance in helping this pull request be accepted, let me know.

Hi Richard (@LegalizeAdulthood), if you get a chance, you could consider using the cpp_double_fp_backend. Boost.Multiprecision is header-only, so if you check out the cpp_double_fp_backend_integration branch, you can immediately use classes such as boost::multiprecision::cpp_double_double.
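
For the record, a minimal usage sketch (my own, not from the PR text); the header name below is an assumption for the integration branch:

```cpp
// Minimal sketch: using the double-double type from the integration branch.
// Boost.Multiprecision is header-only, so having the branch on the include
// path is enough; the exact header name here is an assumption.
#include <boost/multiprecision/cpp_double_fp.hpp>

#include <iomanip>
#include <iostream>
#include <limits>

int main()
{
   using boost::multiprecision::cpp_double_double;

   const cpp_double_double x { 2 };

   // Roughly 32 decimal digits of working precision built from two doubles.
   std::cout << std::setprecision(std::numeric_limits<cpp_double_double>::digits10)
             << sqrt(x) << '\n';
}
```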

Other than that, I think there are still some edge cases in rounding and round-tripping; those are the only open points left. I deactivated many of the rounding and round-tripping tests, and I think these should pass. I'll need to talk with John about these sometime down the road.

@ckormanyos
Member Author

> Just some mild observations while we're waiting for this to land.

Handled in 748b751

@ckormanyos closed this Jun 19, 2025
@ckormanyos
Member Author

Redundant with #515
