
made set_double arch/mem-layout independent #160


Closed
wants to merge 21 commits into from

Conversation

czurnieden
Contributor

It is a bit slower (although completely unoptimized yet) but independent of anything but `float.h` and the macros `DBL_MAX_EXP` and `FLT_RADIX` therein. The same code can be used for `float` with minimal changes (just change `DBL_MAX_EXP` to `FLT_MAX_EXP`). The same could be said for `long double`, but that needs some more tests because of the slightly different format of `long double`, and not everyone has it.

It is now independent of the architecture and its memory layout. The values of `DBL_MAX_EXP` and `FLT_RADIX` could be hardcoded to avoid the inclusion of `float.h`, but the existence of `float.h` is a good sign for the availability of at least some floating point functionality.

@minad
Member

minad commented Feb 14, 2019

I don't like this implementation since it is probably much slower. This was the reason why I started accessing the floating point bits directly. However I would be interested in some benchmarks. Do you have some measurements? If too slow, I would like to keep the current version available and use it on the platforms where the current version works (e.g. I tested it on x86_64, x86, arm and arm64).

@czurnieden
Contributor Author

@minad It is too slow for what exactly?

@minad
Member

minad commented Feb 14, 2019

@czurnieden For my use case. But at first I would like to know if it is slower and by how much.

@minad
Member

minad commented Feb 14, 2019

By too slow I mean the slowdown compared to the current version. It is just a matter of what you are willing to accept: factor 2 or 10? But if the slowdown is only marginal and the other function is simpler, easier to maintain, and more portable, why not?

@czurnieden
Contributor Author

The while-loop does the work, so it is faster for very small input (< 10.0 or so), but the loop might run up to 1024 times. It is, as I mentioned, completely unoptimized.
You can get rid of it by multiplying `frac` with 2^53 (9007199254740992.0, which hence needs a `uint64_t` or something of similar minimum size), putting it into an `mp_int`, and multiplying by `expnt` (`mp_mul_2d`). I think I did something like that in one of my other commits for `mp_set_double`.

That skips the loop completely but needs memory that can hold `frac * 9007199254740992.0` and a working cast mechanism to get the result into a `uint64_t` (or similar-size) variable.

It would then be as fast as your version but still independent of architecture and memory layout (with the exception of the `uint64_t` casting above).

Want me to do it?

@minad
Member

minad commented Feb 14, 2019

@czurnieden If you have time, it is fun for you to work on it, sure go on! Maybe you have other more interesting things to hack on ;)

For me the current version works well. But I am also using tommath only on the aforementioned very mainstream architectures (Therefore for me it would also be acceptable to just restrict compilation to a whitelist of supported platforms). My requirements are just that a new version of the function should produce exactly the same outputs as the current one. Furthermore I would prefer not to have a slowdown of more than about 1.5 if we get other advantages in return.

@czurnieden
Contributor Author

Knowing what it does in theory seldom reflects the behaviour in the real world, so I did a little bit of benchmarking:

```
tests of limits (inf, NaN, a.s.f., negligible)
loop 1000 times
   x_{n+1} = 2x_n
   x_{n+1} = 2x_n - 1
   x_{n+1} = 2x_n - 1 (yes, two times, positive and negative)
end loop
srand(0xdeadbeef)
loop 1,000,000 times
   x = rand_1 * rand_2 + 1 (rand_1 != rand_2)
   x = -x
end loop
```

Mine (as it is, still not optimized at all) is about 5 times slower than yours, 4 times without the recursions (which involve very large numbers, hence more loops).

Even a bit slower than I would have expected, but within range.
An optimized version would probably run close to 1:1.

@czurnieden
Contributor Author

Allowing stdint.h gets it down to about two times.

Optimizing my homegrown frexp to use a giant-steps/baby-steps-like algorithm with tables needs a fixed DBL_MAX_EXP, which would ruin the flexibility of the code (it needs extra code for float and long double instead of a couple of preprocessor branches), and would get it down to a mere 1.7 times.

Please bear in mind that the actual runtime for all of the 2,906,007 calls is ~2.067 sec., ~1.888 sec., and ~1.075 sec. for the uint64_t version, the fixed DBL_MAX_EXP version, and your version respectively.

Nearly one and a half million calls per second on a single core of an elderly AMD A8-6600K is too slow? Even with the slowest but most portable method you still get over half a million calls per second!

@minad
Member

minad commented Feb 15, 2019

If you pay too much for every operation you perform, it accumulates quickly. But I agree that this function would hardly become a bottleneck. However, I don't see the advantages of replacing the current version with a more portable one. The current function makes assumptions about the float layout, but it is also simpler than what you are proposing.

@minad
Member

minad commented Feb 15, 2019

I would not add a long double version since I've not seen much long double usage anywhere. The float versions might be good for platforms which don't support double.

@minad
Member

minad commented Feb 15, 2019

But concerning the performance issue: I prefer to keep the current version since I have yet to see where it causes real problems.
It works on every platform Debian supports.
The issues we had on Mac are due to the preprocessor, and the issue on the M68K is still there even with this version, right?

The current version is the reasonable implementation for every modern platform.

@minad
Member

minad commented Feb 15, 2019

current version:

  • ~~assumes~~ checks for IEEE754
  • there are issues with detection of supported platforms in the preprocessor
  • simple: accesses the bits, sets the mantissa, shifts by exp, no loops
  • fast: 1.7 times faster than the proposed version
  • I confirmed that it works on the most common platforms (x86, x86_64, arm, arm64)
  • works on all platforms Debian supports

version in this PR:

  • ~~assumes~~ checks that FLT_RADIX=2 and that DBL_MAX_EXP is defined
  • complex: performs all operations in loops, repeated bigint multiplication and addition
  • slow (factor 1.7 in your benchmark, worse in the worst case if the loops run fully?)
  • should work on all the platforms the current version works on, too (I have not tested this)

@czurnieden
Contributor Author

czurnieden commented Feb 15, 2019

It works on every platform Debian supports.

LTM is not restricted to what Debian supports; it needs to support a whole plethora of platforms.

The issues we had on Mac is due to the preprocessor

Yepp. Did you patch it?

issue on the M68K is still there even with this version, right?

The reason for this issue was an unexpected, well, forgotten behaviour of the M68K. The patch I offered was denied by you because you decided that my patch, using an algorithm that is not only commonly used in the industry (e.g. in Boost) but also provably correct for our domain, was not good enough for you.

assumes IEEE754

LTM needs to support more than that.

there are issues with detection of supported platform in the preprocessor

Have you patched that?

simple: accesses the bits, sets the mantissa, shifts by exp, no loops

I wouldn't call a nearly bare-metal operation simple. Loops, on the other hand, are taught to beginners.

fast

Fast in relation to what? A turtle is also fast in relation to a common garden snail, just as a MiG is fast in relation to a common jetliner.

I confirmed that it works on the most common platforms (x86, x86_64, arm, arm64)

ARM can switch between big and little endian. Does it work with both?

LTM can run on an S/390 and its relatives e.g.: IBM System z, pSeries, iSeries. Does your version run on S/390 or any of its relatives, too?

Did you check both, AMD-64 and Intel-64?

LTM is small enough to run on most of the bigger MCUs. Most of them don't have a FPU but some have partial support (at least that's what Digikey told me within the allotted 5 minutes of research).

What about softfloat support? You were lucky that QEMU's M68K emulation took Hauser's program, which allows for gutting. I do not know if it would work with an actual M68881/M68882 or any other softfloat implementation/compiler.

LTM needs to work with a lot of compilers, not all of which are standard-compliant (as far as I know, the only one with 100% C11 coverage is Intel's compiler); some don't even come close, so LTM needs to rely on as little standard compliance as reasonably possible and check for the rest.

assumes FLT_RADIX=2 and that DBL_MAX_EXP is defined

No, it doesn't assume it, it tests for it. You assume it without testing it. FLT_RADIX isn't necessarily 2 (IBM 360; for values see e.g. www.netlib.org/fp/dtoa.c), and the restriction to 2 is only temporary; I'm already working on it. It's FLT_RADIX=16 there, a power of 2, so it is not that much work, but testing on the S/360 is quite a chore.

complex: performs all operations in loops, repeated bigint multiplication and addition

I wouldn't call loops complex. The bigint is used to avoid stdint.h (some users do not want it and I'm not here to judge). It is easily replaced if you want, but it needs a bit more work for long double. And it is not that much faster, to be honest.

I would not add a long double version since I've not seen much long double usage anywhere

A user asked for it in #159 and as it was no work to implement it…
It is quite slow (relative to the others) and needs some touching but it works as it is.

should work on all the platforms the current version works too

And more.

(I have not tested this)

Why have you not tested it?

The current version is the reasonable implementation for every modern platform.

I would be cautious with "every", you'll never know what is to come, but otherwise: yes, you can say that for modern platforms but LTM is used on old platforms, too. Probably even more so, because modern platforms with a lot of HP under the hood most likely use GMP if they are OK with the licence (I think even Mathematica used GMP), so it is more or less only the license that makes LTM attractive for modern platforms.

LTM is lightweight, and can be made even more so if you don't need all functions, it runs on everything and your kitchen sink, and it is simple, everyone can hack it (even I can ;-) ) but there is one thing that it is not: it is not highly optimized for speed. I have a lot in my backlog but it will need more memory so some may find their way into the main version, some into a separate branch - maybe needforspeed or would we get in trouble with EA for that? ;-)

It is also the main reason that I used a branch named withfloats and I'm a bit with Tom St Denis here, who did not want floats in LTM in the first place: 'It'd be nice to not have any float code "standard" in LTM.' (in #3 at the end)

No, I'm sorry, but LTM has to serve a lot of different users with different demands. Changing something is a complicated process and not always welcome by everyone. At least not in the beginning.

@minad
Member

minad commented Feb 15, 2019

LTM is not restricted to what Debian supports; it needs to support a whole plethora of platforms.
assumes IEEE754
LTM can run on an S/390 and its relatives e.g.: IBM System z, pSeries, iSeries. Does your version run on S/390 or any of its relatives, too?
LTM is small enough to run on most of the bigger MCUs. Most of them don't have a FPU but some have partial support (at least that's what Digikey told me within the allotted 5 minutes of research).

Floating point functions are optional. This was what was agreed upon when the float PR was merged.
It is OK if they only work on a restricted subset of platforms. E.g., FP support won't be available on embedded systems.

Yepp. Did you patch it?
there are issues with detection of supported platform in the preprocessor

I proposed to enable the float functions via whitelisting, which would solve that.

The reason for this issue was an unexpected, well, forgotten behaviour of the M68K. The patch I offered was denied by you because you decided that my patch, using an algorithm that is not only commonly used in the industry (e.g.: in Boost) but also provable correct for our domain, was not good enough for you.

Hmm. I don't see why we need an epsilon test. On compliant platforms the result is exact.

I wouldn't call a nearly bare-metall operation simple. Loops on the other side get taught to beginners.
I wouldn't call loops complex. The bigint is used to avoid stdint.h (some users do not want it and I'm not here to judge). It is easily replaced if you want but it needs a bit more work for long double. And it is not that much faster to be honest.

The current version is shorter and it performs simpler low level operations. Shifting is simpler and faster than multiplication and addition.

Fast in what relation? A turtle is also fast in relation to a common garden snail as a MIG is fast in relation to a common jetliner.

Fast in relation to the other version. As I said before, if you accept a slowdown of factor two everywhere, your system will be half as fast as before.

ARM can switch between big and little endian. Does it work with both?

Yes, it will work if you recompile. I don't know any system which performs the switching at runtime. We all have better things to do than to take care of such things.

Did you check both, AMD-64 and Intel-64?

What is Intel-64? Itanium? I suspect it to run on Itanium too, but the platform is irrelevant as of now.

A user asked for it in #159 and as it was no work to implement it… It is quite slow (relative to the others) and needs some touching but it works as it is.

Ok, I did not see that someone asked for long double support. I would probably still keep it out of the library.

@sjaeckel What do you think?

@sjaeckel
Member

That I'm currently on mobile and only back at home on Sunday :-)

@minad
Member

minad commented Feb 15, 2019

@czurnieden Sorry, it seems I edited your comment while answering. I tried to revert to the old version manually, since the horrible interface does not support reverting :(

@minad
Member

minad commented Feb 18, 2019

@czurnieden Thanks for the changes! I am more happy now :)

I would still drop long double support since I've only seen it used rarely.
Furthermore, the float version could also use a bit-accessing version for consistency ;) Ahh, you have that.

@buggywhip

buggywhip commented Feb 18, 2019 via email

@minad
Member

minad commented Feb 18, 2019

@buggywhip Yes, we assume that and it is the right thing to do. You are doing something wrong when you store invalid doubles received from an untrusted source inside a C variable of type double. I think I said that before. The function receives a double as argument and not a byte array.

What you might want instead is a function mp_set_double_bits which takes the unpacking format specified by the user as an argument. However, I think this is out of scope. You could and should write such a function yourself based on the format you are processing. If you wonder why the mp_set/get_double functions are in scope, then: they are necessary to interact with the C builtin double type, and supporting conversion to and from different builtin C types is important.

@czurnieden
Contributor Author

@buggywhip

...but there is more.

There's always more and always will be, no matter what. ;-)

There are no FP operations performed so it matters not if the FPU is IEEE754 compliant or not.

There are none if the FPU is compliant, but there are if it is not. There is also the lingering problem of rounding, which we haven't even touched yet, but which can only be handled with a reasonable amount of work if the machine that runs LTM is IEEE-compliant in that regard, or at least offers some method to read the rounding mode.

But that's not what you meant, I think?

Any conditioning should be based on the state of the data being in the correct format. Unfortunately you cannot know that until runtime, and even then you cannot really know.

The conditioning that exists now is just a filter to restrict the input to something the function knows how to handle. That makes the function short and simple. For larger values of "short" and "simple" ;-)

To be able to handle input with the least amount of restriction we need some more information.

Let the input be a floating point number of the form s * m * r^x with s in {1,-1}, and m, r, x in N but in an unknown format, maybe just a byte-array because you were talking about a binary file. Then we need the information where s, m, r, x are situated, their limits, and the endianness of the byte-array (and a bias for x if there is any and also everything else I forgot).

The other function that does it in reverse needs the same information, of course.

So a function that can do it would look like: mp_set_float(const char *input, struct float_format_t format) with reasonable defaults and maybe some presets.

That would be general enough to cover most of the cases (just because I couldn't come up with something that it doesn't cover doesn't mean there isn't anything lurking in the dark, waiting for the moment to spread utter despair and hopelessness all over the quivering mass of innocent little programmers, but I digress) and the user can easily roll their own little wrappers to do their deeds.

At least that's what I think you want from reading:

Please, mp_get_double() and mp_set_double() should ALWAYS be compiled (not conditioned), and the functions' signatures should be extended to include the necessary arguments. (The existing signatures could be transformed to functions that set supported default values before calling the extended signature versions.)

Not that I wouldn't like the idea, I do, but: who wants to write it? Any volunteers?
*crickets*
Yeah, feared that.
*sigh*

Yes, I can already hear you saying that you don't need such a broad function but a large general function with a handful of small wrappers is almost always easier to maintain than a handful of medium, closely related functions; and the number "handful" will grow in size, no matter what.

I think, in contrast to my colleague, that it still would be a "nice to have", but just that: a "nice to have" with the "nice to have" priority that comes along with it. So: no reason to give up all hope but it might take a while and might not even find its way into LTM at the end.

@minad

Ahh. you have that.

Well, it was just C&P and a very small bit of adjustment.
I'm lazy, yes, but not that lazy ;-)

Hmm. I don't see why we need an epsilon test. On compliant platforms the result is exact.

LTM, including the floating point operations, runs well on the M68K, where the FPUs have that little quirk but are otherwise compliant enough to be included. And it is only for the test and only for the M68K, which is still in use (it's a tough little CPU that takes a knock or two and is still available), and it checks if a==b first. All of the other architectures get a simple a==b from the preprocessor instead. It might be an idea to log the cases where a!=b, though.
A Q&A at Stackexchange gives a bit more details.

The -ffloat-store option is not a bad solution for Debian, but it is not available in all compilers that are able to compile LTM correctly.

Oh, and I put the recursions x_{n+1} = 2x_n - 1.0 (with x_0 = 2.0) back in. They fail only in some cases of non-standard floating point implementations, but those failures will, hopefully, give us some feedback by way of user complaints; and although I'm pretty sure we covered all bases, you never know what's out there.

Ah… what else was there… the current S/390 OS is z/OS 1.1.0, and the compiler there has a switch to get IEEE-754 compliant floats (but only in Unix mode?). The code will utter a #warning (should it be an #error instead?) if it sees FLT_RADIX == 16 and informs the user about that compiler option. Not yet tested, because testing it would be a *cough* herculean task.

I'm currently setting up a 16-bit test environment (only FreeDOS with QEMU) with DJGPP, BCC, Mars, and OpenWatcom for some full 16-bit tests. This may take a while ;-)
I don't know how the FPU is emulated by QEMU, but it should be a good test for the floats, too.

Yes, I am aware of the limitations of that project, but it's better than nothing and, best of all: fun to do!

That was a long one, sorry for that.

@minad
Member

minad commented Feb 19, 2019

@czurnieden

So a function that can do it would look like: mp_set_float(const char *input, struct float_format_t format) with reasonable defaults and maybe some presets.

This is similar to the function mp_set_double_bits I suggested in my comment to @buggywhip. However I think it is out of scope. This is the job of a serialization library.

Well, it was just C&P and a very small bit of adjustment.
I'm lazy, yes, but not that lazy ;-)

You are not the one being lazy here. You are doing the stuff and I am coming and shooting it down ;)
Sorry for that! But I think we have found an acceptable solution now.

LTM, including the floating point operations, runs well on the M68K, where the FPUs have that little quirk but are otherwise compliant enough to be included. And it is only for the test and only for the M68K, which is still in use (it's a tough little CPU that takes a knock or two and is still available), and it checks if a==b first. All of the other architectures get a simple a==b from the preprocessor instead. It might be an idea to log the cases where a!=b, though.

Concerning the epsilon test I am ok to use it for M68K but not x86_64 and processors where the result is precise.

Btw, you didn't write about long double. Do you want to keep it?

@czurnieden
Contributor Author

Btw, you didn't write about long double. Do you want to keep it?

Wasn't written in less than 5 minutes, so: yes, I want to keep it ;-)

The CPUs supporting it still have quite a large distribution, and it makes the thing more complete. I think I'll add a little measure to check if `sizeof(long double) > 8` and get rid of most of the arch-checks, but I have to check how the quad-precision folks have it implemented (I can't check for `sizeof(long double) == 16` because amd64 uses 16 bytes for the 80-bit long double; I would have to check the mantissa size instead). The only one with actual hardware support is SPARC64; ARM64 is the other one with quad support, but only as a software implementation. Another weird construct is IBM's extended double, which is actually a double-double (more precision, not bigger numbers; the exponent is still that of a normal double).

So, no quad-precision but I will add a line or two to include more architectures in mp_set_long_double(). It makes no sense to deny entry if I can just send them to mp_set_double() instead.

Concerning the epsilon test I am ok to use it for M68K but not x86_64 and processors where the result is precise.

It probably is not only the M68K. See for example the GCC manual for -mfpmath. But that needs some more testing. I don't think I have actual i386 hardware with an FPU, so QEMU must do it for now.


@minad minad left a comment


Approve with minor changes

@czurnieden
Contributor Author

Successfully tested this branch with an architecture that has double == long double:

Guest:

$ cat dbl_print.c
#include <stdio.h>
#include <stdlib.h>
#include <float.h>

int main (void) {
   printf("%d == %d  %d == %d\n",
           LDBL_MANT_DIG, DBL_MANT_DIG,
           LDBL_MAX_EXP,    DBL_MAX_EXP);
   exit(EXIT_SUCCESS);
}
$ ./dbl_print 
53 == 53  1024 == 1024
$ uname -a
Linux debian 3.16.0-6-versatile #1 Debian 3.16.57-2 (2018-07-14) armv5tejl GNU/Linux
$ lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux 8.11 (jessie)
Release:        8.11
Codename:       jessie
$ cat /proc/cpuinfo
processor       : 0
model name      : ARM926EJ-S rev 5 (v5l)
BogoMIPS        : 500.22
Features        : swp half thumb fastmult vfp edsp java 
CPU implementer : 0x41
CPU architecture: 5TEJ
CPU variant     : 0x0
CPU part        : 0x926
CPU revision    : 5

Hardware        : ARM-Versatile PB
Revision        : 0000
Serial          : 0000000000000000

Host:

$ qemu-system-arm --version
QEMU emulator version 3.1.0
Copyright (c) 2003-2018 Fabrice Bellard and the QEMU Project developers

(oldstable only, couldn't get Debian 9.8.0 to boot in qemu-system-arm, was too impatient)

@czurnieden
Contributor Author

czurnieden commented Feb 21, 2019

Works with SPARC64

cpu		: TI UltraSparc IIi (Sabre)
fpu		: UltraSparc IIi integrated FPU
pmu		: ultra12
prom		: OBP 3.10.24 1999/01/01 01:01
type		: sun4u
ncpus probed	: 1
ncpus active	: 1
D$ parity tl1	: 0
I$ parity tl1	: 0
Cpu0ClkTck	: 0000000005f5e100
cpucaps		: flush,stbar,swap,muldiv,v9,mul32,div32,v8plus,vis
MMU Type	: Spitfire
MMU PGSZs	: 8K,64K,512K,4MB
$ uname -a
Linux debiansparc64 4.15.0-2-sparc64 #1 Debian 4.15.11-1 (2018-03-20) sparc64 GNU/Linux
$ gcc -dM -E -x c /dev/null | grep BYTE_ORDER
#define __BYTE_ORDER__ __ORDER_BIG_ENDIAN__
$ gcc -dM -E -x c /dev/null | grep LDBL
#define __LDBL_MAX__ 1.18973149535723176508575932662800702e+4932L
#define __LDBL_MAX_EXP__ 16384
#define __LDBL_HAS_INFINITY__ 1
#define __LDBL_MIN__ 3.36210314311209350626267781732175260e-4932L
#define __LDBL_HAS_QUIET_NAN__ 1
#define __LDBL_HAS_DENORM__ 1
#define __LDBL_DECIMAL_DIG__ 36
#define __LDBL_EPSILON__ 1.92592994438723585305597794258492732e-34L
#define __LDBL_MANT_DIG__ 113
#define __LDBL_MIN_EXP__ (-16381)
#define __LDBL_MAX_10_EXP__ 4932
#define __LDBL_DENORM_MIN__ 6.47517511943802511092443895822764655e-4966L
#define __LDBL_MIN_10_EXP__ (-4931)
#define __LDBL_DIG__ 

@minad
Member

minad commented Feb 21, 2019

Cool! Do you have a sparc or is this a qemu instance? You mentioned qemu a few times before, so I assume that. May I ask you a favor - could you somehow share these qemu images you produced for the different architectures? I know I could produce them myself and I also did that a few times but it still requires a lot of effort imho. This would allow me to also run the test on those - in particular sparc and sparc64 are interesting because of big endian etc. I think there is a website where some guy offers many images for different architectures but they were very outdated.

@minad
Copy link
Member

minad commented Feb 21, 2019

Right now I have access to x86, x86_64, arm, arm64 hardware but not more (probably also some mips somewhere, but I don't know). And none of those is genuinely big endian :( It also seems that Debian dropped architectures and there are not many big-endian ones left?

@czurnieden
Contributor Author

There's https://github.com/libtom/libtommath/wiki :-)

Yeah, why do I even ask *sigh* ;-)

OK, started it: home page with the first paragraph from https://www.libtom.net/LibTomMath/
And now to the tedious task of transferring manually fiddled LaTeX to GitHub's markup.

@czurnieden
Contributor Author

Started https://github.com/libtom/libtommath/wiki/Successes-and-Failures-of-Building-LibTomMath
I have neither Windows (all versions, all architectures) nor a modern MacOS, so please add, thank you.

A small caveat: GitHub's Markdown does not support code blocks inside (nested?) lists. Either find a way to do it in Markdown or use HTML as I did. It also seems you cannot use Markdown inside HTML lists; you need to write e.g. HTML links.

@sjaeckel sjaeckel mentioned this pull request Mar 7, 2019
@czurnieden
Contributor Author

Learned that Travis has quite a long timeout for a single job: 50 minutes. So I added the ability to run Valgrind to testme.sh and one test to .travis.yml. Seems to work.
I think @sjaeckel wanted to have it in general and for all tests?


```c
cast.ldbl = b;
exp = (int)cast.ldbl_guts.exponent;
```
Member


Can we avoid using bitfields here? I would use shift and masking to access bits since bitfields are not well specified. I use bitfields only for packing if the layout does not matter.

Contributor Author


Can we avoid using bitfields here?

Yes, we can (I think). But I found bit-fields simpler.

bitfields are not well specified

They are quite well defined in the standard (vid. e.g. 6.7.2.1 in INCITS/ISO/IEC 9899-2011[2012] for a start), but this one uses "or some other implementation-defined type[s]", so it is not "standard" in the narrowest sense, admitted; but long double is not "fully standard" either.

That's why I have restricted it to the two compilers GCC and Clang. The MinGW port of GCC already needed some arm-twisting to make it compliant. MSVC can also be persuaded, at least that's what they told me, but I don't have one to test, and it is also moot because MSVC does not support long double as far as I remember.

And it is for long double, a type in widespread use but restricted to the x87 architecture. The number of different hardware implementations is: two. I think this code handles both of them sufficiently.

Yes, the world would be a happier place if long double goes extinct and gets replaced with a proper binary128. Or binary96 if you insist. But it won't happen, at least not in the near future. Maybe together with 128-bit registers?

@minad minad requested review from minad and removed request for minad October 8, 2019 20:35
@minad
Member

minad commented Feb 20, 2020

@czurnieden I am going to close your outdated PRs for now, since they would need a lot of rebasing and discussion somehow died off without a merge a while ago. I am also at least partially responsible for that due to my refactorings etc. For now it would probably be best if we concentrate on fixing a few correctness issues, clean up the remaining API issues (#465 etc). But since you still seem to have a lot in your pipeline, it would be nice to have some kind of overview of what you have and what would be nice to have upstreamed here. But if possible, the PRs should be small and easy to review such that the time frame for a merge is shorter and the PRs don't start to bit rot again.
