-
Notifications
You must be signed in to change notification settings - Fork 162
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Large(r) Internally represented FFE's #2846
Conversation
…t fields of order up to 2^24 can be internally represented. An internal FFE is now UInt4 (32 bit). The changes in the code are actually minimal. Only the computation of a primitive root modulo p has been changed a bit. So far, primitive roots modulo p with p < 2^16 are still computed as before. Primitive roots modulo p for 2^16 < p < 2^24 are now computed using the Library function PrimitiveRootMod. This requires btw the use of InitFopyGVar in finfield.c. Thanks to Markus Pfeiffer and Max Horn for pointing this out. The restriction 2^24 seems arbitray. Theoretically, the adaptions made could allow fields of order up to 2^32. But then the Zech Log table will blow up, and the amount of data in ffdata.c as well. So far, we kept the limit to 2^24.
The size of We can make pre-computation of finite fields data part of the compilation process (which makes compilation take longer...). Additionally would it be sensible to make the size of finite fields whose elemetns are represented as immediates a compile-time-switch. Or we allow pre-computation of some of these values on the fly at runtime... |
ffdata.c contains a precomputed list of characteristics, degrees, and sizes of all internal FFs. Up to order 2^16, these lists contain only 6542 entries. Up to 2^24, this number becomes 1077871. The FF code of GAP 4r8 did not rely on these precomputed lists. Upon construction of a FF in GAP, it was first checked whether the required FF was not yet constructed (by looking at each entry of the list of already constructed FFs, and checking whether characteristic and degree match with characteristic and degree in a dynamical lists CharFF and DegrFF. By using the precomputed lists, an interpolated search is possible. However, this approach makes the precomputed lists necessary, and the amount of data explodes of course when increasing the MAXSIZE_GF_INTERNAL to 2^24. So maybe we should consider going back to the implementation of internal FFs of GAP4r8. |
The pre-computed data was added for HPC-GAP to avoid awkward problems. I'll have to look again whether we can solve this problem slightly more elegantly. |
As for the running GAP job this seems to be <10MB (which does not worry me). The source file seems to be large mainly because of the inefficiency of storing decimals. Can one improve this (e..g gzip reduces to 6MB)? |
As an experiment I tried a very simplistic, single-threaded C programme to populate these arrays. It takes about 200ms on my laptop. I'm pretty sure that could be enhanced by a significant factor with a little care. So I can't see any reason not to do this during startup. |
In HPC-GAP we could even be daring and do the coputation in a separate thread at startup ;) |
So I think we're agreed on initialising these data structures programmatically at startup, rather than having such a huge additional C file. @jdebeule would you mind doing that, or should someone else produce a new PR? |
Hi all, thanks for the interesting comments. Indeed I think initialising these data structures at start up is a good solution. Steve, would you mind sharing the code/files of your experiment? Best, Jan |
Here's the test file I ran. It uses a sieve of Erastothenes to find the primes, and adds the prime powers as it finds each prime. Depending on the exact specification of the arrays, you might need to sort it afterwards, or mark the prime powers for adding when you get to them, or something. The most obvious speedup is to deal with a few small primes systematically. So rather than using the sieve to mark and then skip over all the multiples of 2, 3 and 5, you could just look a numbers that are congruent to 1, 7, 11, 13,17, 19, 23 and 29 mod 30, for instance.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some remarks.
BTW, 200 ms extra startup time would be very bad from my perspective. But perhaps it can be optimized; or delayed until it is actually needed (but then we'd have to be very careful with e.g. HPC-GAP in mind)
@@ -10,14 +10,16 @@ | |||
## | |||
## This file deals with internal finite field elements. | |||
## | |||
## Update September 2018: MAXSIZE_GF_INTERNAL changed to 2^24. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That doesn't really belong here, that kind of information is available in the repository.
|
||
|
||
############################################################################# | ||
## | ||
|
||
#V MAXSIZE_GF_INTERNAL . . . . . . . . . . . . maximal size of internal ffes | ||
## | ||
BIND_GLOBAL( "MAXSIZE_GF_INTERNAL", 2^16 ); | ||
BIND_GLOBAL( "MAXSIZE_GF_INTERNAL", 2^24 ); #jdebeule 16/09/18 2^24 was 2^16. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Drop the #jdebeule
comment
This applies many times again later on, so I'll not repeat this each time.
@@ -48,6 +48,26 @@ | |||
** is the logarithm of $z^{a-1} + 1$. This list is usually called the | |||
** Zech-Logarithm table. The zeroth entry in the finite field bag is the | |||
** order of the finite field minus one. | |||
** | |||
** Note of September 2018. A finite field element (FFE) uses 32 bit (cfr supra). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does "cfr supra" mean?
** Therefore, currently the largest allowed size for internatlly represented fields | ||
** is set to 2^24. A second hurdle to deal with larger fields is the data stored in ffdata.c. | ||
** The interpolation search requires these lists to be available, but increasing the largest | ||
** possible order, increases the amount of data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor nitpick (feel free to ignore): Ideally, kernel code comments should wrap at about 78 columns, to match existing code.
@@ -83,103 +103,14 @@ static Obj TYPE_FFE0; | |||
** 'PolsFF' is a list of Conway polynomials for finite fields. The even | |||
** entries are the proper prime powers, odd entries are the corresponding | |||
** conway polynomials. | |||
* | |||
** Update September 2018: the 6948 Conway polynomials for fields of prime power | |||
** order at most 2^32 are stored in the file finfield_conway.h |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Drop this comment
@@ -1623,13 +1565,14 @@ Obj FuncZ2 ( Obj self, Obj p, Obj d) | |||
{ | |||
ip = INT_INTOBJ(p); | |||
id = INT_INTOBJ(d); | |||
if (ip > 1 && id > 0 && id <= 16 && ip < 65536) | |||
/* jdebeule 19/9/2018 '24' was '16', 65536 became new constant from ffdata.h */ | |||
if (ip > 1 && id > 0 && id <= 24 && ip <= SIZE_LARGEST_INTERNAL_FF) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the 24 should also be made a constant. Say BITS_INTERNAL_FF
; then define SIZE_LARGEST_INTERNAL_FF
as 1 << BITS_INTERNAL_FF
@@ -157,8 +187,11 @@ extern Obj TypeFF0; | |||
** which will only work if the product of integers of type 'FFV' does not | |||
** cause an overflow. And of course the successor table stored for a finite | |||
** field will become quite large for fields with more than 65536 elements. | |||
** | |||
** Update of September 2018: we fully agree with the comments above. See | |||
** the necessary changes in POW1_FFV, VAL_FFE, NEW_FFE, and FLD_FFE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of agreeing, the remark before should be removed, and of course parts of the comment above also are not valid anymore, as they talk about UInt2 and 65536 elements etc. In general, it'd be good to go through all comments here and make sure they are up-to-date (but of course it's perfectly sensible that you did not do this so far, as it was (is?) not 100% clear this PR gets merged. In other words: I am happy if adjusting the comments is delayed until the functional part of this PR is complete and generally accepted; I merely point this (and other) nitpicks out right now as part of general reviewing.
*/ | ||
#define FLD_FFE(ffe) ((FF)((((UInt)(ffe)) & 0xFFFF) >> 3)) | ||
/*#define FLD_FFE(ffe) ((FF)((((UInt)(ffe)) & 0xFFFF) >> 3))*/ | ||
#define FLD_FFE(ffe) ((FF)((((UInt)(ffe)) & 0xFFFFFFFF) >> 3)) /*jdebeule (16/9/18): see line above for old version */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While we are at it, I'd suggest to change this to a static inline
function. Moreover, I'd replace 0xFFFF
resp. 0xFFFFFFFF
by a constant, too, e.g. MASK_INTERNAL_FF = SIZE_LARGEST_INTERNAL_FF-1
Then:
static inline FF FLD_FFE(Obj ffe)
{
return (FF)(((UInt)ffe) & SIZE_LARGEST_INTERNAL_FF) >> 3);
}
Of course we could also extract the changes turning this (and other) macros into static inline
functions, as well as the new constants MASK_INTERNAL_FF
, BITS_INTERNAL_FF
, into a separate PR, which could be merged quickly, and then this PR just would have to be rebased. I'd be happy to volunteer to create such a PR, too (in order to avoid causing lots of extra work for you). But only if you'd consider that helpful, of course.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear: I really think this (or some other PR) should introduce MASK_INTERNAL_FF
; the change to static inline
, however, could also be made after this PR here is merged.
/*#define NEW_FFE(fld,val) ((Obj)(((UInt)(val) << 16) + \ | ||
((UInt)(fld) << 3) + (UInt)0x02))*/ | ||
#define NEW_FFE(fld,val) ((Obj)(((UInt)(val) << 32) + \ | ||
((UInt)(fld) << 3) + (UInt)0x02)) /* jdebeule (16/9/18): see above for old version */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should turn all the macros in this header file to static inline
(and I'd do it right now, but again, I don't want to cause too much work for you by forcing you to rebase this PR, so we can delay this to later).
|
||
#include "finfield_conway.h" /* jdebeule 16/09/18 */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a script that generated finfield_conway.h
? If so, can we please add it to the repository, too? I don't really like having opaque data tables that then can't easily be edited, updated or understood.
In that regard, it is sad how e.g. 63001, 6 +242*251
turned into 63001, 60748
. If I had the script, I am sure I could adjust it to produce the latter again...
Of course if you drop this part of the change again (keeping the old table, and generating the rest dynamically), this concern goes away.
I'd hate to see this change lost. I think it's a really good idea. It occurs to me that the main issue we are having comes from the fact that we want to number the finite fields consecutively in a thread-safe way, which, itself, comes from the fact that Martin wanted to do fields up to 2^16 in 32 bit ints and that didn't leave enough bits to just put the field order in the representation of the elements. (16 (field size)+16 (Zech log) +2 tag bits > 32). We on the other hand don't have this problem in a 64 bit world. (24+24+3 < 64). So we can adopt a much simpler solution. If we put q itself in the immediate FF representation and then have a thread-safe write-only hash table giving the characteristics and degrees (or something of the kind) then we should avoid all our race condition problems without having to precompute big data structures. |
On reflection, the best solution seems to be to leave the PR more or less exactly as it is, except to |
Turns out we already have |
Hi all, thanks a lot so far for the feedback, suggestions and improvements. Past weeks were too busy for me to work on this. I will make time for this again after the Aachen workshop next week. Jan |
@jdebeule note that @stevelinton by now made PR #2997 and you might want to look there, to avoid duplicate work |
Closing this in favor of PR #2997 |
The changes in the pull request allow internally represented finite field of order up to 2^24. Changes in the .h and .c (and ffe.g) files are commented.
Some tests are broken (obviously): those where an just above the old limit (2^16) are chosen and the field of that order is constructed: the results are not expected to be in the internal representation.