Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Large(r) Internally represented FFE's #2846

Closed
wants to merge 3 commits into from

Conversation

jdebeule
Copy link

The changes in the pull request allow internally represented finite field of order up to 2^24. Changes in the .h and .c (and ffe.g) files are commented.

Some tests are broken (obviously): those where an just above the old limit (2^16) are chosen and the field of that order is constructed: the results are not expected to be in the internal representation.

jdebeule added 3 commits September 19, 2018 17:51
…t fields of order up to 2^24 can be internally represented. An internal FFE is now UInt4 (32 bit). The changes in the code are actually minimal. Only the computation of a primitive root modulo p has been changed a bit. So far, primitive roots modulo p with p < 2^16 are still computed as before. Primitive roots modulo p for 2^16 < p < 2^24 are now computed using the Library function PrimitiveRootMod. This requires btw the use of InitFopyGVar in finfield.c. Thanks to Markus Pfeiffer and Max Horn for pointing this out. The restriction 2^24 seems arbitray. Theoretically, the adaptions made could allow fields of order up to 2^32. But then the Zech Log table will blow up, and the amount of data in ffdata.c as well. So far, we kept the limit to 2^24.
@markuspf
Copy link
Member

The size of ffdata.c (26MB) is defninitely a problem now, in my opinion.

We can make pre-computation of finite fields data part of the compilation process (which makes compilation take longer...). Additionally would it be sensible to make the size of finite fields whose elemetns are represented as immediates a compile-time-switch.
We'll have to make sure tests are adapted as well.

Or we allow pre-computation of some of these values on the fly at runtime...

@jdebeule
Copy link
Author

ffdata.c contains a precomputed list of characteristics, degrees, and sizes of all internal FFs. Up to order 2^16, these lists contain only 6542 entries. Up to 2^24, this number becomes 1077871. The FF code of GAP 4r8 did not rely on these precomputed lists. Upon construction of a FF in GAP, it was first checked whether the required FF was not yet constructed (by looking at each entry of the list of already constructed FFs, and checking whether characteristic and degree match with characteristic and degree in a dynamical lists CharFF and DegrFF. By using the precomputed lists, an interpolated search is possible. However, this approach makes the precomputed lists necessary, and the amount of data explodes of course when increasing the MAXSIZE_GF_INTERNAL to 2^24. So maybe we should consider going back to the implementation of internal FFs of GAP4r8.

@markuspf
Copy link
Member

markuspf commented Sep 20, 2018

The pre-computed data was added for HPC-GAP to avoid awkward problems. I'll have to look again whether we can solve this problem slightly more elegantly.

@fingolfin fingolfin added the gapdays2018-fall Issues and PRs that arose at https://www.gapdays.de/gapdays2018-fall label Sep 20, 2018
@hulpke
Copy link
Contributor

hulpke commented Sep 23, 2018

As for the running GAP job this seems to be <10MB (which does not worry me). The source file seems to be large mainly because of the inefficiency of storing decimals. Can one improve this (e..g gzip reduces to 6MB)?
Also, it would be helpful to understand why this table (which presumably is created by program) could not be created on the fly when starting GAP. @markuspf pointed to HPCGAP, but there should be no parallelity issue if the data is created before calculations start.

@stevelinton
Copy link
Contributor

As an experiment I tried a very simplistic, single-threaded C programme to populate these arrays. It takes about 200ms on my laptop. I'm pretty sure that could be enhanced by a significant factor with a little care. So I can't see any reason not to do this during startup.

@markuspf
Copy link
Member

In HPC-GAP we could even be daring and do the coputation in a separate thread at startup ;)

@stevelinton
Copy link
Contributor

So I think we're agreed on initialising these data structures programmatically at startup, rather than having such a huge additional C file. @jdebeule would you mind doing that, or should someone else produce a new PR?

@jdebeule
Copy link
Author

jdebeule commented Oct 2, 2018

Hi all,

thanks for the interesting comments. Indeed I think initialising these data structures at start up is a good solution. Steve, would you mind sharing the code/files of your experiment?

Best,

Jan

@stevelinton
Copy link
Contributor

Here's the test file I ran. It uses a sieve of Erastothenes to find the primes, and adds the prime powers as it finds each prime. Depending on the exact specification of the arrays, you might need to sort it afterwards, or mark the prime powers for adding when you get to them, or something.

The most obvious speedup is to deal with a few small primes systematically. So rather than using the sieve to mark and then skip over all the multiples of 2, 3 and 5, you could just look a numbers that are congruent to 1, 7, 11, 13,17, 19, 23 and 29 mod 30, for instance.

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int degrff[1100000];
int sizeff[1100000];
int charff[1100000];
int numff = 0;
#define MAX_FF (1<<24)
#define SQR_FF (1<<12)
int main() {
    unsigned char *sieve = malloc(MAX_FF);    
    memset(sieve, 0, MAX_FF);
    for (int i = 2; i <= SQR_FF; i++) {
        if (!sieve[i]) {
            int p = i;
            int q = p;
            int d = 1;
            while (q <= MAX_FF) {
                sizeff[numff] = q;
                charff[numff] = p;
                degrff[numff] = d;
                numff++;
                q *= p;
                d++;
            }
            int np = p+p;
            while (np < MAX_FF) {
                sieve[np] = 1;
                np += p;
            }
        }
    }
    for (int i = SQR_FF+1; i < MAX_FF; i++) {
        if (!sieve[i]) {
            sizeff[numff] = i;
            charff[numff] = i;
            degrff[numff] = 1;
            numff++;
        }
    }
    printf("%i\n",numff);
}

Copy link
Member

@fingolfin fingolfin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some remarks.

BTW, 200 ms extra startup time would be very bad from my perspective. But perhaps it can be optimized; or delayed until it is actually needed (but then we'd have to be very careful with e.g. HPC-GAP in mind)

@@ -10,14 +10,16 @@
##
## This file deals with internal finite field elements.
##
## Update September 2018: MAXSIZE_GF_INTERNAL changed to 2^24.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That doesn't really belong here, that kind of information is available in the repository.



#############################################################################
##

#V MAXSIZE_GF_INTERNAL . . . . . . . . . . . . maximal size of internal ffes
##
BIND_GLOBAL( "MAXSIZE_GF_INTERNAL", 2^16 );
BIND_GLOBAL( "MAXSIZE_GF_INTERNAL", 2^24 ); #jdebeule 16/09/18 2^24 was 2^16.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drop the #jdebeule comment

This applies many times again later on, so I'll not repeat this each time.

@@ -48,6 +48,26 @@
** is the logarithm of $z^{a-1} + 1$. This list is usually called the
** Zech-Logarithm table. The zeroth entry in the finite field bag is the
** order of the finite field minus one.
**
** Note of September 2018. A finite field element (FFE) uses 32 bit (cfr supra).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does "cfr supra" mean?

** Therefore, currently the largest allowed size for internatlly represented fields
** is set to 2^24. A second hurdle to deal with larger fields is the data stored in ffdata.c.
** The interpolation search requires these lists to be available, but increasing the largest
** possible order, increases the amount of data.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nitpick (feel free to ignore): Ideally, kernel code comments should wrap at about 78 columns, to match existing code.

@@ -83,103 +103,14 @@ static Obj TYPE_FFE0;
** 'PolsFF' is a list of Conway polynomials for finite fields. The even
** entries are the proper prime powers, odd entries are the corresponding
** conway polynomials.
*
** Update September 2018: the 6948 Conway polynomials for fields of prime power
** order at most 2^32 are stored in the file finfield_conway.h
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drop this comment

@@ -1623,13 +1565,14 @@ Obj FuncZ2 ( Obj self, Obj p, Obj d)
{
ip = INT_INTOBJ(p);
id = INT_INTOBJ(d);
if (ip > 1 && id > 0 && id <= 16 && ip < 65536)
/* jdebeule 19/9/2018 '24' was '16', 65536 became new constant from ffdata.h */
if (ip > 1 && id > 0 && id <= 24 && ip <= SIZE_LARGEST_INTERNAL_FF)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the 24 should also be made a constant. Say BITS_INTERNAL_FF; then define SIZE_LARGEST_INTERNAL_FF as 1 << BITS_INTERNAL_FF

@@ -157,8 +187,11 @@ extern Obj TypeFF0;
** which will only work if the product of integers of type 'FFV' does not
** cause an overflow. And of course the successor table stored for a finite
** field will become quite large for fields with more than 65536 elements.
**
** Update of September 2018: we fully agree with the comments above. See
** the necessary changes in POW1_FFV, VAL_FFE, NEW_FFE, and FLD_FFE
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of agreeing, the remark before should be removed, and of course parts of the comment above also are not valid anymore, as they talk about UInt2 and 65536 elements etc. In general, it'd be good to go through all comments here and make sure they are up-to-date (but of course it's perfectly sensible that you did not do this so far, as it was (is?) not 100% clear this PR gets merged. In other words: I am happy if adjusting the comments is delayed until the functional part of this PR is complete and generally accepted; I merely point this (and other) nitpicks out right now as part of general reviewing.

*/
#define FLD_FFE(ffe) ((FF)((((UInt)(ffe)) & 0xFFFF) >> 3))
/*#define FLD_FFE(ffe) ((FF)((((UInt)(ffe)) & 0xFFFF) >> 3))*/
#define FLD_FFE(ffe) ((FF)((((UInt)(ffe)) & 0xFFFFFFFF) >> 3)) /*jdebeule (16/9/18): see line above for old version */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While we are at it, I'd suggest to change this to a static inline function. Moreover, I'd replace 0xFFFF resp. 0xFFFFFFFF by a constant, too, e.g. MASK_INTERNAL_FF = SIZE_LARGEST_INTERNAL_FF-1

Then:

static inline FF FLD_FFE(Obj ffe)
{
    return (FF)(((UInt)ffe) & SIZE_LARGEST_INTERNAL_FF) >> 3);
}

Of course we could also extract the changes turning this (and other) macros into static inline functions, as well as the new constants MASK_INTERNAL_FF, BITS_INTERNAL_FF, into a separate PR, which could be merged quickly, and then this PR just would have to be rebased. I'd be happy to volunteer to create such a PR, too (in order to avoid causing lots of extra work for you). But only if you'd consider that helpful, of course.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear: I really think this (or some other PR) should introduce MASK_INTERNAL_FF; the change to static inline, however, could also be made after this PR here is merged.

/*#define NEW_FFE(fld,val) ((Obj)(((UInt)(val) << 16) + \
((UInt)(fld) << 3) + (UInt)0x02))*/
#define NEW_FFE(fld,val) ((Obj)(((UInt)(val) << 32) + \
((UInt)(fld) << 3) + (UInt)0x02)) /* jdebeule (16/9/18): see above for old version */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should turn all the macros in this header file to static inline (and I'd do it right now, but again, I don't want to cause too much work for you by forcing you to rebase this PR, so we can delay this to later).


#include "finfield_conway.h" /* jdebeule 16/09/18 */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a script that generated finfield_conway.h? If so, can we please add it to the repository, too? I don't really like having opaque data tables that then can't easily be edited, updated or understood.

In that regard, it is sad how e.g. 63001, 6 +242*251 turned into 63001, 60748. If I had the script, I am sure I could adjust it to produce the latter again...

Of course if you drop this part of the change again (keeping the old table, and generating the rest dynamically), this concern goes away.

@stevelinton
Copy link
Contributor

I'd hate to see this change lost. I think it's a really good idea.

It occurs to me that the main issue we are having comes from the fact that we want to number the finite fields consecutively in a thread-safe way, which, itself, comes from the fact that Martin wanted to do fields up to 2^16 in 32 bit ints and that didn't leave enough bits to just put the field order in the representation of the elements. (16 (field size)+16 (Zech log) +2 tag bits > 32). We on the other hand don't have this problem in a 64 bit world. (24+24+3 < 64). So we can adopt a much simpler solution.

If we put q itself in the immediate FF representation and then have a thread-safe write-only hash table giving the characteristics and degrees (or something of the kind) then we should avoid all our race condition problems without having to precompute big data structures.

@stevelinton
Copy link
Contributor

On reflection, the best solution seems to be to leave the PR more or less exactly as it is, except to
actually construct ffdata.c during the build process. There should be no problem including a small C utility to construct it, and simply building the file in gen when needed. There is scope to save some memory by reducing two big arrays from unsigned long to UInt4 saving half their memory.

@stevelinton
Copy link
Contributor

Turns out we already have etc/ffgen.c. So I think all that needs to be done is to make that a bit more flexible and include it in the build process by default, rather than having it as an optional task that can be performed by developers

@jdebeule
Copy link
Author

Hi all,

thanks a lot so far for the feedback, suggestions and improvements. Past weeks were too busy for me to work on this. I will make time for this again after the Aachen workshop next week.

Jan

@fingolfin
Copy link
Member

@jdebeule note that @stevelinton by now made PR #2997 and you might want to look there, to avoid duplicate work

@fingolfin
Copy link
Member

Closing this in favor of PR #2997

@fingolfin fingolfin closed this Jan 18, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
gapdays2018-fall Issues and PRs that arose at https://www.gapdays.de/gapdays2018-fall
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants