Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regression in testextra/grpauto.tst (memory usage) #2377

Closed
olexandr-konovalov opened this issue Apr 19, 2018 · 28 comments
Closed

Regression in testextra/grpauto.tst (memory usage) #2377

olexandr-konovalov opened this issue Apr 19, 2018 · 28 comments
Labels
topic: packages issues or PRs related to package handling, or specific to a package (for packages w/o issue tracker)
Milestone

Comments

@olexandr-konovalov
Copy link
Member

The following is observed in teststandard tests on Jenkins:

########> Diff in /circa/scratch/gap-jenkins/workspace/GAP-master-test/GAPCOPT\
S/64build/GAPTARGET/standard/label/kovacs/GAP-master-snapshot/tst/testextra/gr\
pauto.tst:105
# Input is:
IsomorphismGroups(G,PcGroupCode(CodePcGroup(G),Size(G)))=fail;
# Expected output:
false
# But found:
Error, reached the pre-set memory limit
(change it with the -o command line option)
########

In the master branch this appens on both macos and linux jenkins slaves. In the stable-4.9 branch - only on macos. I've tried to run just that example starting GAP with -o 1g option and so war was able to reproduce this problem only once. Apparently, there is some interaction between other tests and loaded packages, and balancing around the 1g limit with some randomness and garbage collections making this hard to reproduce.

This is the input:

LoadAllPackages();
G:=PcGroupCode( 741231213963541373679312045151639276850536621925972119311,11664);;
H:=PcGroupCode( 888658311993669104086576972570546890038187728096037768975,11664);;
IsomorphismGroups(G,H);
IsomorphismGroups(G,PcGroupCode(CodePcGroup(G),Size(G)))=fail;

and this is the memory usage that I observe under macos:

Load GAP                131           131 
Load all packages       ---           344
After 1st iso test      142           436
After 2nd iso test      548           1.01g

(the last 1.01g value is measured after a break loop happened and I've entered "return" to resume the calculation).

The problem can be reproduced easier if GAP is started with just slightly smaller memory limit -o 1000m. Then after Error, reached the pre-set memory limit I see this:

brk> Where(50);
ImagesRepresentative( map, gen ) at /Users/alexk/GITREPS/gap-stable/lib/mapphomo.gi:311 called from
func( C[i] ) at /Users/alexk/GITREPS/gap-stable/lib/coll.gi:746 called from
List( GeneratorsOfMagmaWithInverses( elms ), function ( gen )
      return ImagesRepresentative( map, gen );
  end ) at /Users/alexk/GITREPS/gap-stable/lib/mapphomo.gi:311 called from
ImagesSet( map, elm ) at /Users/alexk/GITREPS/gap-stable/lib/mapping.gi:185 called from
Image( hom, sub ) at /Users/alexk/GITREPS/gap-stable/lib/autsr.gi:811 called from
act( orb[p], acts[i] ) at /Users/alexk/GITREPS/gap-stable/lib/oprt.gi:1241 called from
OrbitStabilizerAlgorithm( G, false, false, gens, acts, rec(
    pnt := d,
    act := act,
    onlystab := true ) ) at /Users/alexk/GITREPS/gap-stable/lib/oprt.gi:2909 called from
orbish( G, pnt, gens, acts, act ) at /Users/alexk/GITREPS/gap-stable/lib/oprt.gd:866 called from
CallFuncList( StabilizerFunc, arg ) at /Users/alexk/GITREPS/gap-stable/pkg/rcwa-4.6.1/lib/rcwagrp.gi:4586 called from
Stabilizer( api, v, gens, pre, asAutomorphism ) at /Users/alexk/GITREPS/gap-stable/lib/autsr.gi:953 called from
PatheticIsomorphism( G, H ) at /Users/alexk/GITREPS/gap-stable/lib/morpheus.gi:2101 called from
<function "IsomorphismGroups">( <arguments> )
 called from read-eval loop at *errin*:1
brk> arg[1];
[ f4, f3*f4, f5, f6, f8^2*f10^2*f11, f8*f9^2*f12*f13*f14*f19^2, f13*f14, f13, f16, f15*f16, f8^2*f10*f17*f19^2*f20^2, f18, 
  f19, f20 ]
brk> arg[2];
function( gen ) ... end
brk> Print(last);
function ( gen )
    return ImagesRepresentative( map, gen );
end
brk>
@olexandr-konovalov
Copy link
Member Author

olexandr-konovalov commented Apr 19, 2018

@Stefan-Kohl I can trace this to the following lines from rcwa-4.6.1/lib/rcwagrp.gi:4586:

#############################################################################
##
#F  Stabilizer( <arg> ) . . . . . . . . . .  use StabilizerOp for rcwa groups
##
MakeReadWriteGlobal( "Stabilizer" ); Unbind( Stabilizer ); # dirty hack ...
BindGlobal( "Stabilizer",                                  #

  function ( arg )
    if   Length( arg ) = 1   then return StabilizerOfExternalSet( arg[1] );
    elif IsRcwaGroup(arg[1]) then return CallFuncList( StabilizerOp, arg );
    else return CallFuncList( StabilizerFunc, arg ); fi;
    return;
  end );

Will you be able to replace this "dirty hack" as the comment says by some proper implementation urgently for GAP 4.9.1 release, please?

@fingolfin
Copy link
Member

Woah, not so fast. This is the original Stabilizer function from the library:

InstallGlobalFunction( Stabilizer, function( arg )
    if Length( arg ) = 1  then
        return StabilizerOfExternalSet( arg[ 1 ] );
    else
        return CallFuncList( StabilizerFunc, arg );
    fi;
end );

So while the above hack is, well, a hack, I don't quite see how it would cause the problem at hand, given that we correctly branch into StabilizerFunc

@olexandr-konovalov
Copy link
Member Author

Yes, just came to the same conclusion looking at the same code of Stabilizer.

@olexandr-konovalov
Copy link
Member Author

olexandr-konovalov commented Apr 19, 2018

Looking now at some packages that appear in the results of profiling and are not loaded by default. That's what I've got so far:

packages Load GAP load packages 1st iso test (mem) 2nd iso test (mem) 1st test (time) 2nd test (time)
default 131 --- 142 548 38s 150s
all pkg 131 344 436 1.01g
FR 131 131 169 596
semigroups 131 131 193 720 36s 256s
RCWA 131 131 198 593
Digraphs 131 131 145 535 37s 150s
genss 131 131 143 558 37s 151s
GRAPE 131 131 142 538 39s 162s

@fingolfin
Copy link
Member

I wonder if some package(s) add immediate functions for some attributes which then leads to both time and memory usage increases

@olexandr-konovalov
Copy link
Member Author

@james-d-mitchell I think I've told you earlier that I have reasons to suggest that semigroups is involved with this, but at that time I have no convincing arguments. It seems like I have them now - none of the packages that semigroups requires increases the memory usage as much as semigroups. On the other hand, there is no trading memory for performance - the runtime with semigroups package loaded is more than a minute longer than without semigroups.

These are the files that show non-zero coverage:

  • semigroups-3.0.15/gap/main/acting.gi
  • semigroups-3.0.15/gap/main/lambda-rho.gi
  • semigroups-3.0.15/gap/main/orbits.gi
  • semigroups-3.0.15/gap/main/setup.gi
  • semigroups-3.0.15/gap/options.g
  • semigroups-3.0.15/gap/semigroups/semicons.gi
  • semigroups-3.0.15/gap/semigroups/semimaxplus.gi

@olexandr-konovalov olexandr-konovalov added this to the GAP 4.9.1 milestone Apr 19, 2018
@olexandr-konovalov olexandr-konovalov added the topic: packages issues or PRs related to package handling, or specific to a package (for packages w/o issue tracker) label Apr 19, 2018
@olexandr-konovalov olexandr-konovalov changed the title RCWA and regression in testextra/grpauto.tst (memory usage) Regression in testextra/grpauto.tst (memory usage) Apr 19, 2018
@hulpke
Copy link
Contributor

hulpke commented Apr 19, 2018

@alex-konovalov
Is this a test with any assertions turned on? If so, does it also happen if assertions are turned off?

Best,

Alexander

@olexandr-konovalov
Copy link
Member Author

@hulpke all timings and memory usage in the table above, as well as the break loop backtrace are reproducible with assertions turned off.

@hulpke
Copy link
Contributor

hulpke commented Apr 19, 2018

@alex-konovalov
Indeed, thanks. I just tested in my version and loading semigroups increases runtime by a factor two. Puzzling...

@olexandr-konovalov
Copy link
Member Author

@hulpke this is the case when groups are isomorphic. Maybe it takes longer time to find an isomorphism because of different random states/ heuristics / ordering of things etc.

@hulpke
Copy link
Contributor

hulpke commented Apr 20, 2018

I've now tried a few runs to check impact of immediate methods. (Yes, I know that I am the curmudgeon who has always whined against the concept of immediate methods, so take the following with a grain of salt if you want so.)

After an initial run to ensure memory has been allocated, I get for the Isomorphism calculation in the master branch (no assertions turned on)

plain: 170s
Immediate methods turned off: 80s
after loading semigroups, immediate methods on: 290s
semigroups, immediate methods off: 265s

So there are two issues: Immediate methods in the main library, and ordinary methods in semigroups.

While it is entirely possible that my code is badly written, Is that what we want immediate methods to do?

Would it make sense to also have a test run with immediate methods turned off to spot such performance hits?

(Caveat: I have a patch that speeds the whole calculation up far more, so one could have this issue go away. )

Testing with TraceImmediateMethods brings up the following immediate methods
#I immediate: IsCommutative
#I immediate: IsCyclic
#I immediate: IsFinitelyGeneratedGroup
#I immediate: IsEmpty
#I immediate: IsTrivial
#I immediate: IsNonTrivial
#I immediate: IsPerfectGroup
#I immediate: IsSolvableGroup

Some of these are old, some of them have been added just recently. Some likely could be replaces by TrueMethods, some by simply setting certain flags when creating groups. (None strikes me as particularly useful for generic calculations, a few actually look a bit ``show-off'')

While they are short, part of the cost is the iterated type change, as a few basic properties are deduced one-by-one.

#I immediate: Size

would set Size to infinity once IsFinite is set to false. Is this really so valuable?

The semigroups package seems to be involved only with
#I immediate: IsSemigroupIdeal

Should we make these immediate methods another issue?

@ChrisJefferson
Copy link
Contributor

I think we should make it an issue. I knew immediate methods could slow things down, but I didn't guess it would be a 2x slowdown.

@olexandr-konovalov
Copy link
Member Author

@hulpke

(Caveat: I have a patch that speeds the whole calculation up far more, so one could have this issue go away. )

Thanks - not sure I exactly understand what do you mean by "this issue" - #2377 or just the performance regression. Just in case to point out that to resolve #2377 one needs to fix running out of memory limit.

@hulpke
Copy link
Contributor

hulpke commented Apr 22, 2018

@alex-konovalov
It resolves time and memory limit (but does not revolve the issues of immediate methods and the semigroups package, so I hold off with it). The changes are in #2383 if you want to see them.
This improvement has nothing to do with the issues observed here, but is simply using the fact that the automorphism group is in fact solvable.

@fingolfin
Copy link
Member

I agree that we should look into (a) improving existing immediate methods, if poisble (i.e. make them faster, (b) reducing the number of them, (c) making the code which handles immediate methods faster.

I have several ideas for that. E.g. for immediate methods on properties, in most (all) cases in the library, InstallTrueMethod is not quite enough right now. However, this can be changed by implemented extended true implications, and also InstallFalseMethod, as discussed in issue #235. I had started work on that recently for other reasons, but it might be beneficial here, too. (Of course it should be carefully benchmarked, to see if it actually helps, and does not hurt existing cases).

And yeah, we should measure performance regressions more closely (see also issues #260 and #636). In fact I know that @alex-konovalov used to do that, but it's unfortunate not quite easy to setup and maintain (and we cannot use Travis for that, as its runtimes fluctuate very heavily; we need to use dedicated hardware; which exists, in St Andres, but AFAIK the results are currently not visibly unless you can VPN to the St Andrews network... We discussed ways to change this in the past, but so far nothing happened (as usual, it's a matter of having the time to work on in it). I dream of having something like http://speed.pypy.org for GAP, but right now we are pretty far away from that.

@fingolfin
Copy link
Member

Some more notes: If I load Semigroups, then many more if its immediate methods beyond IsSemigroupIdeal are triggered in my tests.

I think in the tests, it is also instructive to see what happens for gap -A: at least for me, the following test requires 288 seconds with -A, but only 135 seconds with the default set of packages.

G:=PcGroupCode( 741231213963541373679312045151639276850536621925972119311,11664);;
IsomorphismGroups(G,PcGroupCode(CodePcGroup(G),Size(G)))=fail; time;

@fingolfin
Copy link
Member

fingolfin commented Apr 23, 2018

@alex-konovalov you call this a regression, but I don't understand why. This test was not present in GAP 4.8, so it cannot have regressed "as a test". The only thing that regressed in this sense is the overall testsuite, in the sense that it now may not pass resp. take too long on some configurations.

But the examples given here themselves do not seem to have regressed. To the contrary: In some quick tests I just run, performance seems to be far worse for them in GAP 4.8, worse than the worst I've seen in master resp. stable-4.9. And with that I mean that instead of running for 2-4 minutes on my test machine, they run longer than 15 minutes at least (they have not yet completed).

UPDATE: one of two computations completed, after 9818864 msec, or 2 hours and 43 minutes. The other (started with gap -A) is still running. So, I don't think we have a regression here compared to GAP 4.9 or GAP 4.8.

@james-d-mitchell
Copy link
Contributor

I'm slightly puzzled by your comment @fingolfin that "many more of its immediate methods beyond IsSemigroupIdeal are triggered in my tests". In Semigroups there are only 16 immediate methods installed, and only 5 of them could possibly triggered by objects not created within the Semigroups package (i.e. unless your tests use some objects defined in the Semigroups package, then I don't see how any of the other methods could be triggered).

The immediate method for IsSemigroupIdeal is just the following:

InstallImmediateMethod(IsSemigroupIdeal, IsSemigroup, 0, IsMagmaIdeal);

the many more other 4 are:

InstallImmediateMethod(IsTrivial, IsMonoid and HasGeneratorsOfMonoid,
InstallImmediateMethod(IsTrivial, IsMonoid and HasGeneratorsOfSemigroup,
InstallImmediateMethod(IsMonogenicSemigroup,
InstallImmediateMethod(IsMonogenicMonoid,

If it is somehow desirable to remove/improve these methods, then please file an issue on the Semigroups package issue tracker, and I'll be happy to resolve them.

@fingolfin
Copy link
Member

@james-d-mitchell My apologies, I misread a swarming output of TraceImmediateMethods in a longer session. Let me try to clarify: Right now, all I see are two immediates from semigroups IsSemigroupIdeal and IndecomposableElements (see below). Also note: This is not a about "blaming" semigroups, or asking you for changes, but rather we are still trying to figure out what's going on; and it's quite noticeable that having Semigroups loaded has a major impact on speed and memory usage.

Here is an example log:

gap> LoadPackage("semigroups",false);
true
gap> TraceImmediateMethods();
gap> G:=Group( (1,2), (1,2,3) );;
#I RunImmediateMethods
#I  immediate: Size at /Users/mhorn/Projekte/GAP/gap.github/lib/coll.gi:174
#I  immediate: IsCyclic at /Users/mhorn/Projekte/GAP/gap.github/lib/grp.gi:34
#I  immediate: IsCommutative at /Users/mhorn/Projekte/GAP/gap.github/lib/magma.gi:190
#I  immediate: IsTrivial at /Users/mhorn/Projekte/GAP/gap.github/lib/magma.gi:124
#I  immediate: IsSemigroupIdeal via ((IsMagma and IsLeftActedOnBySuperset) and (IsMagma and IsRightActedOnBySuperset))
#I  immediate: IndecomposableElements at /Users/mhorn/Projekte/GAP/repos/pkg/pkg/Semigroups/gap/attributes/attr.gi:948
gap> G:=PcGroupCode( 741231213963541373679312045151639276850536621925972119311,11664);;
#I RunImmediateMethods
#I  immediate: IsCyclic at GAPROOT/lib/grp.gi:34
#I  immediate: IsCommutative at GAPROOT/lib/magma.gi:190
#I  immediate: IsTrivial at GAPROOT/lib/magma.gi:124
#I  immediate: IsSemigroupIdeal via ((IsMagma and IsLeftActedOnBySuperset) and (IsMagma and IsRightActedOnBySuperset))
#I  immediate: IndecomposableElements at GAPROOT/pkg/Semigroups/gap/attributes/attr.gi:948
#I RunImmediateMethods
#I  immediate: IsNonTrivial at GAPROOT/lib/coll.gi:134
#I RunImmediateMethods
#I RunImmediateMethods
#I RunImmediateMethods
#I  immediate: IsSemigroupIdeal via ((IsMagma and IsLeftActedOnBySuperset) and (IsMagma and IsRightActedOnBySuperset))
#I  immediate: IndecomposableElements at GAPROOT/pkg/Semigroups/gap/attributes/attr.gi:948
#I RunImmediateMethods
#I  immediate: IsCyclic at GAPROOT/lib/grp.gi:34
#I  immediate: IsFinitelyGeneratedGroup at GAPROOT/lib/grp.gi:22
#I  immediate: IsCommutative at GAPROOT/lib/magma.gi:190
#I  immediate: IsTrivial at GAPROOT/lib/magma.gi:124
#I RunImmediateMethods
#I RunImmediateMethods
#I RunImmediateMethods
#I  immediate: Size at GAPROOT/lib/coll.gi:174
#I  immediate: IsCyclic at GAPROOT/lib/grp.gi:34
#I  immediate: IsCommutative at GAPROOT/lib/magma.gi:190
#I  immediate: IsTrivial at GAPROOT/lib/magma.gi:124
#I  immediate: IsSemigroupIdeal via ((IsMagma and IsLeftActedOnBySuperset) and (IsMagma and IsRightActedOnBySuperset))
#I  immediate: IndecomposableElements at GAPROOT/pkg/Semigroups/gap/attributes/attr.gi:948
#I RunImmediateMethods
#I  immediate: IsTrivial at GAPROOT/lib/coll.gi:116
#I  immediate: IsEmpty at GAPROOT/lib/coll.gi:97
#I  immediate: IsPerfectGroup via IsTrivial
#I  immediate: IsNonTrivial at GAPROOT/lib/coll.gi:134

I think a major culprit here is IndecomposableElements: it adds to every single group object an attribute, set to []. That adds up quickly: Each empty list takes up 24 bytes of heap storage, pus one master pointer; then the attribute has to be stored, which costs extra storage. So for example, MemoryUsage(Group( (1,2), (1,2,3) )); is 178 without semigroups, and 228 with it. That's "just" 50 bytes (in other examples, I see 64, it depends on which attributes are set otherwise). But it adds up, both in time and memory usage, if you compute with millions of groups, as is the case here.

That said, there must be other factors at work here; as I said, we are still exploring.

@james-d-mitchell
Copy link
Contributor

@fingolfin Thanks for clearing that up. Just to note further that IndecomposableElements does not yet appear in any released version of Semigroups (it's in the master branch for inclusion in version 3.1.0). I'm not completely sure why there is an immediate method for this, perhaps @wilfwilson (as the author of the method) can comment further?

@wilfwilson
Copy link
Member

I do not remember why I made the immediate method. I'll make a PR to Semigroups for removing it.

@olexandr-konovalov
Copy link
Member Author

@fingolfin I call it a regression because make teststandard used to pass in all configurations in stable-4.9 branch (i.e. for GAP 4.9.1 release candidate) and does not pass any more.

@fingolfin
Copy link
Member

@alex-konovalov if that's how you define it, then moving this new test from teststandard to testextra (or deleting it entirely) is the appropriate fix for stable-4.9.

@olexandr-konovalov
Copy link
Member Author

@fingolfin agree without hesitations - although not to testextra (those tests are performed as part of make teststandard) but e.g. to benchmark directory.

@olexandr-konovalov
Copy link
Member Author

P.S. @fingolfin, should the PR for removing this example from teststandard go into master or stable-4.9?

@james-d-mitchell
Copy link
Contributor

Just to mention that the immediate method for IndecomposableElements is now removed from the Semigroups package master branch, if any of the other immediate methods are causing any issues, then please let us know.

olexandr-konovalov pushed a commit to olexandr-konovalov/gap that referenced this issue Apr 25, 2018
This will allow to avoid test failure reported in gap-system#2377 in
stable-4.9 branch. The fix in the master brancg will be done
differenty. This test is duplicated in benchmarks, so we will
monitor its status there.
olexandr-konovalov pushed a commit to olexandr-konovalov/gap that referenced this issue Apr 25, 2018
This will allow to avoid test failure reported in gap-system#2377 in
stable-4.9 branch. The fix in the master branch will be done
differently. This test is duplicated in benchmarks, so we will
monitor its status there.
fingolfin pushed a commit that referenced this issue Apr 26, 2018
This will allow to avoid test failure reported in #2377 in
stable-4.9 branch. The fix in the master branch will be done
differently. This test is duplicated in benchmarks, so we will
monitor its status there.
@olexandr-konovalov
Copy link
Member Author

This now happens only in the master branch. In stable-4.9 the test causing the problem has been removed in #2407.

@fingolfin
Copy link
Member

This should be fixed now, due to PR #2383 and #2387 (resp. #2522)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic: packages issues or PRs related to package handling, or specific to a package (for packages w/o issue tracker)
Projects
None yet
Development

No branches or pull requests

6 participants