-
Notifications
You must be signed in to change notification settings - Fork 163
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FLOAT_LITERAL_CACHE is dangerous and a source of leaks #1993
Comments
To clarify: not exposing |
It's not really a surprise that you can break GAP by modifying all-caps data structures. I imagine the method selection data offers lots of ways to do it too. Doesn't mean that fixing them isn't worth doing, but I'm not sure it's a priority. To my mind, the memory leak is a worse problem, but every function body corresponds to a function()...end in some code that was parsed, so it's hard to visualize a program that would GC really huge numbers of function bodies. Maybe one using ReadAsFunction really heavily. On the other hand having the float cache in the function body is probably nicer overall. For the lazy case, would it be easier, and perhaps more useful anyway to maintain somewhere a list of all T_BODY objects in the workspace (typically a few 10s of thousands) and use a custom free function to remove dead body bags from it. That might be useful for other purposes. |
When coding GAP function bodies, we now have an (optional) plist in the function body of (immutable) values. Instead of storing the raw data for an integer or string within the respective expression, we instead store the integer resp. string in that values plist, and retrieve it from there when printing or evaluation such expressions (for strings, evaluation returns a shallow copy of the stored string constant, to preserve semantics). For eager floats, the values plist is used instead of `EAGER_FLOAT_LITERAL_CACHE`, which is removed by this patch (thus resolving part of issue gap-system#1993). These changes have several advantages: * we avoid a possible GC crash when compiling (large) string expressions * a function like `f := x -> 12345678901234567890` used to return a new integer object with each call; now it always returns the same integer, i.e., `IsIdenticalObj(f(1), f(2));` now returns true * floats benefit from avoiding use of a global "cache" which never is emptied (thus these eager float literals now are garbage collected if the function referencing them is garbage collected) * the cache simplifies future optimizations: e.g. we could convert expressions like `-2^100` by evaluating them and storing the result in the values list; we could also store permutations precomputed, that is, replace a `T_PERM_EXPR` and the `T_PERM_CYCLE` expressions nested in it by a single "constant expression" that references the immutable permutation described by these expressions (at least in the case these expressions were explicitly defined, i.e., `(1,2)` would work but not `(i,j)`
When coding GAP function bodies, we now have an (optional) plist in the function body of (immutable) values. Instead of storing the raw data for an integer or string within the respective expression, we instead store the integer resp. string in that values plist, and retrieve it from there when printing or evaluation such expressions (for strings, evaluation returns a shallow copy of the stored string constant, to preserve semantics). For eager floats, the values plist is used instead of `EAGER_FLOAT_LITERAL_CACHE`, which is removed by this patch (thus resolving part of issue gap-system#1993). These changes have several advantages: * we avoid a possible GC crash when compiling (large) string expressions * a function like `f := x -> 12345678901234567890` used to return a new integer object with each call; now it always returns the same integer, i.e., `IsIdenticalObj(f(1), f(2));` now returns true * floats benefit from avoiding use of a global "cache" which never is emptied (thus these eager float literals now are garbage collected if the function referencing them is garbage collected) * the cache simplifies future optimizations: e.g. we could convert expressions like `-2^100` by evaluating them and storing the result in the values list; we could also store permutations precomputed, that is, replace a `T_PERM_EXPR` and the `T_PERM_CYCLE` expressions nested in it by a single "constant expression" that references the immutable permutation described by these expressions (at least in the case these expressions were explicitly defined, i.e., `(1,2)` would work but not `(i,j)`
When coding GAP function bodies, we now have an (optional) plist in the function body of (immutable) values. Instead of storing the raw data for an integer or string within the respective expression, we instead store the integer resp. string in that values plist, and retrieve it from there when printing or evaluation such expressions (for strings, evaluation returns a shallow copy of the stored string constant, to preserve semantics). For eager floats, the values plist is used instead of `EAGER_FLOAT_LITERAL_CACHE`, which is removed by this patch (thus resolving part of issue gap-system#1993). These changes have several advantages: * a function like `f := x -> 12345678901234567890` used to return a new integer object with each call; now it always returns the same integer, i.e., `IsIdenticalObj(f(1), f(2));` now returns true * floats benefit from avoiding use of a global "cache" which never is emptied (thus these eager float literals now are garbage collected if the function referencing them is garbage collected) * the cache simplifies future optimizations: e.g. we could convert expressions like `-2^100` by evaluating them and storing the result in the values list; we could also store permutations precomputed, that is, replace a `T_PERM_EXPR` and the `T_PERM_CYCLE` expressions nested in it by a single "constant expression" that references the immutable permutation described by these expressions (at least in the case these expressions were explicitly defined, i.e., `(1,2)` would work but not `(i,j)`
When coding GAP function bodies, we now have an (optional) plist in the function body of (immutable) values. Instead of storing the raw data for an integer or string within the respective expression, we instead store the integer resp. string in that values plist, and retrieve it from there when printing or evaluation such expressions (for strings, evaluation returns a shallow copy of the stored string constant, to preserve semantics). For eager floats, the values plist is used instead of `EAGER_FLOAT_LITERAL_CACHE`, which is removed by this patch (thus resolving part of issue gap-system#1993). These changes have several advantages: * a function like `f := x -> 12345678901234567890` used to return a new integer object with each call; now it always returns the same integer, i.e., `IsIdenticalObj(f(1), f(2));` now returns true * floats benefit from avoiding use of a global "cache" which never is emptied (thus these eager float literals now are garbage collected if the function referencing them is garbage collected) * the cache simplifies future optimizations: e.g. we could convert expressions like `-2^100` by evaluating them and storing the result in the values list; we could also store permutations precomputed, that is, replace a `T_PERM_EXPR` and the `T_PERM_CYCLE` expressions nested in it by a single "constant expression" that references the immutable permutation described by these expressions (at least in the case these expressions were explicitly defined, i.e., `(1,2)` would work but not `(i,j)`
When coding GAP function bodies, we now have an (optional) plist in the function body of (immutable) values. Instead of storing the raw data for an integer or string within the respective expression, we instead store the integer resp. string in that values plist, and retrieve it from there when printing or evaluation such expressions (for strings, evaluation returns a shallow copy of the stored string constant, to preserve semantics). For eager floats, the values plist is used instead of `EAGER_FLOAT_LITERAL_CACHE`, which is removed by this patch (thus resolving part of issue #1993). These changes have several advantages: * a function like `f := x -> 12345678901234567890` used to return a new integer object with each call; now it always returns the same integer, i.e., `IsIdenticalObj(f(1), f(2));` now returns true * floats benefit from avoiding use of a global "cache" which never is emptied (thus these eager float literals now are garbage collected if the function referencing them is garbage collected) * the cache simplifies future optimizations: e.g. we could convert expressions like `-2^100` by evaluating them and storing the result in the values list; we could also store permutations precomputed, that is, replace a `T_PERM_EXPR` and the `T_PERM_CYCLE` expressions nested in it by a single "constant expression" that references the immutable permutation described by these expressions (at least in the case these expressions were explicitly defined, i.e., `(1,2)` would work but not `(i,j)`
As an update: |
When coding GAP function bodies, we now have an (optional) plist in the function body of (immutable) values. Instead of storing the raw data for an integer or string within the respective expression, we instead store the integer resp. string in that values plist, and retrieve it from there when printing or evaluation such expressions (for strings, evaluation returns a shallow copy of the stored string constant, to preserve semantics). For eager floats, the values plist is used instead of `EAGER_FLOAT_LITERAL_CACHE`, which is removed by this patch (thus resolving part of issue gap-system#1993). These changes have several advantages: * a function like `f := x -> 12345678901234567890` used to return a new integer object with each call; now it always returns the same integer, i.e., `IsIdenticalObj(f(1), f(2));` now returns true * floats benefit from avoiding use of a global "cache" which never is emptied (thus these eager float literals now are garbage collected if the function referencing them is garbage collected) * the cache simplifies future optimizations: e.g. we could convert expressions like `-2^100` by evaluating them and storing the result in the values list; we could also store permutations precomputed, that is, replace a `T_PERM_EXPR` and the `T_PERM_CYCLE` expressions nested in it by a single "constant expression" that references the immutable permutation described by these expressions (at least in the case these expressions were explicitly defined, i.e., `(1,2)` would work but not `(i,j)`
When coding GAP functions that contain float constants, these constants are coded either as "lazy" or "eager".
Eager constants are using a specific type of floats (e.g. the built-in machine floats). These are coded by evaluating the constant into float object, which is stored in
EAGER_FLOAT_LITERAL_CACHE
, and then the index where it was stored is coded into the function. Example:Lazy constants are re-evaluated if the user sets a different default floats type using
SetFloats
. These, too, are assigned a unique global index; when the float constant expression is evaluated, the result is cached inFLOAT_LITERAL_CACHE
; this cache is emptied each timeSetFloats
is called. Example:There are several problems with this: First off, exposing these caches allows for crashes and inconsistent computations. While exposing them may have been helpful when developing this feature, it provides a major risk. E.g. we can trivially crash GAP:
Moreover, in HPC-GAP,
EAGER_FLOAT_LITERAL_CACHE
is an atomic list and not immutable, so wonderful things like this are possible:The same is possible with
FLOAT_LITERAL_CACHE
even in regular GAP, although there we could at least in principle mark it as immutable.Another problem is that GAP functions are objects like any other, and hence can be garbage collected. But they contain no reference to the lazy or eager floats they contain. Hence the need for us to use the caches. But that also means that if a function is GCed, then any float constant it referenced will stay around for ever. I.e. we have a leak.
The first part of this issue can be resolved by not exposing these caches to the user. That's rather easy to do. To solve the second issue, here are some thoughts:
For the eager float constants, we could do the following: first, add a new entry to
struct BodyHeader
, sayObj floatcache
. Then when coding a body (T_BODY
), any eager floats get assigned to that cache. Drawback is that this causes some overhead for every function which has a body, even if they don't use floats.For the lazy ones, we need a way to store them in a way that still makes them flushable. So perhaps
FLOAT_LITERAL_CACHE
could become an (atomic) record (which we abuse as a "sparse list" here). Then we add a custom free function forT_BODY
, which iterates over the syntax tree stored in theT_BODY
, and for each float cache indexi
in it, it unbindsFLOAT_LITERAL_CACHE.(i)
.(This is somewhat related to issue #1889, I guess). Or, to have it a bit simpler, we could also store the indices for the lazy constants separately, perhaps even in the same
floatcache
. I.e. eachfloatcache
entry will either be an eagerly evaluated floatean, or else a a positive immediate integer which is an "index" intoFLOAT_LITERAL_CACHE
.The text was updated successfully, but these errors were encountered: