
Extra pool in Code objects prevents S4 objects from collection #1288

Open
@fikovnik

Description

Consider the following code (simplified version of the R methods package test):

A <- setRefClass("A")
A$methods(f = function(i) { print(i) })

bug <- TRUE

# class B inherits from A
B <- setRefClass("B", contains = "A")
B$methods(
  # destructor that sets the global flag `bug` to FALSE
  finalize = function() { bug <<- FALSE },

  # method that calls `f` from the superclass
  g = function() {
    # R at its best - first pull out the method `f` from super class into the current environment
    usingMethods("f")
    # call it using lapply - the match.fun in lapply will resolve the string to a closure
    lapply(1, "f")
})

# new instance of class B
b <- B()

# call B$g which in turn calls A$f printing 1
b$g()

# removes b
rm(b)

# since b is not referenced from anywhere, it should be collected
# and have its finalizer called
gc()

# check that the finalizer has run
stopifnot(!bug)

In vanilla R it works well.
On master it works well as long as the number of type feedback slots for the observed callees is less than 13.

Here is what happens:

  • The lapply function gets compiled to RIR.
  • It contains a call to .Internal(lapply, xs, FUN), which we special-case (inline) in the RIR compiler, turning it into a loop with one call to FUN(xs[[i]]). This call creates a type feedback slot for the observed callees.
  • Once we call it from B$g, the interpreter records the callee, i.e. the closure A$f, in the type feedback.
  • The closure is an S4 method. Its environment contains a reference to .self, so as long as the closure is reachable, the object cannot be collected.
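
The retention chain described above can be reproduced in plain R without RIR. The following is only a minimal sketch: `pool`, `record_callee` and `make_method` are hypothetical stand-ins for the extra pool, the callee recording and the method closure, not RIR's actual data structures.

```r
# Hypothetical stand-in for the extra pool of a code object.
pool <- new.env()
pool$entries <- list()

# Store a callee in the pool and return its index
# (the index is what would go into the type feedback).
record_callee <- function(fn) {
  pool$entries[[length(pool$entries) + 1L]] <- fn
  length(pool$entries)
}

# Flag flipped by the finalizer, similar to `bug` in the test.
state <- new.env()
state$finalized <- FALSE
on_finalize <- function(e) state$finalized <- TRUE

# Build a closure whose environment references `self`,
# the way a method's environment references .self.
make_method <- function() {
  self <- new.env()
  reg.finalizer(self, on_finalize)
  function(i) i
}

idx <- record_callee(make_method())

# The only path to `self` goes through the pooled closure,
# so the finalizer must not run yet.
invisible(gc())
stopifnot(!state$finalized)

# Dropping the pool entry makes the closure, and with it `self`, unreachable.
pool$entries[[idx]] <- NULL
invisible(gc())
stopifnot(state$finalized)
```

In this sketch the object only becomes collectable once the pool entry is dropped, which matches the behaviour described for the extra pool.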

The way the observed callees recording works is that first the closure gets stored in the extra pool of the corresponding caller code object, i.e. the 'lapply' function, and then the index of this pool entry gets stored in the type feedback.
Normally, we keep just 3 callees per call site. Since lapply is used for a whole lot of things, we exhaust the available slots way before we call b$g(). However, when we increase the number of slots, the bug manifests. It even manifests when PIR is disabled (PIR_ENABLE=off).
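
Why the slot limit hides the problem can be illustrated with a toy model of a single call site. This is a sketch under assumptions: `make_call_site` is hypothetical, it assumes that a callee is simply not stored once the slots are full, and the limits and the number of earlier callees are illustrative rather than taken from RIR.

```r
# Hypothetical model of one call site's observed-callee feedback.
make_call_site <- function(max_slots) {
  pool <- list()        # stands in for the caller's extra pool
  slots <- integer(0)   # pool indices kept in the type feedback

  record <- function(callee) {
    # Assumption: once the slots are exhausted, nothing more is stored.
    if (length(slots) >= max_slots) return(FALSE)
    pool[[length(pool) + 1L]] <<- callee
    slots <<- c(slots, length(pool))
    TRUE
  }

  list(record = record, retained = function() length(pool))
}

# Callees seen by the call site before the test runs (illustrative).
early_callees <- list(function(x) x, function(x) x + 1, function(x) x * 2)
late_callee <- function(i) i   # plays the role of A$f

# With 3 slots the late callee is never stored, so it is not kept alive.
small <- make_call_site(3L)
for (f in early_callees) small$record(f)
recorded_small <- small$record(late_callee)

# With more slots the late callee lands in the pool and stays reachable.
large <- make_call_site(4L)
for (f in early_callees) large$record(f)
recorded_large <- large$record(late_callee)

stopifnot(!recorded_small, recorded_large)
stopifnot(small$retained() == 3L, large$retained() == 4L)
```

Under these assumptions the bug is always present, and the small default limit only masks it for this particular test.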

This happened when @stepam38 added the support for the contextualized type feedback.

IMHO, it is a fairly weird test: relying on a call to gc() to collect an object is wrong. Relying on finalize together with gc() is also wrong.
