Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

orphaned objects are not fully GCed: one refcount key is left in the DB #3378

Closed
warner opened this issue Jun 21, 2021 · 3 comments · Fixed by #8694
Closed

orphaned objects are not fully GCed: one refcount key is left in the DB #3378

warner opened this issue Jun 21, 2021 · 3 comments · Fixed by #8694
Assignees
Labels
bug Something isn't working SwingSet package: SwingSet

Comments

@warner
Copy link
Member

warner commented Jun 21, 2021

Describe the bug

#3377 was about a kernel crash that occurred when an orphaned object was processed by processRefcounts(). The fix avoids the crash, but fails to clean up after the object fully. It leaves the ${kref}.refCount DB key in place. Normally that would be deleted when a retireExport GC action was sent to the exporter, but of course orphaned objects have no exporter. By not pushing a retireExport GC action for the late owner vat, we also fail to delete the .refCount key.

I'm not sure of the cleanest solution yet. Maybe have a special kind of "no target vat" retireExport GC action, which deletes the key but doesn't tell any vat about it? Maybe processRefcounts() should notice 1: recognizable=0 and 2: ownerVatID === undefined and delete the key itself, instead of pushing a GC action (but would the DB key change appear in the right commit window?).

The consequence is that we leave one DB key (${kref}.refCount = 0,0) around for each orphaned export of any dead vat. We don't kill vats very frequently yet (well, ok, not intentionally), so this is low-priority.

@warner warner added bug Something isn't working SwingSet package: SwingSet labels Jun 21, 2021
@warner warner self-assigned this Jun 21, 2021
warner added a commit that referenced this issue Jun 21, 2021
There were two bugs in the kernel's garbage-collection handling of orphaned
objects (exports of a vat which is later terminated).

* Sending a message to an orphaned vat caused the object's refcount to be
incremented, but never decremented again, preventing it from being collected.
The root cause was `kernelKeeper.kernelObjectExists` using `.owner` to decide
whether the object still exists. Orphaned objects have a `.refcount` but no
`.owner`. The fix is to  just test `.refcount` instead of `.owner`. #3376
* `kernelKeeper.processRefcounts()` wants to check the `isReachable` flag of
the exporting vat, to know whether they need a dropExport message, but
orphaned objects have no exporting vat anymore. The check crashed the kernel
when it attempted to get a vatKeeper for `undefined`. The immediate fix is to
skip this check for orphaned objects, which avoids the crash (#3377). But we
still need a fix to delete the refcount=0,0 entry (#3378).

closes #3376
closes #3377
refs #3378
warner added a commit that referenced this issue Jun 21, 2021
There were two bugs in the kernel's garbage-collection handling of orphaned
objects (exports of a vat which is later terminated).

* Sending a message to an orphaned vat caused the object's refcount to be
incremented, but never decremented again, preventing it from being collected.
The root cause was `kernelKeeper.kernelObjectExists` using `.owner` to decide
whether the object still exists. Orphaned objects have a `.refcount` but no
`.owner`. The fix is to  just test `.refcount` instead of `.owner`. #3376
* `kernelKeeper.processRefcounts()` wants to check the `isReachable` flag of
the exporting vat, to know whether they need a dropExport message, but
orphaned objects have no exporting vat anymore. The check crashed the kernel
when it attempted to get a vatKeeper for `undefined`. The immediate fix is to
skip this check for orphaned objects, which avoids the crash (#3377). But we
still need a fix to delete the refcount=0,0 entry (#3378).

closes #3376
closes #3377
refs #3378
@warner
Copy link
Member Author

warner commented Jun 21, 2021

I'm leaning towards the latter fix (have processRefcounts() do the deleting immediately.

@warner
Copy link
Member Author

warner commented Jul 23, 2021

Might be associated with (or fixed at the same time as) #3439

@warner
Copy link
Member Author

warner commented Dec 23, 2023

I was working on test-gc-kernel.js > terminated vat today, and noticed this clause at the bottom:

// TODO: we still fail to clean up the [0,0] kref: #3378 is about finding
// somewhere to delete the .refcount key
// t.is(refcounts[doomedExport2Kref], undefined);

which is commented out because of this bug. I uncommented that part, and it passes. Looking at the code more closely, I think it's been fixed:

} else if (recognizable === 0) {
// unreachable, unrecognizable, orphaned: delete the
// empty refcount here, since we can't send a GC
// action without an ownerVatID
deleteKernelObject(kref);

That code is run when processRefcounts looks at an object with no owner, reachable === 0, and recognizable === 0, and deletes it. We added this code as part of PR #4957, to fix #4951 (add syscall.abandonExport, for liveslots to call during stopVat() back when we did that pre-upgrade), about a year after this ticket was created.

I'll make a PR that uncomments the test clause.

warner added a commit that referenced this issue Dec 23, 2023
This clause was commented out because bug #3378 was still pending at
the time. I think we happened to fix that as a side-effect of adding
`syscall.abandonExport` for #4951 (in PR #4957), but didn't realize we
could re-enable the clause.

closes #3378
warner added a commit that referenced this issue Dec 23, 2023
This clause was commented out because bug #3378 was still pending at
the time. I think we happened to fix that as a side-effect of adding
`syscall.abandonExport` for #4951 (in PR #4957), but didn't realize we
could re-enable the clause.

closes #3378
warner added a commit that referenced this issue Jun 14, 2024
This clause was commented out because bug #3378 was still pending at
the time. I think we happened to fix that as a side-effect of adding
`syscall.abandonExport` for #4951 (in PR #4957), but didn't realize we
could re-enable the clause.

closes #3378
@mergify mergify bot closed this as completed in #8694 Jun 14, 2024
@mergify mergify bot closed this as completed in 9930bd3 Jun 14, 2024
mergify bot added a commit that referenced this issue Jun 14, 2024
This clause was commented out because bug #3378 was still pending at the time. I think we happened to fix that as a side-effect of adding `syscall.abandonExport` for #4951 (in PR #4957), but didn't realize we could re-enable the clause.

closes #3378
mhofman pushed a commit that referenced this issue Jun 20, 2024
This clause was commented out because bug #3378 was still pending at
the time. I think we happened to fix that as a side-effect of adding
`syscall.abandonExport` for #4951 (in PR #4957), but didn't realize we
could re-enable the clause.

closes #3378
mhofman pushed a commit that referenced this issue Jun 20, 2024
This clause was commented out because bug #3378 was still pending at the time. I think we happened to fix that as a side-effect of adding `syscall.abandonExport` for #4951 (in PR #4957), but didn't realize we could re-enable the clause.

closes #3378
mhofman pushed a commit that referenced this issue Jun 22, 2024
This clause was commented out because bug #3378 was still pending at
the time. I think we happened to fix that as a side-effect of adding
`syscall.abandonExport` for #4951 (in PR #4957), but didn't realize we
could re-enable the clause.

closes #3378
mhofman pushed a commit that referenced this issue Jun 22, 2024
This clause was commented out because bug #3378 was still pending at the time. I think we happened to fix that as a side-effect of adding `syscall.abandonExport` for #4951 (in PR #4957), but didn't realize we could re-enable the clause.

closes #3378
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working SwingSet package: SwingSet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants