increasing end-of-crank time consumed in testnet4.5 vattp trace, promises not getting GCed? XS bug? #3839
Comments
I've confirmed that the increase in runtime still occurs in a stripped-down environment (no kernel, just vattp and synthesized deliveries). I'm starting to suspect something in XS is failing to mark a Promise as unreferenced, since there's no obvious userspace strong collection of Promises, and a similar bug was fixed in XS several months ago.
I'm pretty sure this is a problem in XS, and I've filed Moddable-OpenSource/moddable#701 with steps to reproduce for the Moddable folks to take a look at.
* fix a major memory leak: 64 bytes per Map `delete()`, 32 per Set `delete()`
* should: closes #3839
* unfortunately Map/Set deletion is now O(N) not O(1)
* possibly fix #3877 "cannot read (corrupted?) snapshot"

Note that this breaks snapshot compatibility, and probably metering compatibility.

closes #3889
We upgrade the XS submodule to the latest version: Moddable-OpenSource/moddable@10cc52e

This fixes a major memory leak: 64 bytes per Map `delete()`, 32 per Set `delete()`. We believe this should: closes #3839

Unfortunately Map/Set deletion is now O(N) not O(1). This version of XS also fixes a bug that might be the cause of #3877 "cannot read (corrupted?) snapshot", but we're not sure.

Note that this breaks snapshot compatibility (snapshots created before this version cannot be loaded by code after this version). It might also break metering compatibility, but the chances seem low enough that we decided to leave the metering version alone.

closes #3889
For reference, the most obvious problem turned out to be unrelated to Promises. XS had a memory leak in which deleting an entry from a Set cost 32 bytes, and 64 bytes for a Map. The ultimate cause was the tombstone entries in the insertion-order linked list used to ensure iterators behaved correctly in the face of deletion. The new version of XS fixes this, although at the cost of O(N) deletion operations.

I ran a simulation of vattp behavior (delivering synthetic messages to an XS worker hosting the vattp code) and observed mostly flat runtime. I ran it for about 10% of the total testnet traffic (the original trace showed a clear linear increase in runtime over that range). There remained some occasional jank that grew linearly, maybe once every 5k deliveries, which I don't understand, but I'm willing to believe it was happening on the V8 side that was driving the simulation. I also ran a much smaller test (100 steps) and inspected XS's memory usage.

I ought to run a more extensive test in which I replay the entire testnet4.5 trace into a new-XS worker and examine the memory and time usage, but I'm more likely to wait until we do a deeper performance investigation of some other vat.
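The leak pattern is easy to exercise in isolation. A minimal sketch (my own illustration, not the actual Moddable-OpenSource/moddable#701 reproduction) is just a loop that churns a single key:

```js
// Churn a single Map entry. Under the leaky XS, each delete() left a
// tombstone (~64 bytes for a Map, ~32 for a Set) in the insertion-order
// list that keeps iterators correct across deletion, so the heap grew
// linearly even though the Map never held more than one live entry.
const m = new Map();
for (let i = 0; i < 100000; i += 1) {
  m.set('hwm', i);
  m.delete('hwm');
}
// With the fixed XS this stays flat, at the cost of O(N) delete().
```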
I'm investigating performance data from the recent phase4.5 ("metering") testnet, specifically vattp (v14). The code in this vat is super-simple, and does not accumulate any significant state over time. Yet for some reason the time it took for each crank increased significantly over the week-long testnet period.
This is a graph of the total delivery time (slogfile `deliver` event until `deliver-result`), vs the delivery number, for just the `deliverInboundAck()` messages:

The code for `deliverInboundAck()` just retrieves a high-water mark number from a `Map` for the named remote, makes one `remove` device invocation for each message number between that number and the incoming ack number, then updates the high-water mark. The observed metering-visible activity (computrons) is a nearly-deterministic function of the number of messages that get removed, over the entire testnet period, so at least the JS engine doesn't think the amount of work is increasing over time.
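For reference, a minimal sketch of that logic (with made-up names — `remotes`, `D()`, `mailboxDevice` — not the actual vattp source) looks like this:

```js
// Hypothetical sketch of the ack path described above.
const remotes = new Map(); // remoteName -> { highestAck, ... }
const D = (device) => device; // placeholder for the device-invocation wrapper
const mailboxDevice = { remove: (_name, _num) => {} }; // placeholder device

function deliverInboundAck(name, ackNum) {
  const remote = remotes.get(name);
  let num = remote.highestAck + 1; // previous high-water mark
  while (num <= ackNum) {
    D(mailboxDevice).remove(name, num); // one device call per newly-acked message
    num += 1;
  }
  remote.highestAck = ackNum; // record the new high-water mark
}
```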
When I graph the amount of time that elapses between the initial `deliver` event and the kernel-side end of the last syscall, I see this shape:

which tells me:
I'm fairly sure that the extra time is being spent during the forced GC that liveslots does at end-of-crank (the same thing we're aiming to do less frequently with #3767). However, given that vattp is not doing anything to justify an increased heap size (no objects are being passed around), my hypothesis is that something inside liveslots is causing the heap to grow, and the only thing I can think of is a bunch of result promises.
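One way to check that split (a sketch, assuming an explicit `gc()` is exposed, e.g. Node run with `--expose-gc`) is to time the forced collection separately from the delivery itself:

```js
// Separate the delivery time from the forced end-of-crank collection time.
// `deliver` is any async function that performs one crank's worth of work;
// globalThis.gc() stands in for the forced end-of-crank collection.
async function timedCrank(deliver) {
  const t0 = performance.now();
  await deliver();
  const t1 = performance.now();
  globalThis.gc();
  const t2 = performance.now();
  return { deliveryMs: t1 - t0, gcMs: t2 - t1 };
}
```

If `gcMs` grows with delivery count while `deliveryMs` stays flat, the extra time really is in the collector.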
There are three pathways through vattp that get called a significant number of times (i.e. more than once per provisioned remote). Two (`deliverInboundMessage` and `deliverInboundAck`) are triggered by the mailbox device. They are "sendOnly", and do not have result promises.

`deliverInboundMessage` emits a single eventual-send and a device invocation for each non-duplicate message:

(as noted, we want to change that `i.receiver~.receive()` to use sendOnly once we have support for it, because we don't really care about the result promise). `deliverInboundAck` does a device invocation, but no eventual-sends.

The third pathway is on the outbound side, when comms tells vattp to transmit a message to the remote:
This does not cause any eventual sends, and the comms message is sendOnly (because comms does not use liveslots).
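To make the result-promise question concrete, here is a rough sketch of the inbound `deliverInboundMessage` path described above (illustrative names only, reusing the placeholders from the earlier ack sketch; the real vattp code differs in details):

```js
import { E } from '@endo/eventual-send'; // E(x).m() is the eventual-send form of `x~.m()`

// Hypothetical sketch: one eventual-send plus one device invocation per
// non-duplicate message. E(...).receive() returns a result promise that
// vattp never uses; a sendOnly variant would not create it.
function deliverInboundMessage(name, newMessages) {
  const i = remotes.get(name); // per-remote record, as in the ack sketch
  for (const [num, body] of newMessages) {
    if (num > i.highestDelivered) {
      E(i.receiver).receive(body); // eventual-send: creates an (unused) result promise
      D(mailboxDevice).ackInbound(name, num); // device invocation (hypothetical method name)
      i.highestDelivered = num;
    }
  }
}
```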
So the only place that's obviously creating a result promise is the `~.receive()` call. Comms should be resolving that (to `undefined`) a moment later, and liveslots is supposed to forget about resolved promises. But if something is broken in that pathway, maybe `pendingPromises` is growing, causing heap growth, and subsequent GC slowdowns.

My plan is to build a benchmark environment that creates a vattp vat (without a kernel), sets up a remote, then feeds a long stream of `deliverInboundMessages`/`deliverInboundAck`/`transmit` messages in, and watches how the execution time changes over time. If that demonstrates the same problem that we recorded on the testnet, the next step is to start instrumenting liveslots, specifically `pendingPromises`, to see if it is growing over time.
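A rough sketch of what that driver loop could look like (every name here — `makeVattpWorker`, `makeAckDelivery`, `pendingPromiseCount` — is hypothetical; the real harness would drive the actual vattp vat):

```js
// Hypothetical benchmark driver: no kernel, just a vattp worker fed a long
// stream of synthetic deliveries, recording wall-clock time per delivery so
// a growing trend (or a growing pendingPromises table) becomes visible.
async function runBenchmark(totalDeliveries) {
  const worker = await makeVattpWorker(); // vattp + liveslots in a worker, no kernel
  await worker.setupRemote('remote1');
  const timings = [];
  for (let n = 1; n <= totalDeliveries; n += 1) {
    const t0 = performance.now();
    await worker.deliver(makeAckDelivery('remote1', n)); // includes end-of-crank GC
    timings.push(performance.now() - t0);
    if (n % 1000 === 0) {
      // watch whether liveslots' pendingPromises table is growing
      console.log(n, timings[timings.length - 1], worker.pendingPromiseCount());
    }
  }
  return timings;
}
```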