Description
At the Node Diagnostics WG there was discussion around what is needed to make time-travel debugging more generally available. Key work items identified to make this happen include:
- Work to refactor certain coding idioms into more record/replay friendly forms. One example is the AliasedBuffer class.
- Discussion around user experience and surfacing of this functionality. We have experimented with several aspects:
  - Adding a new domain, TimeTravel, to the debugger API.
  - Using this to add time-travel support to the built-in inspect debugger as well as VSCode.
  - Adding a new module, trace_mgr.js, to enable simple generation of traces at common error points (unhandled exceptions, abnormal exits, console.error writes, etc.).
- Getting additional feedback on the value and impact of these use cases would be very useful in prioritizing and adjusting the implementation.
- Structuring code to make the implementation of the needed record/replay/snapshot code as VM neutral and easy as possible. More below.
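As a rough illustration of the "common error points" idea in the list above, the following is a minimal sketch and not the actual trace_mgr.js code. The function `installTraceHooks` and the `emitTrace` callback are hypothetical names; `emitTrace` stands in for whatever would actually write a trace. The hooks are installed on stand-in objects so the sketch can run without touching the real `process` or `console`.

```javascript
const { EventEmitter } = require('node:events');

// Hypothetical helper: `proc` is a process-like event emitter, `cons` is a
// console-like object, and `emitTrace(reason)` is an assumed callback that
// would write out a trace.
function installTraceHooks(proc, cons, emitTrace) {
  // Unhandled exceptions.
  proc.on('uncaughtException', (err) => {
    emitTrace(`uncaughtException: ${err.message}`);
  });

  // Abnormal exits (any non-zero exit code).
  proc.on('exit', (code) => {
    if (code !== 0) emitTrace(`abnormal exit: ${code}`);
  });

  // console.error writes: wrap the original method and still call it.
  const originalError = cons.error;
  cons.error = (...args) => {
    emitTrace('console.error');
    return originalError.apply(cons, args);
  };
}

// Demonstration against stand-ins rather than the real process/console.
const fakeProcess = new EventEmitter();
const fakeConsole = { error() {} };
const reasons = [];

installTraceHooks(fakeProcess, fakeConsole, (reason) => reasons.push(reason));

fakeProcess.emit('uncaughtException', new Error('boom'));
fakeProcess.emit('exit', 1); // recorded
fakeProcess.emit('exit', 0); // normal exit, not recorded
fakeConsole.error('something went wrong');
```

If this were pointed at the real `process`, note that registering an `uncaughtException` listener replaces Node's default behavior of printing the error and exiting, so a real implementation would need to decide how to preserve that behavior.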
Providing TimeTravel uniformly in Node, rather than as a vendor-specific feature, is a challenge.
One option is for each vendor to implement it primarily in the VM, as ChakraCore does today. This approach has the upside of requiring minimal changes in Node, but it is likely to be problematic because each VM providing the support would have to duplicate almost identical functionality. In the case of V8, since Node currently calls directly into its raw APIs, the addition of record/replay support also raises potentially large maintainability and performance concerns.
The second approach, as discussed during the WG, involves using N-API to move away from direct calls to V8 APIs. This has the immediate benefit of providing a single location for the large body of common record/replay code currently implemented in Chakra's JsRT host embedding API (for example here, which drives the logger in ChakraCore), replacing it with a neutral shared implementation based on the N-API specification. As N-API is at a higher level than JsRT, this will also have the benefit of decreasing the overhead of running in record mode. However, several items need investigation or work for this approach:
- The record/replay code must track identity for opaque references passed to/from various calls. ChakraCore has a non-moving collector, so we can simply use pointer values as ids. Other VMs may use moving collectors, which require more sophisticated approaches, e.g., adding an explicit tag field to a napi_value or updating tag info whenever the GC moves it.
- An N-API style model needs to be adopted in core so that there are no longer any direct calls to V8 APIs.
- A smaller API needs to be defined that the VM can implement to support any features that cannot be included in the general Node layer, e.g., record Data.Time calls in the VM, take/restore a snapshot, etc.
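To make the identity-tracking item above more concrete, here is a minimal JavaScript model of the "explicit tag" approach. It is a sketch only: a real implementation would live in native code at the N-API layer, and the names `ReferenceTagger` and `recordCall` are hypothetical. The point it illustrates is that the log refers to objects by a stable tag assigned on first sight, not by memory address, so it does not matter whether the collector moves the object.

```javascript
// Hypothetical helper: assigns a stable numeric tag to each object the first
// time it crosses the host/VM boundary.
class ReferenceTagger {
  constructor() {
    // object -> tag. A WeakMap is keyed by object identity and does not keep
    // the tagged objects alive. Keys must be objects, not primitives.
    this.tags = new WeakMap();
    this.nextTag = 1;
  }

  // Return the existing tag for `obj`, or assign the next free one.
  tagFor(obj) {
    let tag = this.tags.get(obj);
    if (tag === undefined) {
      tag = this.nextTag++;
      this.tags.set(obj, tag);
    }
    return tag;
  }
}

// Record a call by logging tags rather than the objects themselves.
// `name` is just an illustrative label, not a real N-API function name.
function recordCall(log, tagger, name, args) {
  log.push({ name, args: args.map((arg) => tagger.tagFor(arg)) });
}

const tagger = new ReferenceTagger();
const log = [];
const a = {};
const b = {};

recordCall(log, tagger, 'setProperty', [a, b]);
recordCall(log, tagger, 'getProperty', [a]); // `a` keeps the tag it got earlier
```

During replay, the same tags would be mapped back to whatever objects the replaying VM creates, which is why the tag must be independent of the object's address.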