rocallahan edited this page Apr 11, 2016 · 52 revisions

This page is in a somewhat disorganized state; please bear with us.

Userspace recording

No target recompilation or VM hypervisor required.

Chronomancer/Chronicle

gdb reverse debugging ("process recorder")

cjones:

Process record and replay works by logging the execution of each machine instruction in the child process (the program being debugged), together with each corresponding change in machine state (the values of memory and registers).
  • unclear how modification of user memory during syscalls is recorded (apparently not at all)
  • unclear how process-shared memory is dealt with (apparently not at all)
  • very high overhead (single-steps the program using ptrace)
  • good approach for efficiently replaying reverse-step and similar commands
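
The per-instruction logging described above can be illustrated with a toy model. This is a minimal sketch, not gdb's implementation: `ToyMachine`, its two-opcode instruction set, and the `undo_log` layout are all hypothetical names invented for illustration. It shows why logging the overwritten state of every instruction makes reverse-step cheap, and also why recording is expensive, since every instruction pays for a log entry.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMachine:
    """Hypothetical toy CPU: a program counter plus named registers."""
    program: list                                   # list of (op, register, value) tuples
    regs: dict = field(default_factory=dict)
    pc: int = 0
    undo_log: list = field(default_factory=list)    # one entry per executed instruction

    def step(self):
        """Execute one instruction, first logging the state it will overwrite."""
        op, dst, val = self.program[self.pc]
        # Save the old pc and the old value of the register about to change
        # (None means the register did not exist yet).
        self.undo_log.append((self.pc, dst, self.regs.get(dst)))
        if op == "set":
            self.regs[dst] = val
        elif op == "add":
            self.regs[dst] = self.regs.get(dst, 0) + val
        else:
            raise ValueError(f"unknown op {op!r}")
        self.pc += 1

    def reverse_step(self):
        """Undo the most recent instruction by restoring the logged state."""
        old_pc, dst, old_val = self.undo_log.pop()
        if old_val is None:
            self.regs.pop(dst, None)
        else:
            self.regs[dst] = old_val
        self.pc = old_pc
```

Stepping a program such as `[("set", "a", 1), ("add", "a", 41)]` forward and then calling `reverse_step()` restores the registers and program counter to their state before the last instruction. A real recorder would also have to log memory writes, which is where the open questions about syscalls and shared memory arise.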

UndoDB

  • Similar design to rr: records whole Linux process
  • Relies on code instrumentation in some manner
  • Single-core execution
  • Currently (4.0.3363) crashes when trying to record Firefox
  • Integrates with gdb and some other similar debuggers
  • Offers "Live Recorder" which you link into your program and lets you turn on recording in the field

RogueWave ReplayEngine

Sounds similar to rr/UndoDB but no mention of performance counters (in 2008 they probably didn't work anyway), so it's unclear whether/how replay of asynchronous events works.

Nirvana

Hypervisor recording

ReVirt

VMWare Record & Replay

  • Project canceled

PANDA

Xen-TT

QEMU

Simics

Performance Counters

Non-Determinism and Overcount on Modern Hardware Performance Counter Implementations (Weaver, Terpstra, Moore)

Language/VM-specific Replay

WebReplay

Chakra JS Debugger

Python Time Travel Debugger

Chronon

  • Similar to Chronomancer for Java.
  • Chronon instruments bytecode to record variable changes and memory writes. Raw trace data goes to helper threads which use carefully optimized compression.
  • It's unclear, but there's an "unpacker" step that probably performs some kind of indexing.
  • Overheads quoted in this slide deck range from >200x (even more than Chronomancer) for well-optimized Java code that's CPU bound, down to 2x when you spend plenty of time in I/O or code that's excluded from Chronon instrumentation. That's probably a reasonable thing to do for J2EE code, and they get to use multiple cores to run the application.
  • There's a tradeoff between the scope of code recorded and the overhead of recording described here.
  • Scalability issues mentioned here.
  • Prediction-based compression described here.
  • For something like Firefox, where you really want to instrument the entire software stack and parallelism is not a big issue, rr's approach seems much better.
  • No divergence support: of course Java VMs don't support cloning, so they could only implement divergence using emulation, but you'd need a lot of heap data to make that work reliably.

GUI-level Record And Replay

Valera

Reran

(Not yet categorized)

Scribe

roc:

There are a few major differences between Scribe and rr:
  • Scribe doesn't serialize all threads. Instead they do a bunch of work to make sure all threads can run simultaneously. This reduces overhead in some places and adds overhead in others.
  • They say their approach doesn't require "changing, relinking or recompiling the kernel" but their approach has to track internal kernel state like inodes and VFS path traversal, and it's not really clear how they do that. They also say "Scribe records by intercepting all interactions of processes with their environment, capturing all nondeterminism in events that are stored in log queues inside the kernel" so my guess is they're using a kernel module. That's a pretty big negative in my view.
  • Scribe doesn't use performance counters to record asynchronous events. Instead they defer signal delivery until the next time the process enters the kernel. If the process doesn't enter the kernel for a long time, they basically take a snapshot of the entire state, force the process into the kernel and restart recording, which is extremely heavyweight. For some bugs, it's essential to allow async signal delivery at any program point, so I don't like Scribe's approach there.
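
The deferred-delivery idea in the last point can be sketched as a toy simulation. This is only an illustration of the behavior described above, not Scribe's code: `DeferredSignalRecorder`, its method names, and the log tuple format are hypothetical. Signals that arrive while the process runs in user mode are queued and only delivered (and logged) at the next kernel entry, so each signal is tied to a syscall index rather than to an arbitrary instruction.

```python
from collections import deque


class DeferredSignalRecorder:
    """Hypothetical model: signals are held until the next kernel entry."""

    def __init__(self):
        self.pending = deque()   # signals that arrived but are not yet delivered
        self.log = []            # recorded events, in delivery order
        self.syscall_count = 0   # index of the most recent kernel entry

    def signal_arrives(self, signo):
        """An async signal arrives during user-mode execution: just queue it."""
        self.pending.append(signo)

    def enter_kernel(self, syscall_name):
        """The process enters the kernel: deliver deferred signals, then log the syscall."""
        self.syscall_count += 1
        while self.pending:
            # The signal is recorded against a deterministic point (syscall index).
            self.log.append(("signal", self.syscall_count, self.pending.popleft()))
        self.log.append(("syscall", self.syscall_count, syscall_name))
```

Because every logged signal carries a syscall index, replay needs no instruction-level position. The cost is the one noted above: a signal can never be observed between two kernel entries, and a process that stays in user mode for a long time has to be forced into the kernel.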

iDNA

Jockey

Pinplay

Respec

Echo

OS Support

BackTracker

Time-Traveling Virtual Machines

ExtraVirt

SubVirt

SMP-ReVirt

Speck

DoublePlay

See this page.

ReTrace

CLAP

Capo

QuickRec / Capo3

FlashBack

ORDER: Object centRic DEterministic Replay for Java

PRES: Probabilistic replay with execution sketching on multiprocessors

Arnold

Dune

cjones:

This isn't a record/replay tool per se, but rather creates a framework on which one could be built. The elevator pitch is approximately that Dune exposes hardware virtualization features to userspace. So userspace can manage its own page tables, directly process exceptions, and so forth. With those tools, one could build a userspace-only ptrace equivalent. And that, in theory, could allow building an rr-like tool without rr's libpreload hackery (syscallbuf and seccomp-bpf) but with comparable performance. There are further interesting things that could be done with custom page-table entries.

Lingering issues:
  • does Dune expose rdtsc and cpuid virtualization?
  • does Dune expose some kind of interrupting programmable hwtimer?

Checkpointing

CRIU checkpointing of user-space Linux processes

Tonic Docker-based checkpointing for JS REPLs

seccomp-bpf

Mbox
