Related work
This page is in a somewhat disorganized state; please bear with us.
No target recompilation or VM hypervisor required.
gdb reverse debugging ("process recorder")
cjones:
Process record and replay works by logging the execution of each machine instruction in the child process (the program being debugged), together with each corresponding change in machine state (the values of memory and registers).
- unclear how modification of user memory during syscalls is recorded (apparently not at all)
- unclear how process-shared memory is dealt with (apparently not at all)
- very high overhead (single-steps the program using ptrace)
- good approach for efficient replaying reverse-step et al.
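As a rough illustration of the logging scheme described above, here is a toy sketch in Python. It is not gdb's implementation: the three-instruction machine, the `Recorder` class and its method names are all made up for this example. Each step saves the old value of everything it overwrites, so reverse-step only has to pop the last log entry and restore those values.

```python
from dataclasses import dataclass, field


@dataclass
class Recorder:
    """Toy machine state plus an undo log of per-instruction changes (hypothetical)."""
    regs: dict = field(default_factory=lambda: {"pc": 0, "r0": 0, "r1": 0})
    mem: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def step(self, program):
        """Execute one instruction and log the old values it overwrites."""
        op, *args = program[self.regs["pc"]]
        undo = [("reg", "pc", self.regs["pc"])]
        if op == "set":              # set reg, immediate
            reg, value = args
            undo.append(("reg", reg, self.regs[reg]))
            self.regs[reg] = value
        elif op == "add":            # add dst, src
            dst, src = args
            undo.append(("reg", dst, self.regs[dst]))
            self.regs[dst] += self.regs[src]
        elif op == "store":          # store addr, reg
            addr, reg = args
            undo.append(("mem", addr, self.mem.get(addr)))
            self.mem[addr] = self.regs[reg]
        else:
            raise ValueError(f"unknown op {op!r}")
        self.regs["pc"] += 1
        self.log.append(undo)

    def reverse_step(self):
        """Undo the most recent instruction by restoring the logged old values."""
        for kind, key, old in reversed(self.log.pop()):
            if kind == "reg":
                self.regs[key] = old
            elif old is None:        # address was unmapped before the store
                self.mem.pop(key, None)
            else:
                self.mem[key] = old


program = [("set", "r0", 5), ("set", "r1", 7), ("add", "r0", "r1"), ("store", 100, "r0")]
rec = Recorder()
for _ in program:
    rec.step(program)
```

Logging one entry per instruction is what makes reverse execution cheap here, and it is also why the recording overhead is so high. Note that this sketch only sees writes made by the toy instructions themselves, which mirrors the open question above about memory modified by syscalls.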
- Similar design to rr: records whole Linux process
- Relies on code instrumentation in some manner
- Single-core execution
- Currently (4.0.3363) crashes when trying to record Firefox
- Integrates with gdb and some other similar debuggers
- Offers "Live Recorder" which you link into your program and lets you turn on recording in the field
RogueWave TotalView ReplayEngine
Sounds similar to rr/UndoDB but no mention of performance counters (in 2008 they probably didn't work anyway). Seems to use code instrumentation according to http://bgq1.epfl.ch/totalview/ReplayEngine_Getting_Started_Guide.pdf.
- Supports multithreading
- Based on library interception and a compiler plugin to instrument atomic operations
- Would require customization of JIT routines that emit atomic ops
- Supports multithreading
- Library interception; assumes all synchronizing operations go through library calls
- Only replays "in situ" since last "stop the world" epoch. Can't replay across an epoch boundary
- Assumes no races; if divergence detected, just naively tries to replay hoping this will get the right schedule
Deterministic Process Groups in dOS
Arnold Low-overhead multicore record and replay based on instrumenting pthreads APIs (and atomics?) and assuming there are no data races.
BEEER "BEEER: distributed record and replay for medical devices in hospital operating rooms". Extending Arnold to track inter-machine communication.
- Project canceled
Crosscut builds on the VMWare system and lets you "relog" to generate new logs during replay, including leveraging Chronicle to generate a Chronicle database!
Kendo: Efficient Deterministic Multithreading in Software Its observations on performance counter behavior are out of date, but as far as I know it was the first paper to use performance counters for async event timing.
- Similar to Chronomancer for Java.
- Chronon instruments bytecode to record variable changes and memory writes. Raw trace data goes to helper threads which use carefully optimized compression.
- It's unclear, but there's an "unpacker" step that probably performs some kind of indexing.
- Overheads quoted in this slide deck range from >200x (even more than Chronomancer) for well-optimized Java code that's CPU bound, down to 2x when you spend plenty of time in I/O or code that's excluded from Chronon instrumentation. That's probably a reasonable thing to do for J2EE code, and they get to use multiple cores to run the application.
- There's a tradeoff between the scope of code recorded and the overhead of recording described here.
- Scalability issues mentioned here.
- Prediction-based compression described here
- For something like Firefox, where you really want to instrument the entire software stack and parallelism is not a big issue, rr's approach seems much better.
- No divergence support: of course Java VMs don't support cloning, so they could only implement divergence using emulation, but you'd need a lot of heap data to make that work reliably.
Qira Pretty naive implementation.
Tetrane Looks like a great implementation of omniscience. Focused on reverse-engineering applications so has a quite different feature set to Pernosco.
- REPT captures recent control flow via Intel PT and stores that in a crash dump, then reconstructs data values
- Integrates into WinDbg
- Sometimes produces incorrect results, which could be bad
- Obviously not as good as a proper recording if you can afford the overhead, but seems like a great addition for crash reporting
roc:
There are a few major differences between Scribe and rr:
- Scribe doesn't serialize all threads. Instead they do a bunch of work to make sure all threads can run simultaneously. This reduces overhead in some places and adds overhead in others.
- They say their approach doesn't require "changing, relinking or recompiling the kernel" but their approach has to track internal kernel state like inodes and VFS path traversal, and it's not really clear how they do that. They also say "Scribe records by intercepting all interactions of processes with their environment, capturing all nondeterminism in events that are stored in log queues inside the kernel" so my guess is they're using a kernel module. That's a pretty big negative in my view.
- Scribe doesn't use performance counters to record asynchronous events. Instead they defer signal delivery until the next time the process enters the kernel. If the process doesn't enter the kernel for a long time, they basically take a snapshot of the entire state, force the process into the kernel and restart recording --- extremely heavyweight. For some bugs, it's essential to allow async signal delivery at any program point, so I don't like Scribe's approach there.
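To make the deferred-delivery idea concrete, here is a small Python sketch. It is only a model of the behavior described above, not Scribe's code: the event tuples and the functions `record_signals` and `replay` are invented for illustration, and the snapshot fallback for processes that never enter the kernel is not modeled.

```python
def record_signals(events):
    """Record when deferred signals are delivered.

    `events` is a sequence of ("compute",), ("syscall",) or ("signal", signo).
    A signal that arrives asynchronously is held until the next syscall, so the
    log only needs (syscall_count, signo) pairs and no instruction-level timing.
    """
    pending, log, syscalls = [], [], 0
    for event in events:
        if event[0] == "signal":
            pending.append(event[1])          # arrived; delivery is deferred
        elif event[0] == "syscall":
            syscalls += 1
            log.extend((syscalls, signo) for signo in pending)
            pending.clear()
    return log


def replay(steps, log):
    """Replay `steps` ("compute" or "syscall"), delivering logged signals at kernel entry."""
    trace, syscalls = [], 0
    for step in steps:
        if step == "syscall":
            syscalls += 1
            trace.extend(f"handler({signo})" for count, signo in log if count == syscalls)
        trace.append(step)
    return trace


# The same signal arriving at two different points before the first syscall.
early = [("signal", 10), ("compute",), ("compute",), ("syscall",), ("compute",)]
late = [("compute",), ("compute",), ("signal", 10), ("syscall",), ("compute",)]
log_early = record_signals(early)
log_late = record_signals(late)
replayed = replay(["compute", "compute", "syscall", "compute"], log_early)
```

Both recordings produce the same log, which is what makes replay simple. It also shows the cost: the program never observes a signal between two arbitrary instructions, so bugs that depend on that timing cannot be recorded this way.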
Time-Traveling Virtual Machines
See this page.
ORDER: Object centRic DEterministic Replay for Java
PRES: Probabilistic replay with execution sketching on multiprocessors
Infrastructure-Free Logging and Replay of Concurrent Execution on Multiple Cores A bit like ODR; records some input syscalls while allowing threads to run concurrently, then detects divergence and searches for shared-memory races that allow for alternative schedules that would fix the divergence.
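The divergence-and-search idea can be sketched with a toy example in Python. This is an assumption-laden illustration, not the paper's algorithm: two threads each perform a racy, non-atomic increment (a load followed by a store), only the final value is "recorded", and `find_schedule` (a hypothetical helper) brute-forces interleavings until replay matches the recording.

```python
from itertools import permutations


def run(schedule):
    """Run two threads that each do a non-atomic increment of a shared counter.

    `schedule` is a sequence of thread ids; each thread's first turn loads the
    counter into a local, and its second turn stores local + 1 back.
    """
    shared = 0
    local = {}
    turns_taken = {0: 0, 1: 0}
    for tid in schedule:
        if turns_taken[tid] == 0:
            local[tid] = shared        # load
        else:
            shared = local[tid] + 1    # store
        turns_taken[tid] += 1
    return shared


def find_schedule(recorded_output, first_guess=(0, 0, 1, 1)):
    """Return a schedule whose replay matches the recorded output, or None."""
    if run(first_guess) == recorded_output:
        return first_guess             # no divergence, nothing to search
    # Divergence detected: try alternative orderings of the racing accesses.
    for candidate in sorted(set(permutations(first_guess))):
        if run(candidate) == recorded_output:
            return candidate
    return None


lost_update = find_schedule(1)         # recording observed the lost-update outcome
no_race = find_schedule(2)             # recording matches the serial first guess
```

Exhaustive enumeration only works because this example has six interleavings. A real tool would have to restrict the search to accesses that actually race.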
Dune
cjones:
This isn't a record/replay tool per se, but rather creates a framework on which one could be built. The elevator pitch is approximately that Dune exposes hardware virtualization features to userspace. So userspace can manage its own page tables, directly process exceptions, and so forth. With those tools, one could build a userspace-only ptrace equivalent. And that, in theory, could allow building an rr-like tool without rr's libpreload hackery (syscallbuf and seccomp-bpf) but with comparable performance. There are further interesting things that could be done with custom page-table entries.
Lingering issues
- does Dune expose rdtsc and cpuid virtualization?
- does Dune expose some kind of interrupting programmable hwtimer?
CRIU checkpointing of user-space Linux processes