Export wiki under-the-hood docs #2946
Open
avagin wants to merge 60 commits into
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Andrei Vagin <avagin@google.com>

Signed-off-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Andrei Vagin <avagin@google.com>
- Replace FIXME with a detailed description of the current approach
- Explain architecture detection using PTRACE_GETREGSET
- Describe the restoration process via sigreturn and mode switching
- Update vsyscall handling details
- Clarify the status of x32 support and TIF_IA32 removal

Signed-off-by: Andrei Vagin <avagin@google.com>
- Explain how CRIU restores AIO context IDs and ring buffers
- Describe the tail synchronization technique using dummy /dev/null requests
- Clarify the lack of support for in-flight events and its implications

Signed-off-by: Andrei Vagin <avagin@google.com>
- Explain metadata collection from /proc and BPF syscall
- Describe data serialization using batch operations
- Add details about frozen maps handling
- Clarify current limitations regarding map_extra and BTF

Signed-off-by: Andrei Vagin <avagin@google.com>
- Document full CGroup v2 support and properties
- Explain CGroup namespace (CLONE_NEWCGROUP) handling
- Clarify the 'soft mode' default and other restoration strategies
- Detail the root mount requirement for bind-mounted subgroups

Signed-off-by: Andrei Vagin <avagin@google.com>
- Explain the core problem of TCP 4-tuple mismatch
- Describe solutions for listening, in-flight, and established sockets
- Document the UPDATE_INETSK plugin hook for programmatic IP remapping
- Add a summary table of options and flags

Signed-off-by: Andrei Vagin <avagin@google.com>
- Clarify freezing mechanisms (PTRACE_INTERRUPT, Freezer CGroup)
- Detail the parasite injection and bootstrap process
- Explain the role of the restorer blob as a PIE and its conflict avoidance
- Document the final transition via sigreturn

Signed-off-by: Andrei Vagin <avagin@google.com>
- Document the use of 'compel hgen' for header generation
- Update the example header format to include structured relocations
- Describe the 'parasite_blob_desc' setup functions
- Refine the build procedure steps

Signed-off-by: Andrei Vagin <avagin@google.com>
- Embed DMTCP description and characteristics
- Update CRIU supported architectures (s390, MIPS, RISC-V, etc.)
- Refine the comparison table for accuracy and modern features
- Add more context for BLCR, PinPlay, and Legacy OpenVZ

Signed-off-by: Andrei Vagin <avagin@google.com>
- Explain the identification of COW candidates by comparing parent/child VMAs
- Describe the pre-mapping strategy before fork to leverage kernel sharing
- Detail the content verification and manual COW triggering
- Document the use of madvise(MADV_DONTNEED) for final memory layout accuracy
- Clarify current limitations regarding reparenting and VMA movement

Signed-off-by: Andrei Vagin <avagin@google.com>
- Formalize the architectural comparison (userspace vs. kernel integration)
- Highlight the dangers of DMTCP's fake PID virtualization
- Explain CRIU's usage of ns_last_pid and clone3 for real PID restoration
- Improve overall technical clarity and structure

Signed-off-by: Andrei Vagin <avagin@google.com>
- Detail the Linux file object hierarchy (Inode, Dentry, File)
- Explain the SCM_RIGHTS mechanism for retrieving local FD copies
- Describe the gen_id and kcmp optimization for shared file detection
- Clarify the two-tier image storage structure (fdinfo vs specialized images)

Signed-off-by: Andrei Vagin <avagin@google.com>
- Clarify dirty page dumping in read-only mappings
- Add instructions for using 'criu check --extra'
- Detail PID mismatch solutions and internal interfaces (clone3)
- Expand on external Unix socket limitations
- Update guidance for Docker and container filesystem consistency

Signed-off-by: Andrei Vagin <avagin@google.com>
- Formalize the Master and Slave descriptor concepts
- Describe the 'open()' state machine and early FD distribution via SCM_RIGHTS
- Document the inter-process synchronization (set_fds_event, futexes)
- List key dependencies (TTYs, Unix Sockets, Epoll)
- Add notes on Service FDs and restoration ordering

Signed-off-by: Andrei Vagin <avagin@google.com>
- Explain BTRFS virtual vs physical device ID resolution
- Detail NFS 'Silly Rename' handling for unlinked files
- Document OverlayFS path inconsistencies and linkat() fallback logic
- Clarify legacy AUFS branch path fixes

Signed-off-by: Andrei Vagin <avagin@google.com>
- Formalize TASK_ALIVE, TASK_STOPPED, and TASK_DEAD states
- Explain the rationale for default behaviors in dump/restore
- Mention pre-dump enforcement of the Running state
- Document the use of --leave-stopped for debugging
- Add instructions for resuming trees via SIGCONT and pstree_cont.py

Signed-off-by: Andrei Vagin <avagin@google.com>
- Explain the PTRACE_SEIZE and PTRACE_INTERRUPT sequence
- Detail the transparency of ptrace-stop (TRAP_STOP)
- Document cgroup v1 and v2 freezer mechanisms
- Mention kernel kludges for v1 freezer unreliability
- Clarify the relationship between freezer and ptrace

Signed-off-by: Andrei Vagin <avagin@google.com>
- Detail the challenges of finding the 'watchee' path
- Explain the use of open_by_handle_at() and Irmap
- Explicitly document that pending events are dropped with a warning
- Explain how spurious events are generated during restore (ghost files)
- Add details for Fanotify inode and mount marks

Signed-off-by: Andrei Vagin <avagin@google.com>
- Explain non-blocking techniques for FIFOs
- Detail link-remap and ghost file strategies for unlinked files
- Document mount namespace (mnt_id) and open_ns_root usage
- Explain fown restoration (F_SETOWN_EX, UID switching, F_SETSIG)
- Clarify flag sanitization and O_PATH handling

Signed-off-by: Andrei Vagin <avagin@google.com>
- Formalize Master and Slave descriptor roles
- Explain the SCM_RIGHTS distribution mechanism
- Document transport socket naming and 'criu_run_id' usage
- Detail deterministic master selection to avoid deadlocks
- Explain dynamic service FD relocation during collisions

Signed-off-by: Andrei Vagin <avagin@google.com>
- Explain path loss in fsnotify instances
- Describe the open_by_handle_at() mechanism and kernel integration
- Detail the Irmap brute-force scanning strategy
- Mention filesystem-specific behaviors (Tmpfs, OverlayFS)

Signed-off-by: Andrei Vagin <avagin@google.com>
- Explain path loss scenarios (unlinked, virtual files, mount shadowing)
- Detail the Ghost File strategy (link count 0) and optimization (fiemap)
- Document the Link-Remap strategy (link count > 0) via linkat()
- Explain the PID helper (TASK_HELPER) mechanism for virtual files
- Clarify handling for NFS Silly Rename and OverlayFS

Signed-off-by: Andrei Vagin <avagin@google.com>
- Describe the (inode, device) to path resolution problem
- List default heuristic scan hints (/etc, /var/log, etc.)
- Explain user-defined scan paths via --irmap-scan-path
- Detail the pre-dump optimization and irmap-cache.img
- Clarify the status of Irmap vs open_by_handle_at on modern kernels

Signed-off-by: Andrei Vagin <avagin@google.com>
- Explain the kernel pointer comparison mechanism of kcmp()
- Describe the two-level red-black tree optimization (genid + kcmp sub-tree)
- List all supported KCMP_* types (FILE, VM, FILES, FS, EPOLL_TFD, etc.)
- Clarify how genid minimizes expensive system calls

Signed-off-by: Andrei Vagin <avagin@google.com>
- Clarify feature detection for system calls, filesystems, and namespaces
- Update persistent caching locations (/run/criu.kdat vs XDG_RUNTIME_DIR)
- Distinguish between kerndat (host capabilities) and inventory (checkpoint metadata)
- Mention 'criu check --extra' for runtime inspection

Signed-off-by: Andrei Vagin <avagin@google.com>
- Explain attribute extraction during checkpointing (Mode, Flags, Parent)
- Detail index preservation using IFLA_NEW_IFINDEX
- Document the --external macvlan[IFNAME]:OUTNAME option
- Improve overall structure and clarity

Signed-off-by: Andrei Vagin <avagin@google.com>
- Explain the soft-dirty bit mechanism for tracking modified pages
- Document the usage of ioctl(PAGEMAP_SCAN) for efficient scanning (kernel v6.7+)
- Describe the iterative pre-dump workflow and image chaining
- Detail the consolidation of pages during restoration
- Mention the role of the page server in minimizing disk I/O

Signed-off-by: Andrei Vagin <avagin@google.com>
- Detail the multi-stage dumping approach involving parasite injection
- Explain zero-copy dumping using vmsplice() and SPLICE_F_GIFT
- Describe the use of splice() for efficient image writing and page server transport
- Document VMA re-mapping and content filling during restoration
- Add references to COW preservation and lazy migration (userfaultfd)

Signed-off-by: Andrei Vagin <avagin@google.com>
- Explain the in_parent flag in pagemap entries
- Detail detection of unchanged pages via soft-dirty bit
- Document the --auto-dedup mode for dump and restore
- Describe online disk space reclamation using FALLOC_FL_PUNCH_HOLE
- Clarify image chaining and sparse file support

Signed-off-by: Andrei Vagin <avagin@google.com>
- Explain the legacy ns_last_pid interface and its limitations
- Detail the modern clone3() with set_tid mechanism (kernel v5.5+)
- Describe the benefits of atomic PID assignment and nested namespace support
- Mention automatic feature detection via Kerndat
- Document implementation using architecture-specific assembly wrappers

Signed-off-by: Andrei Vagin <avagin@google.com>
- Explain the PID reuse problem during iterative migration
- Document the use of pidfd_open() for race-free identification
- Detail the 'socket trick' for persistent FD storage via SCM_RIGHTS
- Explain the identity verification process in subsequent iterations
- List required kernel features (pidfd_open, pidfd_getfd)

Signed-off-by: Andrei Vagin <avagin@google.com>
- Explain stable identification vs numeric PIDs
- Detail restoration of alive vs dead processes
- Document the 'helper process' trick for dead pidfds
- Explain the transition from anonymous inodes to pidfs (kernel v6.9+)
- Clarify current limitations (PIDFD_THREAD)

Signed-off-by: Andrei Vagin <avagin@google.com>
- Explain the sensitivity of rseq state to process execution
- Document the use of PTRACE_GET_RSEQ_CONF and external peeking
- Detail the critical requirement to unregister the restorer's own rseq
- Explain how re-registration and rseq_cs restoration ensure automatic kernel fixups
- Update kernel requirements (v5.13 for automated detection)

Signed-off-by: Andrei Vagin <avagin@google.com>
- Explain the necessity of a dedicated context for memory swapping
- Describe the shared restorer mapping and mremap-based re-positioning
- Detail the safe hole detection strategy to avoid VMA conflicts
- Document the final transition via sigreturn
- Highlight the characteristics of the freestanding PIE blob

Signed-off-by: Andrei Vagin <avagin@google.com>
- Detail the top-down allocation strategy using RLIMIT_NOFILE
- Explain per-process isolation (service_fd_id) for shared FD tables
- Document the relocation mechanism (F_DUPFD_CLOEXEC, dup3)
- Describe the 'sfds_protected' flag and safety invariants
- List common Service FD types (LOG, IMG, RPC, TRANSPORT, etc.)

Signed-off-by: Andrei Vagin <avagin@google.com>
- Explain the use of internal inode numbers (shmid) for anonymous sharing
- Detail the restoration of shared anonymous regions via memfd_create()
- Describe the 'master' vs 'slave' roles and futex synchronization
- Document System V IPC and file-backed shared mapping restoration
- Add references to kcmp and memory dumping optimizations

Signed-off-by: Andrei Vagin <avagin@google.com>
- Explain the use of sock_diag for kernel state extraction
- Describe the SCM_RIGHTS mechanism for queue inspection
- Detail TCP Repair Mode for connection restoration
- List supported families including Netlink and Packet sockets
- Improve overall structure and technical depth

Signed-off-by: Andrei Vagin <avagin@google.com>
- Formalize the CR_STATE_* state machine and synchronization mechanism
- Detail the multi-stage restoration workflow (Root Task, NS Prep, Forking, etc.)
- Explain the security rationale for Stage 6 (Credentials and Seccomp)
- Document the final transition via sigreturn and thread restoration

Signed-off-by: Andrei Vagin <avagin@google.com>
- Detail the mechanics of TCP Repair Mode and state manipulation
- Explain the role of libsoccr in capturing sequence numbers and options
- Document the network locking workflow using nftables/iptables
- Describe the 'Silent Close' technique to preserve peer connections
- Highlight the importance of sequence number and window restoration

Signed-off-by: Andrei Vagin <avagin@google.com>
- Explain the PTY index restoration 'brute-force' strategy
- Detail the capture of termios, winsize, and ownership
- Describe the restoration workflow for master and slave peers
- Clarify the status of buffered data and legacy BSD PTYs
- Document the re-binding of controlling terminals (TIOCSCTTY)

Signed-off-by: Andrei Vagin <avagin@google.com>
- Detail the capture of device attributes (TUN vs TAP, Flags)
- Explain index preservation using TUNSETIFINDEX
- Document multi-queue support and re-attachment via TUNSETQUEUE
- Clarify current limitations (BPF filters, in-flight packets)
- Explain persistency management during restoration

Signed-off-by: Andrei Vagin <avagin@google.com>
- Explain the mechanics of Lazy Migration and on-demand page loading
- Detail the Lazy Pages Daemon and the UFFD descriptor handover (SCM_RIGHTS)
- Document the use of non-cooperative UFFD features (Fork, Remap, Unmap)
- Describe the page fault handling loop and page server integration
- Clarify benefits and trade-offs of the lazy approach

Signed-off-by: Andrei Vagin <avagin@google.com>
- Explain Build-ID extraction (ELF magic, 1MB mapping)
- Document 'buildid' (default) vs 'filesize' methods
- Explain the automatic fallback mechanism
- Describe the importance for security and memory pointer integrity
- Detail usage via the --file-validation flag

Signed-off-by: Andrei Vagin <avagin@google.com>
- Explain the address and ABI mismatch challenges
- Detail the Proxy (Patching) method for older kernels
- Document the modern arch_prctl method for native vDSO mapping
- Explain the role and restoration of the VVAR data region
- Mention automatic feature detection via Kerndat

Signed-off-by: Andrei Vagin <avagin@google.com>
- Explain the decoupling of socket paths and inodes
- Document the SIOCUNIXFILE ioctl for stable handle retrieval
- Describe the restoration workflow (tmpfs yard, peer coordination)
- Explain the capture and redelivery of in-flight file descriptors
- Clarify handling of external Unix sockets

Signed-off-by: Andrei Vagin <avagin@google.com>
- Document modern kernel features (clone3, PAGEMAP_SCAN, Mount V2)
- Detail advanced introspection tools (sock_diag, /proc/pid/map_files)
- Explain userspace components (Compel, Protobuf)
- Add references to other architectural documents

Signed-off-by: Andrei Vagin <avagin@google.com>
- Explain how zombies are identified and their exit codes captured
- Describe the 'helper technique' for restoring zombies via immediate exit
- Detail parent-child coordination to prevent premature reaping
- Add references to related technical documentation

Signed-off-by: Andrei Vagin <avagin@google.com>
- Explain the hardware-assisted shadow stack mechanism
- Document the state capture via NT_ARM_GCS ptrace regset
- Detail restoration using map_shadow_stack and sigframe integration
- List kernel requirements for AArch64 hosts

Signed-off-by: Andrei Vagin <avagin@google.com>
- Explain the profile identification and namespace dumping process
- Document the use of the 'parasite profile' for non-disruptive dumping
- Detail policy loading via apparmor_parser and namespace reconstruction
- Support for modern features like Profile Stacking
- List kernel and filesystem requirements

Signed-off-by: Andrei Vagin <avagin@google.com>
Signed-off-by: Andrei Vagin <avagin@google.com>
Signed-off-by: Andrei Vagin <avagin@google.com>
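Many of the commits above describe moving descriptors between processes with SCM_RIGHTS (retrieving local copies of a task's files, distributing master/slave fds during restore, keeping pidfds in a socket, handing over the userfaultfd). The following is a minimal, illustrative sketch of passing one descriptor over a connected AF_UNIX socket. It is not taken from CRIU, and `send_fd()`/`recv_fd()` are hypothetical helper names.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: send one fd over a connected AF_UNIX socket.
 * Returns 0 on success, -1 on error. */
int send_fd(int sock, int fd)
{
	char dummy = 'F'; /* at least one byte of ordinary data carries the control message */
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align; /* forces correct alignment of buf */
	} ctrl;
	struct msghdr msg = { 0 };
	struct cmsghdr *cmsg;

	memset(&ctrl, 0, sizeof(ctrl));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = sizeof(ctrl.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one fd. Returns the newly installed
 * local descriptor, or -1 on error. */
int recv_fd(int sock)
{
	char dummy;
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} ctrl;
	struct msghdr msg = { 0 };
	struct cmsghdr *cmsg;
	int fd;

	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = sizeof(ctrl.buf);

	if (recvmsg(sock, &msg, 0) != 1 || (msg.msg_flags & MSG_CTRUNC))
		return -1;

	cmsg = CMSG_FIRSTHDR(&msg);
	if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

The receiver gets a new descriptor number that refers to the same open file description as the sender's, so the file offset and status flags are shared between the two copies.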
rst0git reviewed Mar 9, 2026
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@             Coverage Diff              @@
##           criu-dev    #2946      +/-   ##
============================================
+ Coverage     57.19%   57.21%   +0.01%
============================================
  Files           154      154
  Lines         40399    40400       +1
  Branches       8857     8856       -1
============================================
+ Hits          23107    23113       +6
+ Misses        17032    17023       -9
- Partials        260      264       +4
No description provided.