Rationale
kbox_shadow_create() rebuilds a fresh memfd on every O_RDONLY open by copying
the entire file contents through LKL via a pread64 loop. For hot files opened
repeatedly (DSOs, executables, config files read by every process), this
redundant copy dominates the open+close path, which currently costs
34-57 µs depending on mode and architecture.
An inode+mtime keyed cache that reuses existing memfds would eliminate the copy
on repeated opens, roughly halving open+close latency for cached files.
Proposed Changes
- Cache structure: add a small hash table keyed by (LKL inode number,
mtime, file size) that maps to an existing memfd file descriptor. On
cache hit, dup the memfd instead of rebuilding it.
- Invalidation: on each open, perform a single LKL fstat to compare
mtime and size against the cached entry. Evict on mismatch.
- Eviction policy: LRU or simple capacity bound (e.g. 64 entries) to
prevent unbounded memfd accumulation. Closed memfds (refcount drops to
zero) are candidates for immediate eviction.
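- Correctness invariant: cached memfds must be dup'd, not shared
directly, so each tracee FD has an independent lifetime and the cache
entry survives individual close operations. Note that dup'd descriptors
still share one open file description, and therefore one file offset:
read() or lseek() through one tracee FD moves the position seen by every
other FD backed by the same cache entry (mmap and pread are unaffected).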
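The cache described in the list above could look roughly like the following sketch. It rests on several assumptions that are not part of the proposal:
- A 16-set, 4-way set-associative table (64 slots, matching the example bound) stands in for the "small hash table", with LRU replacement within each set.
- A host struct stat stands in for the result of the LKL fstat.
- The shadow_cache_* names are hypothetical and do not exist in kbox.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical geometry: 16 sets x 4 ways = 64 entries (the example bound). */
#define SHADOW_CACHE_SETS 16
#define SHADOW_CACHE_WAYS 4

struct shadow_cache_entry {
    uint64_t ino;          /* LKL inode number (assumes a single LKL filesystem) */
    struct timespec mtime; /* version check: mtime ... */
    off_t size;            /* ... and size, as in the proposal */
    int memfd;             /* owned by the cache; -1 marks an empty slot */
    uint64_t last_used;    /* LRU clock value */
};

struct shadow_cache {
    struct shadow_cache_entry sets[SHADOW_CACHE_SETS][SHADOW_CACHE_WAYS];
    uint64_t clock;
};

void shadow_cache_init(struct shadow_cache *c)
{
    memset(c, 0, sizeof *c);
    for (int s = 0; s < SHADOW_CACHE_SETS; s++)
        for (int w = 0; w < SHADOW_CACHE_WAYS; w++)
            c->sets[s][w].memfd = -1;
}

/* Fibonacci hashing; the top 4 bits of the product select one of 16 sets. */
static unsigned set_index(uint64_t ino)
{
    return (unsigned)((ino * 0x9E3779B97F4A7C15ull) >> 60);
}

static void entry_evict(struct shadow_cache_entry *e)
{
    if (e->memfd >= 0)
        close(e->memfd);
    e->memfd = -1;
}

static int entry_matches(const struct shadow_cache_entry *e, const struct stat *st)
{
    return e->mtime.tv_sec == st->st_mtim.tv_sec &&
           e->mtime.tv_nsec == st->st_mtim.tv_nsec &&
           e->size == st->st_size;
}

/* Hit: returns a new descriptor for the tracee. Miss: returns -1.
 * An entry whose mtime or size no longer matches is evicted. */
int shadow_cache_lookup(struct shadow_cache *c, const struct stat *st)
{
    struct shadow_cache_entry *set = c->sets[set_index((uint64_t)st->st_ino)];

    for (int w = 0; w < SHADOW_CACHE_WAYS; w++) {
        struct shadow_cache_entry *e = &set[w];
        if (e->memfd < 0 || e->ino != (uint64_t)st->st_ino)
            continue;
        if (!entry_matches(e, st)) {
            entry_evict(e); /* file changed inside LKL: drop the stale copy */
            return -1;
        }
        e->last_used = ++c->clock;
        /* The dup has its own lifetime but shares the file offset
         * with the cached memfd (same open file description). */
        return fcntl(e->memfd, F_DUPFD_CLOEXEC, 0);
    }
    return -1;
}

/* Takes ownership of memfd. Reuses the slot for the same inode if present,
 * otherwise an empty slot, otherwise the least recently used way in the set. */
void shadow_cache_insert(struct shadow_cache *c, const struct stat *st, int memfd)
{
    struct shadow_cache_entry *set = c->sets[set_index((uint64_t)st->st_ino)];
    struct shadow_cache_entry *victim = NULL;

    assert(memfd >= 0);
    for (int w = 0; w < SHADOW_CACHE_WAYS; w++) {
        struct shadow_cache_entry *e = &set[w];
        if (e->memfd >= 0 && e->ino == (uint64_t)st->st_ino) {
            victim = e; /* same inode: replace in place */
            break;
        }
        if (!victim || (victim->memfd >= 0 &&
                        (e->memfd < 0 || e->last_used < victim->last_used)))
            victim = e; /* prefer empty slots, then the LRU way */
    }
    entry_evict(victim);
    victim->ino = (uint64_t)st->st_ino;
    victim->mtime = st->st_mtim;
    victim->size = st->st_size;
    victim->memfd = memfd;
    victim->last_used = ++c->clock;
}
```

On the open path, a caller would fstat the LKL fd and try shadow_cache_lookup() first. Only on a miss would it fall back to kbox_shadow_create(), insert the new memfd (the cache takes ownership), and hand the tracee a dup of it. Per-set LRU is a simplification that avoids a global LRU list. The cost is that it can occasionally evict a more recently used entry than a true LRU would.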
Considerations
- The supervisor is single-threaded today, so the cache needs no locking.
If the supervisor loop is ever parallelized, the cache must be protected.
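- Sealed memfds (MFD_ALLOW_SEALING + F_SEAL_WRITE from Phase 8) are safe
to share via dup as far as contents go, since the seal blocks writes.
F_SEAL_WRITE does not by itself block ftruncate(), so F_SEAL_SHRINK and
F_SEAL_GROW are also needed to keep a shared entry's size fixed.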
- Cache hit rate depends on workload. Dynamic linker loads (ld-musl, libc.so)
are the primary beneficiary since every exec re-opens the same files.
- Must not cache files larger than KBOX_SHADOW_MAX_SIZE (the existing 256MB
cap still applies).
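The sealing and sharing points in the list above can be illustrated with a small sketch. It is Linux-only and needs glibc 2.27 or later for memfd_create. shadow_seal() and shadow_reopen_readonly() are hypothetical helpers, not existing kbox functions. The procfs reopen is shown only as one possible alternative to dup, under the assumption that /proc is mounted in the supervisor.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Freeze a fully populated shadow memfd (created with MFD_ALLOW_SEALING).
 * F_SEAL_WRITE blocks write()/pwrite() and new writable shared mappings;
 * F_SEAL_SHRINK and F_SEAL_GROW block ftruncate() size changes, which
 * F_SEAL_WRITE alone does not.
 */
int shadow_seal(int memfd)
{
    return fcntl(memfd, F_ADD_SEALS, F_SEAL_WRITE | F_SEAL_SHRINK | F_SEAL_GROW);
}

/*
 * Hypothetical alternative to dup(): reopen the memfd through procfs.
 * The result is a new open file description, so it has its own offset,
 * and O_RDONLY makes it read-only for whoever receives it.
 */
int shadow_reopen_readonly(int memfd)
{
    char path[32];

    assert(memfd >= 0);
    snprintf(path, sizeof path, "/proc/self/fd/%d", memfd);
    return open(path, O_RDONLY | O_CLOEXEC);
}
```

Whether kbox needs the reopen path depends on how tracees consume the fd. mmap() and pread() ignore the shared offset, but plain read() and lseek() do not.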
References
src/shadow-fd.c: kbox_shadow_create(), pread64 copy loop
include/kbox/shadow-fd.h: KBOX_SHADOW_MAX_SIZE
src/seccomp-dispatch.c: O_RDONLY gating and shadow creation call site
src/fd-table.c: per-entry host_fd / lkl_fd tracking