Closed
Labels: enhancement (New feature or request)
Description
Problem
SafePath::validate calls canonicalize() twice per entry during extraction, consuming 17-18% of total extraction CPU time. Each validation takes about 12 us against a target of < 1 us.
Profiling data
CPU (flamegraph/samply):
| Function | % CPU |
|---|---|
| SafePath::validate | 17-18% |
| canonicalize -> realpath | 16-17% |
| realpath$DARWIN_EXTSN (kernel) | 9-12% |
Benchmark (criterion):
| Validation type | Time | Target |
|---|---|---|
| Path validation (simple) | 12.1 us | < 1 us |
| Path validation (deep) | 12.5 us | < 1 us |
| EntryValidator (per entry) | 12.0 us | < 1 us |
| Symlink validation | 1.13 us | < 5 us |
Root cause
Two canonicalize() calls in safe_path.rs:
- parent.canonicalize() (line 203): checks that the parent dir resolves within dest
- resolved.canonicalize() (line 226): checks that the full path resolves within dest
Each call resolves to realpath(3) with multiple getattrlist kernel syscalls per path component.
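For illustration, the two-call pattern described above might look roughly like the sketch below. This is a hypothetical reconstruction, not the actual contents of safe_path.rs: the function name `validate_with_two_canonicalize`, its signature, and the `escapes` helper are assumptions. Only `Path::canonicalize`, `Path::starts_with`, and the other standard-library calls are real.

```rust
use std::io;
use std::path::{Path, PathBuf};

// Hypothetical helper: builds the error returned when a path leaves dest.
fn escapes(msg: &'static str) -> io::Error {
    io::Error::new(io::ErrorKind::PermissionDenied, msg)
}

/// Hypothetical sketch of a validation that canonicalizes twice per entry.
/// `canonical_dest` is assumed to be already canonicalized by the caller;
/// `entry` is the relative path taken from the archive.
fn validate_with_two_canonicalize(canonical_dest: &Path, entry: &Path) -> io::Result<PathBuf> {
    let resolved = canonical_dest.join(entry);

    // First call: the parent directory must resolve inside dest.
    // Parents that do not exist yet are not checked in this sketch.
    if let Some(parent) = resolved.parent() {
        if parent.exists() {
            if !parent.canonicalize()?.starts_with(canonical_dest) {
                return Err(escapes("parent directory resolves outside dest"));
            }
        }
    }

    // Second call: the full path, if it already exists, must also resolve inside dest.
    if resolved.exists() {
        if !resolved.canonicalize()?.starts_with(canonical_dest) {
            return Err(escapes("path resolves outside dest"));
        }
    }

    Ok(resolved)
}
```

Each `canonicalize()` walks the whole path through the OS, which is why the cost is paid per entry regardless of how simple the entry is.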
Proposed optimization
Skip canonicalize for known-safe parents during extraction:
- If the parent directory was created by the extraction engine (tracked via DirCache), it cannot be a symlink pointing outside: skip parent.canonicalize()
- If the archive contains zero symlink entries, canonicalization serves no security purpose: skip both calls
- Cache the canonical dest path once and use string prefix comparison
Security invariant preserved: canonicalize is only needed to defend against symlink-in-parent attacks. If all parent directories were created by exarch during extraction, the attack vector does not exist.
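A minimal sketch of how the three ideas above could fit together, under stated assumptions. `FastValidator`, its fields, `record_created_dir`, and the `DirCache` shown here are hypothetical stand-ins, not exarch's real types. The lexical rejection of `..` and absolute components is an added assumption that the fast paths rely on. The prefix check uses `Path::starts_with`, which compares whole components, so `/dest-evil` does not match `/dest`.

```rust
use std::collections::HashSet;
use std::io;
use std::path::{Component, Path, PathBuf};

fn denied(msg: &'static str) -> io::Error {
    io::Error::new(io::ErrorKind::PermissionDenied, msg)
}

/// Hypothetical stand-in for DirCache: directories the engine created itself.
#[derive(Default)]
struct DirCache {
    created: HashSet<PathBuf>,
}

/// Hypothetical validator; all names are illustrative.
struct FastValidator {
    canonical_dest: PathBuf,    // canonicalized once, at construction
    archive_has_symlinks: bool, // assumed known from a pre-scan of the archive
    dirs: DirCache,
}

impl FastValidator {
    fn new(dest: &Path, archive_has_symlinks: bool) -> io::Result<Self> {
        Ok(Self {
            canonical_dest: dest.canonicalize()?,
            archive_has_symlinks,
            dirs: DirCache::default(),
        })
    }

    /// Record a directory (relative to dest) that the engine just created.
    fn record_created_dir(&mut self, rel: &Path) {
        self.dirs.created.insert(self.canonical_dest.join(rel));
    }

    fn check_within(&self, path: &Path) -> io::Result<()> {
        if path.canonicalize()?.starts_with(&self.canonical_dest) {
            Ok(())
        } else {
            Err(denied("path resolves outside dest"))
        }
    }

    fn validate(&self, entry: &Path) -> io::Result<PathBuf> {
        // Lexical check (assumption): only plain name components are accepted.
        let lexically_safe = !entry.as_os_str().is_empty()
            && entry.components().all(|c| matches!(c, Component::Normal(_)));
        if !lexically_safe {
            return Err(denied("entry is empty, absolute, or contains '..'"));
        }
        let resolved = self.canonical_dest.join(entry);

        // No symlink entries in the archive: skip both canonicalize calls.
        if !self.archive_has_symlinks {
            return Ok(resolved);
        }

        // Parent is dest itself or was created by the engine: skip the parent check.
        let parent = resolved.parent().unwrap_or(self.canonical_dest.as_path());
        let trusted = parent == self.canonical_dest.as_path()
            || self.dirs.created.contains(parent);
        if !trusted && parent.exists() {
            self.check_within(parent)?;
        }

        // The full path is still checked when something already exists there.
        if resolved.symlink_metadata().is_ok() {
            self.check_within(&resolved)?;
        }
        Ok(resolved)
    }
}
```

In this sketch an entry under a tracked directory that does not yet exist on disk is validated without any `canonicalize()` call, which is the common case for many-files workloads.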
Expected improvement
- Validation cost: 12 us -> 2-4 us (3-6x faster)
- Overall extraction: ~10-12% throughput improvement for many-files workloads