perf: SafePath::validate canonicalization is 12x over target (12 us vs 1 us) #51

@bug-ops

Problem

SafePath::validate calls canonicalize() twice per entry during extraction, which consumes 17-18% of total extraction CPU time. Each validate call is measured at about 12 us against a target of < 1 us.

Profiling data

CPU (flamegraph/samply):

Function                       | % CPU
SafePath::validate             | 17-18%
canonicalize -> realpath       | 16-17%
realpath$DARWIN_EXTSN (kernel) | 9-12%

Benchmark (criterion):

Validation type            | Time    | Target
Path validation (simple)   | 12.1 us | < 1 us
Path validation (deep)     | 12.5 us | < 1 us
EntryValidator (per entry) | 12.0 us | < 1 us
Symlink validation         | 1.13 us | < 5 us

Root cause

Two canonicalize() calls in safe_path.rs:

  1. parent.canonicalize() (line 203) — checks parent dir resolves within dest
  2. resolved.canonicalize() (line 226) — checks full path resolves within dest

Each call goes through realpath(3), which on macOS issues multiple getattrlist syscalls per path component.

Proposed optimization

Skip canonicalize for known-safe parents during extraction:

  • If the parent directory was created by the extraction engine (tracked via DirCache), it cannot be a symlink pointing outside — skip parent.canonicalize()
  • If the archive contains zero symlink entries, canonicalization serves no security purpose — skip both calls
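  • Canonicalize the dest path once, cache the result, and check containment with a prefix comparison on path components (a raw string prefix would also accept a sibling directory such as dest-other)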

Security invariant preserved: canonicalize is only needed to defend against symlink-in-parent attacks. If all parent directories were created by exarch during extraction, the attack vector does not exist.
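
A minimal sketch of how the fast path could look, assuming a set-backed DirCache and a validator that canonicalizes dest once. FastValidator, mark_created, and the DirCache layout here are hypothetical names for illustration, not the existing exarch API. The sketch also assumes that a directory recorded in the cache has not been replaced by a symlink since it was created.

```rust
use std::collections::HashSet;
use std::io;
use std::path::{Component, Path, PathBuf};

/// Hypothetical stand-in for DirCache: directories (under the canonical
/// dest) that the extractor itself created during this run.
struct DirCache {
    created: HashSet<PathBuf>,
}

/// Hypothetical validator that pays for canonicalize(dest) only once.
struct FastValidator {
    canonical_dest: PathBuf,
    dirs: DirCache,
}

fn denied(msg: &'static str) -> io::Error {
    io::Error::new(io::ErrorKind::PermissionDenied, msg)
}

impl FastValidator {
    fn new(dest: &Path) -> io::Result<Self> {
        let canonical_dest = dest.canonicalize()?; // one realpath per extraction
        Ok(Self { canonical_dest, dirs: DirCache { created: HashSet::new() } })
    }

    /// Record a directory the extractor has just created under dest.
    fn mark_created(&mut self, dir: PathBuf) {
        self.dirs.created.insert(dir);
    }

    fn validate(&self, entry: &Path) -> io::Result<PathBuf> {
        // Lexical check: reject empty, absolute, and `..` paths up front.
        if entry.as_os_str().is_empty()
            || entry
                .components()
                .any(|c| !matches!(c, Component::Normal(_) | Component::CurDir))
        {
            return Err(denied("entry is empty, absolute, or contains `..`"));
        }

        let resolved = self.canonical_dest.join(entry);
        let parent = resolved.parent().unwrap_or(self.canonical_dest.as_path());

        // Fast path: parent is dest itself or a directory we created,
        // so no canonicalize() call is made.
        if parent == self.canonical_dest.as_path() || self.dirs.created.contains(parent) {
            return Ok(resolved);
        }

        // Slow path: unknown parent, fall back to a single canonicalize().
        let real_parent = parent.canonicalize()?;
        if !real_parent.starts_with(&self.canonical_dest) {
            return Err(denied("parent resolves outside dest"));
        }
        Ok(resolved)
    }
}
```

Path::starts_with compares whole components, so /dest-other is not treated as being inside /dest. Entries whose parent is not in the cache still pay for one canonicalize() call instead of two.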

Expected improvement

  • Validation cost: 12 us -> 2-4 us (3-6x faster)
  • Overall extraction: ~10-12% throughput improvement for many-files workloads

Labels: enhancement (New feature or request)
