Skip to content

fix(storage): reject path-traversal artifact ids at the path chokepoint#301

Open
minion1227 wants to merge 2 commits into
vouchdev:mainfrom
minion1227:minion_149
Open

fix(storage): reject path-traversal artifact ids at the path chokepoint#301
minion1227 wants to merge 2 commits into
vouchdev:mainfrom
minion1227:minion_149

Conversation

@minion1227

@minion1227 minion1227 commented Jul 1, 2026

Copy link
Copy Markdown

What changed

Artifact ids are now rejected at the storage layer when they contain a path
separator, .., a nul byte, or are empty. A new _reject_unsafe_id guard
runs at the _yaml / _page_path chokepoint that every _*_path helper
funnels through, so all artifact read/write paths are covered at once.

Why

Source.id is locked to a hex sha256, but Claim, Page, Entity,
Relation, Evidence, Session, and Proposal all declare a bare
id: str with no validator, and _yaml turned that straight into
kb_dir/<sub>/<id>.yaml with no containment check. A poisoned id could then
escape kb_dir:

  • bundle smuggling → later write outside the tree. import_apply uses
    _safe_member_path to keep each tar member's filename inside kb_dir,
    but does not check the id field inside the YAML body. A claim landed
    under claims/innocent.yaml whose id is "../../../tmp/pwned" reads
    back fine (safe filename), but the next update_claim / supersede /
    contradict resolves _claim_path(claim.id) to a path outside kb_dir.
  • arbitrary read on get. get_claim("../../../etc/hostname") resolved
    straight through _yaml before this change.

This is the same class of hole the tar-member fix (bundle._safe_member_path,
the CVE-2007-4559 fix) already closed on the import path — _reject_unsafe_id
mirrors its _unsafe_name_reason checks at the storage chokepoint. closes #149.

What might break

nothing legitimate. artifact ids are flat slugs / hex / uuids — none contain
/, \, .., or nul — so every real read/write is unaffected (the full
suite, including bundle/sync/session round-trips, is green). the only calls
that now raise ValueError are ones that were already unsafe. no on-disk
format, kb.* method, or audit-log shape changes.

a complementary model-layer id validator (mirroring
Source._id_is_hex_sha256) is deliberately left as a follow-up: the storage
guard already closes every reach path, including raw-id reads a model
validator cannot see, and keeping it storage-only avoids overlapping the
model classes.

VEP

not a surface change — a storage-layer containment guard in the same shape
already merged for tar members. no VEP needed.

Tests

  • make check — ruff clean; mypy clean on touched files; pytest green
    (only pre-existing, environment-only failures remain: fastapi/[web]
    not installed and embeddings installed, both of which also fail on
    main)
  • new tests in tests/test_storage.py: _yaml / _page_path reject
    traversal / separator / absolute / empty ids (parametrised); put_*,
    get_*, and put_page reject traversal ids; the disk-smuggled-id →
    update_claim exploit chain is blocked before any write; and a
    regression guard that legitimate slug / hex ids still resolve
  • CHANGELOG.md updated under ## [Unreleased]

Summary by CodeRabbit

  • Bug Fixes
    • Strengthened storage handling to reject unsafe artifact and source IDs, preventing path traversal and accidental writes or reads outside the app’s data directory.
    • IDs that are empty, contain path separators (/ or \), path traversal segments (./..), or NUL bytes are now blocked across claim and page operations, including updates.
    • Added regression coverage (including a “disk smuggling” scenario) to ensure malicious on-disk content can’t be promoted through updates, while valid IDs still resolve normally.

`Source.id` is locked to a hex sha256, but `Claim`, `Page`, `Entity`,
`Relation`, `Evidence`, `Session`, and `Proposal` all carry a bare
`id: str` with no validator. `_yaml` / `_page_path` turned those ids
straight into filesystem paths with no containment check, so a poisoned id
— e.g. `"../../tmp/x"` smuggled through a bundle body that `import_apply`
lands under a safe filename but whose in-memory id is traversal — could
escape `kb_dir` on the next put / update / lifecycle write, or read an
arbitrary file on get.

add `_reject_unsafe_id`, mirroring `bundle._unsafe_name_reason`: an id must
be non-empty and free of nul bytes, path separators, and `..` components.
route it through `_yaml` and `_page_path` — the single chokepoint every
`_*_path` helper funnels through — so all artifact read/write paths are
guarded at once.

a complementary model-layer id validator (mirroring `Source._id_is_hex_sha256`)
is left as a follow-up; the storage guard already closes every reach path,
including raw-id reads a model validator cannot see.

closes vouchdev#149
@coderabbitai

coderabbitai Bot commented Jul 1, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 206f4b37-f58e-4863-9162-242fe03ec4a3

📥 Commits

Reviewing files that changed from the base of the PR and between 137b515 and a5db33c.

📒 Files selected for processing (2)
  • src/vouch/storage.py
  • tests/test_storage.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/vouch/storage.py
  • tests/test_storage.py

📝 Walkthrough

Walkthrough

Adds a storage-layer unsafe-ID validator, wires it into YAML, page, and source path construction, expands regression coverage for rejected and valid IDs, and adds a changelog note describing the path-traversal hardening.

Changes

Artifact ID Path Traversal Guard

Layer / File(s) Summary
Unsafe ID validator and path helper wiring
src/vouch/storage.py
Adds _reject_unsafe_id(obj_id) and calls it from _yaml(sub, obj_id), _page_path(page_id), and _source_dir(source_id) before filesystem paths are built.
Regression tests and changelog
tests/test_storage.py, CHANGELOG.md
Adds storage tests for unsafe-id rejection across source, YAML, page, claim, and update flows, includes a disk-smuggling regression and a safe-id path check, and documents the fix in the changelog.

Estimated code review effort: 2 (Simple) | ~12 minutes

Possibly related issues

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 16.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the storage fix for rejecting unsafe artifact ids.
Linked Issues check ✅ Passed The PR enforces storage-layer rejection for unsafe artifact ids across YAML, pages, and source paths, matching the issue's traversal-prevention goal.
Out of Scope Changes check ✅ Passed No unrelated changes are evident; the changelog note and source-path guard support the same storage-hardening work.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@github-actions github-actions Bot added docs documentation, specs, examples, and repo guidance storage kb storage, migrations, schemas, and proposals tests tests and fixtures size: S 50-199 changed non-doc lines labels Jul 1, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/vouch/storage.py`:
- Around line 257-265: The storage path validation is incomplete because source
paths are still built from untrusted source IDs without rejection. Update the
source lookup flow in Storage, especially get_source(), read_source_content(),
and _source_dir(), to apply _reject_unsafe_id() to source_id before constructing
paths so traversal inputs cannot escape kb_dir. Keep the fix aligned with the
existing _yaml(), _page_path(), and _claim_path() validation pattern.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 868a3b3e-60b5-42f6-bee6-d17f55afff2f

📥 Commits

Reviewing files that changed from the base of the PR and between 94e84b1 and 137b515.

📒 Files selected for processing (3)
  • CHANGELOG.md
  • src/vouch/storage.py
  • tests/test_storage.py

Comment thread src/vouch/storage.py
address review on vouchdev#301: `_yaml` / `_page_path` were guarded, but
`get_source` / `read_source_content` (and the evidence-ref existence
checks) route raw `source_id` strings through `_source_dir` without it, so
`get_source("../../outside")` could still escape kb_dir to
`<outside>/meta.yaml` / `content`. `Source.id` is hex-locked on the model
but these read paths take unvalidated strings.

apply `_reject_unsafe_id` at the `_source_dir` chokepoint too, closing the
last storage traversal path. add get_source / read_source_content
rejection tests.
@minion1227

Copy link
Copy Markdown
Author

good catch — fixed in a5db33c. added _reject_unsafe_id at the _source_dir
chokepoint too, so get_source / read_source_content and the evidence-ref
existence checks reject traversal ids as well. Source.id is hex-locked on
the model, but those read paths take raw strings, so this closes the last
storage traversal path. added get_source / read_source_content rejection
tests; full suite still green.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

docs documentation, specs, examples, and repo guidance size: S 50-199 changed non-doc lines storage kb storage, migrations, schemas, and proposals tests tests and fixtures

Projects

None yet

Development

Successfully merging this pull request may close these issues.

validation gap: artifact ids accept path-traversal strings and reach filesystem paths unsanitised in storage operations

1 participant