[codex] unify cache capacity config #381

Draft
easel wants to merge 10 commits into Luce-Org:main from easel:codex/unified-cache-config

easel commented Jun 14, 2026

Collaborator

Summary

This is the org-visible stacked PR for the unified cache config model. It depends on #380.

Because the #380 head branch currently lives in the easel fork and this account cannot push branches to Luce-Org/lucebox-hub, this PR is temporarily opened against main. That makes it visible to the team, but the GitHub compare includes #380 plus the unified-cache commit. Once #380 lands, this PR should be rebased/retargeted to main so the visible diff collapses to the unified-cache work only.

What changed

  • Replaces cache slot sizing with byte-sized RAM/disk budget flags for prefix and prefill caches.
  • Keeps legacy slot flags as compatibility aliases.
  • Adds disk-backed exact prefill cache support alongside existing prefix disk cache support.
  • Exposes unified cache budget/usage telemetry in /props.
  • Updates docs, OpenAPI props, entrypoint env vars, scripts, and cache proof tests.
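
To illustrate what a byte-sized budget means in practice, here is a minimal sketch of turning a value such as `1GiB` or `256MiB` into a byte count. The function name `parse_size`, the unit table, and the accepted suffixes are assumptions made for this example; the parser the server actually uses is not shown in this PR.

```python
import re

# Hypothetical unit table: the suffixes the real server accepts are not
# stated in this PR, so binary (IEC) units are assumed here.
_UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}


def parse_size(text: str) -> int:
    """Convert a string such as '256MiB' into a byte count.

    A bare integer is treated as a number of bytes.
    """
    match = re.fullmatch(r"\s*(\d+)\s*([KMGT]iB|B)?\s*", text)
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit or "B"]


if __name__ == "__main__":
    for value in ("1GiB", "256MiB", "4096"):
        print(value, "->", parse_size(value), "bytes")
```

Expressing capacity in bytes rather than slots lets one budget be compared directly against available RAM or disk, whatever the size of an individual cache entry.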

Validation

  • cmake --build server/build --target test_server_unit dflash_server -j$(nproc)
  • server/build/test_server_unit (1959 assertions, 0 failures)
  • python3 -m py_compile for changed Python scripts
  • bash -n server/scripts/entrypoint.sh
  • git diff --check
  • DFLASH_SERVER_BIN=server/build/dflash_server python3 server/scripts/test_prefill_cache.py
    • RAM prefill cache active with 2 hits and ~10427x lower-bound warm speedup
  • DFLASH_SERVER_BIN=server/build/dflash_server python3 server/scripts/test_prefill_disk_cache.py
    • disk prefill cache active with 2 hits and ~10623x lower-bound warm speedup
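
One plausible reading of "lower-bound warm speedup" is that the warm prefill time can fall below the timer's reporting resolution, so the ratio is computed against a floor instead of the raw measurement. The sketch below shows that calculation. The function name, the 1 ms floor, and the sample timings are all assumptions for illustration; the proof scripts may compute the figure differently.

```python
def lower_bound_speedup(cold_s: float, warm_s: float, resolution_s: float = 0.001) -> float:
    """Return cold/warm, clamping warm to the timer resolution.

    When the warm time is below the resolution, the true speedup is at
    least the returned value, which makes it a lower bound.
    The 1 ms default is an assumed resolution, not a value from the PR.
    """
    if cold_s <= 0 or resolution_s <= 0:
        raise ValueError("cold_s and resolution_s must be positive")
    return cold_s / max(warm_s, resolution_s)


if __name__ == "__main__":
    # Illustrative timings only, not measurements from this PR.
    print(lower_bound_speedup(cold_s=2.5, warm_s=0.0))
    print(lower_bound_speedup(cold_s=2.5, warm_s=0.05))
```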

Notes

The cleaner branch layout would be base Luce-Org:codex/prefill-cache-wiring and head Luce-Org:codex/unified-cache-config, but pushing those branches requires org write permission.

easel commented Jun 14, 2026

Collaborator Author

Claude Code reviewed the stacked diff from #380 head to this branch. It found stale /props contract docs and an OpenAPI prefix_cache example mismatch, plus a note that legacy full_cache.enabled is RAM-only in disk-only prefill mode. Addressed those in 5a321eb (docs: align props cache contract). No runtime correctness findings were reported in that review.

easel commented Jun 14, 2026

Collaborator Author

Follow-up pushed in 7568ed2 to make the unified cache model the primary user surface. Defaults now use --cache-ram 1GiB split as 256MiB prefix + 768MiB exact prefill, and --cache-disk 16GiB split as 4GiB prefix + 12GiB exact prefill when a cache dir is configured. The server also alternates cold-miss snapshot targets when both RAM pools are viable, so exact repeated prompts and multi-turn prefix reuse can both populate without user tuning.

Validation after this follow-up:

  • cmake build: test_server_unit + dflash_server
  • server/build/test_server_unit: 1978 assertions, 0 failures
  • py_compile for touched Python scripts
  • bash -n server/scripts/entrypoint.sh
  • OpenAPI YAML parse/cache example assertion
  • git diff --check
  • RAM exact-prefill proof: 1 commit, 2 hits, warm prefill rounded to 0.000s
  • Disk exact-prefill proof: RAM off, 1 disk save, 2 disk hits

Claude Code reviewed this follow-up diff and reported no actionable correctness bugs. It flagged only a cosmetic /props example indentation issue, fixed before this commit.
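
The default numbers above work out to a 1:3 prefix-to-prefill split in both tiers, and the alternation can be pictured as a simple toggle. The sketch below reproduces that arithmetic and one possible alternation policy. `split_budget`, `ColdMissAlternator`, the choice of which pool goes first, and the fallback behaviour when only one pool is viable are assumptions for illustration, not the server's implementation.

```python
MiB = 1024**2
GiB = 1024**3


def split_budget(total_bytes: int) -> dict:
    """Split a total budget 1:3 between prefix and exact-prefill pools.

    The 1:3 ratio is inferred from the stated defaults
    (256MiB + 768MiB and 4GiB + 12GiB).
    """
    prefix = total_bytes // 4
    return {"prefix": prefix, "prefill": total_bytes - prefix}


class ColdMissAlternator:
    """Hypothetical policy: alternate snapshot targets on cold misses."""

    def __init__(self) -> None:
        # Starting state is arbitrary; here the first pick is "prefix".
        self._last = "prefill"

    def pick(self, prefix_viable: bool, prefill_viable: bool):
        """Return the pool that receives the next cold-miss snapshot."""
        if prefix_viable and prefill_viable:
            self._last = "prefix" if self._last == "prefill" else "prefill"
            return self._last
        if prefix_viable:
            return "prefix"
        if prefill_viable:
            return "prefill"
        return None  # neither pool can take a snapshot


if __name__ == "__main__":
    print(split_budget(1 * GiB))   # RAM default
    print(split_budget(16 * GiB))  # disk default
    alternator = ColdMissAlternator()
    print([alternator.pick(True, True) for _ in range(4)])
```

Alternating targets means neither pool is starved when both are enabled: a workload of repeated identical prompts and a workload of multi-turn conversations each get snapshots written without any tuning.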

easel force-pushed the codex/unified-cache-config branch from 4b30337 to 93b1522 on June 15, 2026 03:03