feat(cli): emit per_node.db in recce init --cloud [PR 2/3] #1335
Merged
even-wei merged 2 commits into feature/drc-3295-pr1-per-node-db-emitter on Apr 23, 2026
PR 2/3 of DRC-3295. Wires PR 1's per-node SQLite emitter into the `recce init --cloud` command so Cloud can stream lineage rows without proxying to an ephemeral Recce instance.

Key changes:
- In cloud mode, write `cll_cache.db` to a `tempfile.mkdtemp(prefix="recce-cll-")` scratch dir so `build_full_cll_map` keeps its warm-cache perf, but the cache.db never leaves the container (the ECS task is GC'd on exit). A user-provided `--cache-db` is intentionally ignored in cloud mode.
- After `build_full_cll_map()`, emit `per_node.db` (manifest + catalog rows, not_null/unique tests, primary_key, edges) to the same scratch dir and upload it via the new `per_node_db_url` upload key.
- Graceful degradation: if Cloud has not yet added `per_node_db_url`, log a warning and continue, so old CLI + new Cloud (and vice versa) stay compatible.
- Remove the `cll_cache.db` cloud upload block; cache.db is now local-only in cloud mode.
- Clean up the scratch dir on successful upload; leave it on failure for debugging (ECS reclaims it anyway).

Tests:
- `tests/test_cli_per_node_db.py`: integration tests for per_node.db upload, local-mode non-regression, tempdir cleanup, mode-switching safety, and a contract test against `DbtAdapter.get_model()` using the jaffle_shop fixture (manifest + catalog round-trip → reconstructed `get_model()` payload matches live adapter output).
- Update `test_cli_cache.py::test_init_cloud_upload_partial_failure` to use `per_node_db_url` instead of the removed `cll_cache_url` upload path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: even-wei <evenwei@infuseai.io>
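The cloud-mode lifecycle in the commit message (scratch dir, emit, upload via `per_node_db_url`, clean up on success, keep on failure) can be sketched as below. This is a minimal illustration under stated assumptions, not the CLI's code: `emit_per_node_db` is a hypothetical stand-in for the real `PerNodeDbWriter` with a one-table toy schema, `emit_and_upload` and the injected `upload` callable are hypothetical, and discarding the scratch dir when the key is missing is an assumption.

```python
import logging
import shutil
import sqlite3
import tempfile
from pathlib import Path
from typing import Callable, Mapping, Optional

logger = logging.getLogger("recce_sketch")

# Hypothetical uploader: (presigned_url, file_path) -> None; raises on failure.
Uploader = Callable[[str, Path], None]


def emit_per_node_db(path: Path, nodes: Mapping[str, dict]) -> None:
    """Hypothetical stand-in for PerNodeDbWriter: one row per node, toy schema."""
    conn = sqlite3.connect(path)
    try:
        with conn:  # commits the transaction if the block succeeds
            conn.execute("CREATE TABLE nodes (unique_id TEXT PRIMARY KEY, name TEXT)")
            conn.executemany(
                "INSERT INTO nodes (unique_id, name) VALUES (?, ?)",
                [(uid, node.get("name", "")) for uid, node in nodes.items()],
            )
    finally:
        conn.close()


def emit_and_upload(
    upload_urls: Mapping[str, str],
    nodes: Mapping[str, dict],
    upload: Uploader,
) -> Optional[Path]:
    """Emit per_node.db into a scratch dir and upload it.

    Returns None when the scratch dir was removed, or the scratch dir path
    when it was kept for debugging after a failed upload.
    """
    scratch = Path(tempfile.mkdtemp(prefix="recce-cll-"))
    per_node_path = scratch / "per_node.db"
    emit_per_node_db(per_node_path, nodes)

    url = upload_urls.get("per_node_db_url")
    if url is None:
        # Graceful degradation: an older Cloud may not return this key yet.
        logger.warning("per_node_db_url missing; skipping per_node.db upload")
        # Assumption: with nothing to upload, the scratch dir is discarded.
        shutil.rmtree(scratch)
        return None

    try:
        upload(url, per_node_path)
    except Exception:
        # Keep the scratch dir for debugging; the container is reclaimed anyway.
        logger.warning("per_node.db upload failed; keeping %s", scratch)
        return scratch

    shutil.rmtree(scratch)  # Successful upload: clean up.
    return None
```

Injecting the uploader keeps the cleanup rules testable without network access: a failing fake uploader should leave a `recce-cll-*` directory behind, while a succeeding one should not.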
Heads-up: pushing a revert commit shortly. I had this PR suppress the … Incoming fix:
Fix commit inbound. Please hold review until it lands, or review just the …
Original PR 2 removed the cll_cache.db cloud upload path under the mistaken assumption that cache.db was unwanted in cloud mode. cache.db is load-bearing for cross-session warm-cache reuse (OSS local users and cloud sessions alike — downloaded at init, uploaded at completion, reused by build_full_cll_map). Restore the cll_cache_upload_url path and return cache_db to its pre-PR-2 default path. per_node.db remains tempdir-scoped (genuinely throwaway per-task artifact) and continues to upload via per_node_db_url. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: even-wei <evenwei@infuseai.io>
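With this fix, two artifacts can leave the CLI in cloud mode, each under its own presigned-URL key: `cll_cache.db` via `cll_cache_upload_url` and `per_node.db` via `per_node_db_url`. A minimal sketch of uploading both with the same warn-and-continue behavior for missing keys follows. `upload_artifacts`, `artifact_paths`, and the injected `upload` callable are hypothetical names, and the mapping only mirrors the paths named elsewhere in this PR; it is not the CLI's implementation.

```python
import logging
from pathlib import Path
from typing import Callable, Dict, List, Mapping

logger = logging.getLogger("recce_sketch")


def artifact_paths(scratch: Path) -> Dict[str, Path]:
    """Hypothetical key -> file mapping after the fix.

    Assumes cache.db is back at its default ~/.recce/cll_cache.db path and
    per_node.db stays in the throwaway scratch dir.
    """
    return {
        "cll_cache_upload_url": Path.home() / ".recce" / "cll_cache.db",
        "per_node_db_url": scratch / "per_node.db",
    }


def upload_artifacts(
    artifacts: Mapping[str, Path],
    upload_urls: Mapping[str, str],
    upload: Callable[[str, Path], None],
) -> List[str]:
    """Upload each artifact whose URL key Cloud returned; return the keys used.

    Missing keys are logged and skipped instead of failing the command.
    """
    uploaded: List[str] = []
    for url_key, path in artifacts.items():
        url = upload_urls.get(url_key)
        if url is None:
            logger.warning("Cloud did not return %s; skipping %s", url_key, path.name)
            continue
        upload(url, path)
        uploaded.append(url_key)
    return uploaded
```

Keying each artifact by its URL key means an older Cloud that only returns `cll_cache_upload_url` still gets the cache upload, while `per_node.db` is silently deferred.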
eff86d6 into feature/drc-3295-pr1-per-node-db-emitter
1 check passed
PR checklist
What type of PR is this?
feat: wires the emitter into `recce init --cloud` and adds the cache.db handling.

What this PR does / why we need it:
Stacked on top of #1334 (PR 1/3 — the pure emitter module). Please review #1334 first; this PR shows the merge diff against that branch.
Wires `PerNodeDbWriter` from #1334 into `recce init --cloud`:

- After `build_full_cll_map()` succeeds, emits `per_node.db` for both `base` and `current` envs from the loaded manifests + catalogs
- Uploads `per_node.db` via a new `per_node_db_url` presigned-URL key (graceful degradation: a missing key logs a warning and doesn't fail the command)
- `cll_cache.db` now lives in `tempfile.mkdtemp(prefix="recce-cll-")` instead of `~/.recce/cll_cache.db`. Purpose: keep the warm-cache perf (`build_full_cll_map` re-uses previous CLL slices) while guaranteeing the cache DB never ends up on S3 or on the ECS task host after shutdown
- Removes the `cll_cache_upload_url` upload path entirely (cache.db is not a cloud artifact)
- Adds `shutil.rmtree` cleanup of the tempdir on success via `try`/`finally`

Local (non-cloud) `recce init` behavior is unchanged: it still writes `~/.recce/cll_cache.db` (or wherever `--cache-db` points) and still doesn't emit `per_node.db`.

Which issue(s) this PR fixes:
Part of DRC-3295 (epic DRC-3294, project "Lineage API Data Transfer Optimization" Phase III).
Special notes for your reviewer:
- … `per_node_db_url` to the response of `get_upload_urls_by_session_id` (same presigned-URL pattern as `cll_map_url`, content type `application/octet-stream`). The CLI handles a missing key gracefully, so this PR can land independently of that work.
- `tests/test_cli_per_node_db.py`: … `DbtAdapter.get_model()` on the `tests/manifest.json` + `catalog.json` jaffle_shop fixture
- `tests/test_cli_cache.py` retargeted (`test_init_cloud_upload_partial_failure_shows_warning`) since the `cll_cache_url` upload branch is now deleted
- (… `test_spa_route_*` requires a built frontend that's not in the worktree)
- … `cache_db` path; `try`/`finally` cleanup semantics; contract-test coverage

Does this PR introduce a user-facing change?:
Cloud users (implicit): running `recce init --cloud` now uploads an extra `per_node.db` artifact and no longer uploads `cll_cache.db`. Consumers in Recce Cloud will begin reading `per_node.db` once the Cloud-side URL key is added. Nothing breaks in the meantime: the `cll_cache.db` upload path was already only read via the `cll_cache_url` key, which Cloud will continue to provide transparently during migration (the CLI just ignores it).

Local users (no change): local `recce init` behavior is unchanged.

NONE (user-visible), but the cloud artifact set does change; this is covered by the cross-repo coordination note above.
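The cross-repo coordination note asks Cloud to serve `per_node_db_url` with the same presigned-URL pattern as `cll_map_url` and content type `application/octet-stream`. For reviewers unfamiliar with that pattern, here is a minimal standard-library sketch of uploading a file to such a URL. `put_presigned` is a hypothetical helper, not the CLI's actual upload code, and it assumes the URL accepts a plain HTTP PUT; with S3-style presigned URLs the `Content-Type` header generally has to match the one the URL was signed with.

```python
import urllib.request
from pathlib import Path


def put_presigned(url: str, path: Path, timeout: float = 60.0) -> int:
    """PUT a local file to a presigned URL and return the HTTP status code.

    Assumes the URL was signed for Content-Type application/octet-stream.
    """
    request = urllib.request.Request(
        url,
        data=path.read_bytes(),  # reads the whole file into memory
        method="PUT",
        headers={"Content-Type": "application/octet-stream"},
    )
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

A caller following the graceful-degradation rule would only invoke this when the key is present, and would catch `urllib.error.URLError` to log a warning instead of failing the command.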