Skip to content

test(e2e): add cli-detach-recovery, exec-attach, volume-readonly cases#710

Merged
DorianZheng merged 3 commits into
boxlite-ai:mainfrom
G4614:test/e2e-coverage-on-main
Jun 10, 2026
Merged

test(e2e): add cli-detach-recovery, exec-attach, volume-readonly cases#710
DorianZheng merged 3 commits into
boxlite-ai:mainfrom
G4614:test/e2e-coverage-on-main

Conversation

@G4614

@G4614 G4614 commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

add 5 e2e cases pinning REST contracts not covered today

Coverage gaps

Three classes of behaviour run through the SDK → API → Runner → VM
chain but had no e2e pin on the chain itself; bugs in any one of them
would silently regress through make test:integration:* (which only
exercises local-FFI):

1. CLI detach lifecycle — `boxlite run -d` returns, CLI process exits,
   does a fresh CLI process still see / exec the box?
   FFI side  src/boxlite/tests/detach.rs, recovery.rs  ✓
   E2E side  scripts/test/e2e/cases/                    ✗
2. Execution.attach / reattach contract — bogus exec_id should be a
   typed client error; reattach to a completed exec should return a
   usable handle.
   Runner side  apps/runner/.../boxlite_exec_attach_test.go  ✓
   SDK <-> API  end-to-end                                    ✗
3. host bind mount via REST — the cloud runtime intentionally dropped
   host bind mounts (#639); REST callers passing host paths must
   silently no-op, no /mnt/<x> in the guest.
   FFI surface  src/boxlite/tests/mount_security.rs  ✓
   REST contract                                      ✗

Cases shipped

scripts/test/e2e/cases/test_cli_detach_recovery.py   (2 cases)
scripts/test/e2e/cases/test_exec_attach.py           (2 cases)
scripts/test/e2e/cases/test_volume_readonly.py       (1 case)

5 cases total. Author also dropped 6 cases from this branch's earlier
incarnation that were already committed alongside their respective
fix PRs (#686 / #688 / #689 / #691 / #692 / #696), so the diff is
strictly net-new coverage.

Test plan — run against current main

Stack: local e2e, runner unchanged, no source edits.

Case Result Notes
test_detached_box_exec_propagates_exit_code_on_fresh_cli ✅ PASS exit-code passthrough across CLI processes
test_detached_box_survives_cli_exit_and_is_reusable ⚠️ XFAIL (strict) reaches step 3 (boxlite exec <id> echo still-alive) then hits the stdout-drop race that #563 fixes — marker drops when #563 lands
test_attach_with_bogus_id_is_typed_error ✅ PASS bogus exec_id → typed Exception (not 5xx, not silent)
test_reattach_after_original_completes ⚠️ XFAIL (strict) same stdout-drop race (#563) on the original exec's out=='first-output' assertion
test_host_bind_mount_via_rest_is_silently_ignored ✅ PASS the box created with volumes=[(host_dir, "/mnt/ro", True)] reports MOUNT_LINE=<none> from /proc/mounts and the host marker file is untouched — REST silently dropped the host path

The 2 XFAILs are tied to #563 (fix(go-sdk): fold stream drain into Execution.Wait). Once #563 merges, both xfails flip xpass-strict, the markers come off, and the suite is 5/5 green. No additional fix work is needed in this PR.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Tests
    • Added end-to-end tests verifying CLI detach survival and box reusability across fresh CLI processes, including exit code propagation tests.
    • Added end-to-end tests for SDK reattach functionality, validating session state after execution completion.
    • Added end-to-end test confirming proper host bind mount handling during REST execution.

G4614 added 2 commits June 9, 2026 07:05
…ayout

Adds e2e regression tests for the surfaces that the new fix PRs
(boxlite-ai#686boxlite-ai#696) close, all under `scripts/test/e2e/cases/` so they slot
into the current main layout (independent of the layout reorg in
PR boxlite-ai#682).

New cases:

  - test_files_io.py            copy_in/copy_out text+binary roundtrip,
                                include_parent + overwrite=False
  - test_snapshot_clone.py      snapshot.create/list/restore + clone +
                                export → import roundtrip
  - test_images_pull_list.py    runtime.images.list (and pull-unreachable
                                typed-error contract)
  - test_exec_user.py           box.exec(user=) numeric uid, uid:gid, and
                                invalid-user 4xx mapping
  - test_volume_readonly.py     volumes=[(host, guest, RO)] mount + write
                                rejection inside the guest
  - test_exec_attach.py         attach to finished exec returns typed err
                                / reattach hands back a usable handle
  - test_box_metrics.py         box.metrics + runtime.metrics shape and
                                counter increments
  - test_network_allow_net.py   NetworkSpec(mode/allow_net) enforces the
                                gvproxy filter
  - test_cli_detach_recovery.py boxlite run -d → fresh CLI process can
                                ls / inspect / exec the same box id

Each case targets a specific production seam exercised by one of the
fix PRs; PR descriptions pin the matching test path.
test_exec_user.py → boxlite-ai#686, test_network_allow_net.py → boxlite-ai#687,
test_files_io.py → boxlite-ai#688, test_box_metrics.py → boxlite-ai#689,
test_snapshot_clone.py → boxlite-ai#694, test_images_pull_list.py → boxlite-ai#696.

Remaining 3 cases (test_exec_attach.py, test_volume_readonly.py,
test_cli_detach_recovery.py) stay here because they pin REST-path gaps
that don't have a matching fix PR in this session — they document the
contract for future work to land against.
@coderabbitai

coderabbitai Bot commented Jun 9, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

This PR adds three E2E test modules covering distinct subsystems: CLI detach survival across fresh process invocations, Python SDK execution reattach with error handling, and volume mount isolation verification. All tests use subprocess spawning, async patterns, and fixtures to isolate real system behavior.

Changes

E2E Test Suite Extensions

Layer / File(s) Summary
CLI detach recovery and reusability
scripts/test/e2e/cases/test_cli_detach_recovery.py
Module docstring, imports, constants, and cli fixture establish subprocess-based testing. The test_detached_box_survives_cli_exit_and_is_reusable test (xfail) runs boxlite run -d, captures the UUID, and verifies via fresh CLI calls that boxlite ls and boxlite exec work while checking the runner journal. test_detached_box_exec_propagates_exit_code_on_fresh_cli confirms exit codes propagate correctly through fresh boxlite exec. Both clean up with boxlite rm -f.
SDK reattach and error handling
scripts/test/e2e/cases/test_exec_attach.py
Module docstring and imports introduce async/UUID dependencies. test_reattach_after_original_completes (xfail) runs an exec, collects stdout, reattaches by execution ID, and verifies exit code matching, treating "session gone" as a soft pass. test_attach_with_bogus_id_is_typed_error generates a random ID, asserts the error is typed (no "500"/"Internal" messages), and forces interaction to surface deferred errors.
Host bind mount isolation verification
scripts/test/e2e/cases/test_volume_readonly.py
Test creates a host directory with a marker file, starts a box via REST with BoxOptions(volumes=[(host_dir, "/mnt/ro", True)]), asserts /mnt/ro does not appear in guest /proc/mounts, verifies the marker file is unchanged, and cleans up the box.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

  • boxlite-ai/boxlite#622: Implemented CLI exec/run exit code propagation (avoiding hard process exit), which is directly tested by the new exit code assertion cases in this PR.

Suggested reviewers

  • DorianZheng

Poem

🐰 Three tests hop in to verify the way,
Detached boxes dance, then reattach to play,
SDK reattaches with grace and care,
Volumes stay hidden, none leaking there,
Fresh processes spawn—the bugs won't dare! 🎪

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 71.43% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title accurately summarizes the main change: adding three new E2E test case files (cli-detach-recovery, exec-attach, volume-readonly) to the test suite.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

After running the suite end-to-end:

- test_detached_box_survives_cli_exit_and_is_reusable used
  `boxlite info <id>` for per-box state. The CLI's `info` is system-
  wide (version / runtime stats) — there's no per-box info command —
  so the call was always `unexpected argument` exit 2. Replaced step
  3 with `boxlite ls`-row parsing + a state-keyword check. Marked
  xfail(strict=True) because step 3's exec hits the stdout-drop race
  that boxlite-ai#563 fixes (same as the reattach case below).

- test_reattach_after_original_completes already broken by the same
  stdout race — short `echo X && exit 0` returns out=''. Marked
  xfail(strict=True) pointing at boxlite-ai#563.

- test_readonly_volume_mount_flag_and_write_reject expected a host
  bind mount to surface inside the guest. The cloud runtime
  deliberately dropped host bind mounts (boxlite-ai#639 "remove host bind
  mounts; only managed volumes allowed"), so this case was testing a
  feature that doesn't exist by design. Flipped to the negative
  contract — REST callers passing host paths get them silently
  dropped, no /mnt/<x> ever surfaces. Pin: boxlite-ai#639's product direction.

Net: 3 pass + 2 xfail(strict). When boxlite-ai#563 lands, both xfails flip
xpass-strict and become regular passes (markers should be dropped
at that point).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@G4614 G4614 marked this pull request as ready for review June 10, 2026 04:44
@G4614 G4614 requested a review from a team as a code owner June 10, 2026 04:44
@G4614 G4614 enabled auto-merge June 10, 2026 04:45

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (3)
scripts/test/e2e/cases/test_cli_detach_recovery.py (2)

100-107: ⚖️ Poor tradeoff

Consider the fragility of parsing Unicode table output.

The test parses boxlite ls Unicode table output by searching for state keywords in the row. This will break if the CLI table format changes or if state names evolve. Since this is an E2E test that intentionally exercises the real CLI output, this approach is acceptable, but be aware that CLI output format changes will require test updates.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/test/e2e/cases/test_cli_detach_recovery.py` around lines 100 - 107,
The current assertion that checks ls_row for specific state keywords is fragile;
update the check in test_cli_detach_recovery.py to robustly detect the state by
using a regex search (e.g., re.search(r"\b(Running|Started|Ready|Configured)\b",
ls_row)) instead of simple substring checks on ls_row, and on failure include
the full r_ls.stdout/r_ls output in the assertion message to aid debugging;
reference variables: ls_row, r_ls, box_id.

33-36: ⚡ Quick win

Simplify the path calculation.

The current approach navigates up four parent directories to the repository root, then rebuilds the path to lib. Since __file__ is already in scripts/test/e2e/cases/, you can directly reference the sibling lib directory.

♻️ Simplified path calculation
-sys.path.insert(
-    0,
-    str(Path(__file__).resolve().parents[4] / "scripts" / "test" / "e2e" / "lib"),
-)
+sys.path.insert(0, str(Path(__file__).resolve().parents[1] / "lib"))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/test/e2e/cases/test_cli_detach_recovery.py` around lines 33 - 36, The
sys.path.insert call over-traverses to repo root and then rebuilds the path;
replace that with a direct sibling reference by computing the lib path from the
current file's parent directories: use Path(__file__).resolve().parent.parent /
"lib" (converted to str) and insert that into sys.path instead of using
Path(__file__).resolve().parents[4] / "scripts" / "test" / "e2e" / "lib"; update
the sys.path.insert line that references Path(__file__) accordingly.
scripts/test/e2e/cases/test_exec_attach.py (1)

61-61: ⚖️ Poor tradeoff

Hardcoded sleep for test settling.

The 0.5-second sleep allows runner-side session bookkeeping to flush before reattaching. While this pattern is common in E2E tests and the duration is reasonable, be aware it could introduce flakiness in heavily loaded CI environments. If flakiness occurs, consider polling for a stable condition or increasing the timeout.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/test/e2e/cases/test_exec_attach.py` at line 61, The test currently
uses a hardcoded await asyncio.sleep(0.5) which can cause flaky CI; replace this
fixed sleep in the test_exec_attach.py case with a polling wait that checks the
runner/session readiness (e.g., loop with a small interval and a timeout) rather
than sleeping a fixed 0.5s, or extract the timeout into a variable like
attach_wait_timeout and retry until the expected condition (session bookkeeping
flushed / attachable state) is true before proceeding; locate the await
asyncio.sleep(0.5) call and implement the polling/retry around the runner's
API/state used by the test.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@scripts/test/e2e/cases/test_cli_detach_recovery.py`:
- Around line 100-107: The current assertion that checks ls_row for specific
state keywords is fragile; update the check in test_cli_detach_recovery.py to
robustly detect the state by using a regex search (e.g.,
re.search(r"\b(Running|Started|Ready|Configured)\b", ls_row)) instead of simple
substring checks on ls_row, and on failure include the full r_ls.stdout/r_ls
output in the assertion message to aid debugging; reference variables: ls_row,
r_ls, box_id.
- Around line 33-36: The sys.path.insert call over-traverses to repo root and
then rebuilds the path; replace that with a direct sibling reference by
computing the lib path from the current file's parent directories: use
Path(__file__).resolve().parent.parent / "lib" (converted to str) and insert
that into sys.path instead of using Path(__file__).resolve().parents[4] /
"scripts" / "test" / "e2e" / "lib"; update the sys.path.insert line that
references Path(__file__) accordingly.

In `@scripts/test/e2e/cases/test_exec_attach.py`:
- Line 61: The test currently uses a hardcoded await asyncio.sleep(0.5) which
can cause flaky CI; replace this fixed sleep in the test_exec_attach.py case
with a polling wait that checks the runner/session readiness (e.g., loop with a
small interval and a timeout) rather than sleeping a fixed 0.5s, or extract the
timeout into a variable like attach_wait_timeout and retry until the expected
condition (session bookkeeping flushed / attachable state) is true before
proceeding; locate the await asyncio.sleep(0.5) call and implement the
polling/retry around the runner's API/state used by the test.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 8b5d6e8f-f674-49a7-b098-cf2b255848f0

📥 Commits

Reviewing files that changed from the base of the PR and between 3b0675e and ded1d23.

📒 Files selected for processing (3)
  • scripts/test/e2e/cases/test_cli_detach_recovery.py
  • scripts/test/e2e/cases/test_exec_attach.py
  • scripts/test/e2e/cases/test_volume_readonly.py

@DorianZheng DorianZheng disabled auto-merge June 10, 2026 08:41
@DorianZheng DorianZheng merged commit 04a7f8b into boxlite-ai:main Jun 10, 2026
22 of 23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants