Skip to content

fix(agents/playbook): correctness pass on engine evaluation + flattening#129

Merged
beenuar merged 3 commits into
mainfrom
fix/playbook-engine-correctness
May 14, 2026
Merged

fix(agents/playbook): correctness pass on engine evaluation + flattening#129
beenuar merged 3 commits into
mainfrom
fix/playbook-engine-correctness

Conversation

@beenuar
Copy link
Copy Markdown
Owner

@beenuar beenuar commented May 14, 2026

Summary

Five silent-correctness bugs in services/agents/app/playbook/engine.py were producing wrong-but-passing playbook runs. None failed loudly, so existing happy-path tests didn't catch them.

Bugs fixed in engine.py

  1. Expression-string conditions silently always-true. expression: \"x > 1\" form was accepted by the schema but the evaluator only matched the structured field/operator/value shape, so any expression-string condition fell through to True. Now routed through a new _evaluate_expression that parses <lhs> <op> <rhs> (with is null / is not null sugar) against the run context.
  2. gt / lt crashed on non-numeric values. Direct > / < on mixed types raised TypeError. Added _to_float coercion that returns None on unparseable input, so the comparison cleanly evaluates to False instead of bombing the run.
  3. contains was substring-only. Documented playbook intent supports both substring (str haystack) and list-membership (list/tuple haystack); now does the right thing for both.
  4. _resolve_field couldn't index lists. Paths like entities.0.name bailed at the first non-dict node. Now traverses list/tuple indices alongside dict keys.
  5. Context flattening clobbered later step results. PlaybookEngine.run used pr.context.setdefault(k, v), so a later step's output couldn't override an earlier step's key in the flat context. Switched to direct assignment; the per-step _step_<id> namespaced snapshot is still preserved for auditability.

Tests

Adds services/agents/tests/test_playbook_engine_correctness.py — 61 tests across seven classes covering:

  • _resolve_field list/tuple index traversal
  • _to_float coercion (None, empty string, bool, numeric strings, bogus input)
  • Structured condition evaluation (eq/ne/exists/not_exists/contains/gt/lt + list membership + null handling)
  • Expression-form condition evaluation
  • _evaluate_expression edge cases (literals, dot-paths, is null / is not null)
  • PlaybookEngine.run context flattening override semantics
  • The engine's condition-gated step-skip path (handler not invoked, StepStatus.SKIPPED recorded)

_emit is monkeypatched per-test via fixture to avoid hitting _REALTIME_URL during tests.

Out of scope

Bounds + SSRF guards land separately on a later branch (those files don't exist on this branch's base).

Test plan

  • poetry run pytest tests/test_playbook_engine_correctness.py — 61 passed
  • poetry run pytest tests/ -k playbook --ignore=tests/test_explain_endpoint.py — 69 passed (no regressions in existing playbook tests; the ignored file has a pre-existing unrelated asyncpg import drift on this branch)
  • CI green on PR

Made with Cursor

Beenu Arora added 3 commits May 14, 2026 18:23
Five silent-correctness bugs in services/agents/app/playbook/engine.py were
producing wrong-but-passing playbook runs. None changed behavior loudly, so
they were not caught by existing happy-path tests.

Fixes:
- _evaluate_condition: expression-string conditions (`expression: "x > 1"`)
  were silently treated as always-true; now routed through a new
  _evaluate_expression that parses `<lhs> <op> <rhs>` (plus `is null` /
  `is not null` sugar) against the run context.
- gt/lt: previously crashed on non-numeric values; now coerce via _to_float
  and return False on unparseable input rather than raising.
- contains: now supports both substring (str haystack) and list-membership
  (list/tuple haystack), matching documented playbook intent.
- _resolve_field: now traverses list/tuple indices (e.g. entities.0.name)
  instead of bailing on the first non-dict node.
- PlaybookEngine.run context flattening: switched from setdefault to direct
  assignment so later step results can override earlier context keys, while
  preserving the per-step `_step_<id>` namespaced snapshot for auditability.

Adds services/agents/tests/test_playbook_engine_correctness.py (61 tests,
seven classes) covering _resolve_field list traversal, _to_float coercion,
structured + expression-form condition evaluation, _evaluate_expression
edge cases, context flattening override semantics, and the engine's
condition-gated step-skip path.

Out of scope: bounds + SSRF guards land separately on a later branch.
Ruff's isort treats `app.*` as first-party in this repo, so the
`from app.playbook ...` imports belong in the same group as
`from app.playbook.models ...` rather than separated by a blank
line. Fixes the lint job on PR #129; no behavior change.
Three multi-line async def signatures fit within line-length once joined.
Fixes the ruff format --check step on PR #129; no behavior change.
@beenuar beenuar merged commit 987d3bc into main May 14, 2026
23 checks passed
@beenuar beenuar deleted the fix/playbook-engine-correctness branch May 14, 2026 10:35
beenuar added a commit that referenced this pull request May 14, 2026
Brings every cross-cutting doc surface in line with the 21 PRs that
landed on `main` on 2026-05-14, anchored by the v8.0 architectural
foundation (PR #125) and the security + correctness wave that
followed it.

- `CHANGELOG.md` — new `[Unreleased]` block covering the v8.0
  architectural foundation (graph at ingest, four-agent rebrand,
  `/hunt`, sixteen connectors, automation maturity, public
  scoreboard), the eight-PR security hardening wave (PRs #116-#128),
  the three-PR CodeQL alert sweep to zero (#133, #136, #137), the
  UEBA env-var alignment (PR #135, first community contribution,
  closes #134), the security-smoke + UX cleanup pair (PR #132,
  closes #131 + #130), and the playbook engine correctness pass
  (PR #129).
- `README.md` — new `v8.0 wave-1 (on main, not yet tagged)` entry
  in the version-history section; `Next` block rewritten as
  `v8.0 wave-2` with the still-`[~]` items from
  `AISOC_V8_PROGRESS.md`. Version badge intentionally not bumped
  (still 7.3.1) because wave-1 is on `main` but not tagged.
- `AGENTS.md` — new `v8.0 wave-1` block under "Learned Workspace
  Facts" documenting the four-agent topology, `/hunt` surface,
  connector inventory, automation maturity ladder, security wave
  outcomes, CodeQL hygiene patterns (inline `replace`-chain
  sanitisation for `py/log-injection`, single import style for
  `py/import-and-import-from`), and the UEBA env-var dual-alias
  convention.
- `AISOC_V8_PROGRESS.md` — `Status` block refreshed to record that
  PR #125 shipped at `b854010e` on 2026-05-14, list the 12
  post-merge PRs that landed on `main` after it, and clarify that
  wave-2 is the still-tracked `[~]` work.
- `apps/docs/docs/deployment/env-vars.md` — UEBA section rewritten
  around the dual-alias rule (unprefixed wins over `UEBA_`-prefixed,
  matches every other Python service and the `docker-compose.yml`
  exports); table now lists canonical + legacy names side by side.
- `apps/docs/docs/operations/security.md` — new `Static analysis
  (CodeQL)` section: zero alerts on `main` as a CI gate, plus the
  two patterns that came up repeatedly during the sweep
  (inline-at-call-site sanitisation for `py/log-injection`, single
  import style for `py/import-and-import-from`).

No code changes; pure documentation sync.

Co-authored-by: Beenu Arora <beenu@cyble.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant