fix(agents/playbook): correctness pass on engine evaluation + flattening by beenuar · Pull Request #129 · beenuar/AiSOC

beenuar · 2026-05-14T10:24:20Z

Summary

Five silent-correctness bugs in services/agents/app/playbook/engine.py were producing wrong-but-passing playbook runs. None failed loudly, so existing happy-path tests didn't catch them.

Bugs fixed in `engine.py`

Expression-string conditions silently always-true. expression: \"x > 1\" form was accepted by the schema but the evaluator only matched the structured field/operator/value shape, so any expression-string condition fell through to True. Now routed through a new _evaluate_expression that parses <lhs> <op> <rhs> (with is null / is not null sugar) against the run context.
gt / lt crashed on non-numeric values. Direct > / < on mixed types raised TypeError. Added _to_float coercion that returns None on unparseable input, so the comparison cleanly evaluates to False instead of bombing the run.
contains was substring-only. Documented playbook intent supports both substring (str haystack) and list-membership (list/tuple haystack); now does the right thing for both.
_resolve_field couldn't index lists. Paths like entities.0.name bailed at the first non-dict node. Now traverses list/tuple indices alongside dict keys.
Context flattening clobbered later step results. PlaybookEngine.run used pr.context.setdefault(k, v), so a later step's output couldn't override an earlier step's key in the flat context. Switched to direct assignment; the per-step _step_<id> namespaced snapshot is still preserved for auditability.

Tests

Adds services/agents/tests/test_playbook_engine_correctness.py — 61 tests across seven classes covering:

_resolve_field list/tuple index traversal
_to_float coercion (None, empty string, bool, numeric strings, bogus input)
Structured condition evaluation (eq/ne/exists/not_exists/contains/gt/lt + list membership + null handling)
Expression-form condition evaluation
_evaluate_expression edge cases (literals, dot-paths, is null / is not null)
PlaybookEngine.run context flattening override semantics
The engine's condition-gated step-skip path (handler not invoked, StepStatus.SKIPPED recorded)

_emit is monkeypatched per-test via fixture to avoid hitting _REALTIME_URL during tests.

Out of scope

Bounds + SSRF guards land separately on a later branch (those files don't exist on this branch's base).

Test plan

poetry run pytest tests/test_playbook_engine_correctness.py — 61 passed
poetry run pytest tests/ -k playbook --ignore=tests/test_explain_endpoint.py — 69 passed (no regressions in existing playbook tests; the ignored file has a pre-existing unrelated asyncpg import drift on this branch)
CI green on PR

Made with Cursor

Five silent-correctness bugs in services/agents/app/playbook/engine.py were producing wrong-but-passing playbook runs. None changed behavior loudly, so they were not caught by existing happy-path tests. Fixes: - _evaluate_condition: expression-string conditions (`expression: "x > 1"`) were silently treated as always-true; now routed through a new _evaluate_expression that parses `<lhs> <op> <rhs>` (plus `is null` / `is not null` sugar) against the run context. - gt/lt: previously crashed on non-numeric values; now coerce via _to_float and return False on unparseable input rather than raising. - contains: now supports both substring (str haystack) and list-membership (list/tuple haystack), matching documented playbook intent. - _resolve_field: now traverses list/tuple indices (e.g. entities.0.name) instead of bailing on the first non-dict node. - PlaybookEngine.run context flattening: switched from setdefault to direct assignment so later step results can override earlier context keys, while preserving the per-step `_step_<id>` namespaced snapshot for auditability. Adds services/agents/tests/test_playbook_engine_correctness.py (61 tests, seven classes) covering _resolve_field list traversal, _to_float coercion, structured + expression-form condition evaluation, _evaluate_expression edge cases, context flattening override semantics, and the engine's condition-gated step-skip path. Out of scope: bounds + SSRF guards land separately on a later branch.

Ruff's isort treats `app.*` as first-party in this repo, so the `from app.playbook ...` imports belong in the same group as `from app.playbook.models ...` rather than separated by a blank line. Fixes the lint job on PR #129; no behavior change.

Three multi-line async def signatures fit within line-length once joined. Fixes the ruff format --check step on PR #129; no behavior change.

Brings every cross-cutting doc surface in line with the 21 PRs that landed on `main` on 2026-05-14, anchored by the v8.0 architectural foundation (PR #125) and the security + correctness wave that followed it. - `CHANGELOG.md` — new `[Unreleased]` block covering the v8.0 architectural foundation (graph at ingest, four-agent rebrand, `/hunt`, sixteen connectors, automation maturity, public scoreboard), the eight-PR security hardening wave (PRs #116-#128), the three-PR CodeQL alert sweep to zero (#133, #136, #137), the UEBA env-var alignment (PR #135, first community contribution, closes #134), the security-smoke + UX cleanup pair (PR #132, closes #131 + #130), and the playbook engine correctness pass (PR #129). - `README.md` — new `v8.0 wave-1 (on main, not yet tagged)` entry in the version-history section; `Next` block rewritten as `v8.0 wave-2` with the still-`[~]` items from `AISOC_V8_PROGRESS.md`. Version badge intentionally not bumped (still 7.3.1) because wave-1 is on `main` but not tagged. - `AGENTS.md` — new `v8.0 wave-1` block under "Learned Workspace Facts" documenting the four-agent topology, `/hunt` surface, connector inventory, automation maturity ladder, security wave outcomes, CodeQL hygiene patterns (inline `replace`-chain sanitisation for `py/log-injection`, single import style for `py/import-and-import-from`), and the UEBA env-var dual-alias convention. - `AISOC_V8_PROGRESS.md` — `Status` block refreshed to record that PR #125 shipped at `b854010e` on 2026-05-14, list the 12 post-merge PRs that landed on `main` after it, and clarify that wave-2 is the still-tracked `[~]` work. - `apps/docs/docs/deployment/env-vars.md` — UEBA section rewritten around the dual-alias rule (unprefixed wins over `UEBA_`-prefixed, matches every other Python service and the `docker-compose.yml` exports); table now lists canonical + legacy names side by side. - `apps/docs/docs/operations/security.md` — new `Static analysis (CodeQL)` section: zero alerts on `main` as a CI gate, plus the two patterns that came up repeatedly during the sweep (inline-at-call-site sanitisation for `py/log-injection`, single import style for `py/import-and-import-from`). No code changes; pure documentation sync. Co-authored-by: Beenu Arora <beenu@cyble.com>

Beenu Arora added 3 commits May 14, 2026 18:23

style(agents/tests): ruff format collapse on playbook engine tests

419852f

Three multi-line async def signatures fit within line-length once joined. Fixes the ruff format --check step on PR #129; no behavior change.

beenuar merged commit 987d3bc into main May 14, 2026
23 checks passed

beenuar deleted the fix/playbook-engine-correctness branch May 14, 2026 10:35

This was referenced May 14, 2026

feat(v8.0): graph at ingest, four-agent rebrand, /hunt, 16 connectors, public benchmark #125

Merged

docs: sync project documentation with today's v8.0 wave-1 push #138

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(agents/playbook): correctness pass on engine evaluation + flattening#129

fix(agents/playbook): correctness pass on engine evaluation + flattening#129
beenuar merged 3 commits into
mainfrom
fix/playbook-engine-correctness

beenuar commented May 14, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

beenuar commented May 14, 2026

Summary

Bugs fixed in engine.py

Tests

Out of scope

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Bugs fixed in `engine.py`