fix(agents/playbook): correctness pass on engine evaluation + flattening#129
Merged
Conversation
added 3 commits
May 14, 2026 18:23
Five silent-correctness bugs in services/agents/app/playbook/engine.py were producing wrong-but-passing playbook runs. None changed behavior loudly, so they were not caught by existing happy-path tests. Fixes: - _evaluate_condition: expression-string conditions (`expression: "x > 1"`) were silently treated as always-true; now routed through a new _evaluate_expression that parses `<lhs> <op> <rhs>` (plus `is null` / `is not null` sugar) against the run context. - gt/lt: previously crashed on non-numeric values; now coerce via _to_float and return False on unparseable input rather than raising. - contains: now supports both substring (str haystack) and list-membership (list/tuple haystack), matching documented playbook intent. - _resolve_field: now traverses list/tuple indices (e.g. entities.0.name) instead of bailing on the first non-dict node. - PlaybookEngine.run context flattening: switched from setdefault to direct assignment so later step results can override earlier context keys, while preserving the per-step `_step_<id>` namespaced snapshot for auditability. Adds services/agents/tests/test_playbook_engine_correctness.py (61 tests, seven classes) covering _resolve_field list traversal, _to_float coercion, structured + expression-form condition evaluation, _evaluate_expression edge cases, context flattening override semantics, and the engine's condition-gated step-skip path. Out of scope: bounds + SSRF guards land separately on a later branch.
Ruff's isort treats `app.*` as first-party in this repo, so the `from app.playbook ...` imports belong in the same group as `from app.playbook.models ...` rather than separated by a blank line. Fixes the lint job on PR #129; no behavior change.
Three multi-line async def signatures fit within line-length once joined. Fixes the ruff format --check step on PR #129; no behavior change.
This was referenced May 14, 2026
beenuar
added a commit
that referenced
this pull request
May 14, 2026
Brings every cross-cutting doc surface in line with the 21 PRs that landed on `main` on 2026-05-14, anchored by the v8.0 architectural foundation (PR #125) and the security + correctness wave that followed it. - `CHANGELOG.md` — new `[Unreleased]` block covering the v8.0 architectural foundation (graph at ingest, four-agent rebrand, `/hunt`, sixteen connectors, automation maturity, public scoreboard), the eight-PR security hardening wave (PRs #116-#128), the three-PR CodeQL alert sweep to zero (#133, #136, #137), the UEBA env-var alignment (PR #135, first community contribution, closes #134), the security-smoke + UX cleanup pair (PR #132, closes #131 + #130), and the playbook engine correctness pass (PR #129). - `README.md` — new `v8.0 wave-1 (on main, not yet tagged)` entry in the version-history section; `Next` block rewritten as `v8.0 wave-2` with the still-`[~]` items from `AISOC_V8_PROGRESS.md`. Version badge intentionally not bumped (still 7.3.1) because wave-1 is on `main` but not tagged. - `AGENTS.md` — new `v8.0 wave-1` block under "Learned Workspace Facts" documenting the four-agent topology, `/hunt` surface, connector inventory, automation maturity ladder, security wave outcomes, CodeQL hygiene patterns (inline `replace`-chain sanitisation for `py/log-injection`, single import style for `py/import-and-import-from`), and the UEBA env-var dual-alias convention. - `AISOC_V8_PROGRESS.md` — `Status` block refreshed to record that PR #125 shipped at `b854010e` on 2026-05-14, list the 12 post-merge PRs that landed on `main` after it, and clarify that wave-2 is the still-tracked `[~]` work. - `apps/docs/docs/deployment/env-vars.md` — UEBA section rewritten around the dual-alias rule (unprefixed wins over `UEBA_`-prefixed, matches every other Python service and the `docker-compose.yml` exports); table now lists canonical + legacy names side by side. - `apps/docs/docs/operations/security.md` — new `Static analysis (CodeQL)` section: zero alerts on `main` as a CI gate, plus the two patterns that came up repeatedly during the sweep (inline-at-call-site sanitisation for `py/log-injection`, single import style for `py/import-and-import-from`). No code changes; pure documentation sync. Co-authored-by: Beenu Arora <beenu@cyble.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Five silent-correctness bugs in
services/agents/app/playbook/engine.pywere producing wrong-but-passing playbook runs. None failed loudly, so existing happy-path tests didn't catch them.Bugs fixed in
engine.pyexpression: \"x > 1\"form was accepted by the schema but the evaluator only matched the structuredfield/operator/valueshape, so any expression-string condition fell through toTrue. Now routed through a new_evaluate_expressionthat parses<lhs> <op> <rhs>(withis null/is not nullsugar) against the run context.gt/ltcrashed on non-numeric values. Direct>/<on mixed types raisedTypeError. Added_to_floatcoercion that returnsNoneon unparseable input, so the comparison cleanly evaluates toFalseinstead of bombing the run.containswas substring-only. Documented playbook intent supports both substring (str haystack) and list-membership (list/tuple haystack); now does the right thing for both._resolve_fieldcouldn't index lists. Paths likeentities.0.namebailed at the first non-dict node. Now traverses list/tuple indices alongside dict keys.PlaybookEngine.runusedpr.context.setdefault(k, v), so a later step's output couldn't override an earlier step's key in the flat context. Switched to direct assignment; the per-step_step_<id>namespaced snapshot is still preserved for auditability.Tests
Adds
services/agents/tests/test_playbook_engine_correctness.py— 61 tests across seven classes covering:_resolve_fieldlist/tuple index traversal_to_floatcoercion (None, empty string, bool, numeric strings, bogus input)_evaluate_expressionedge cases (literals, dot-paths,is null/is not null)PlaybookEngine.runcontext flattening override semanticsStepStatus.SKIPPEDrecorded)_emitis monkeypatched per-test via fixture to avoid hitting_REALTIME_URLduring tests.Out of scope
Bounds + SSRF guards land separately on a later branch (those files don't exist on this branch's base).
Test plan
poetry run pytest tests/test_playbook_engine_correctness.py— 61 passedpoetry run pytest tests/ -k playbook --ignore=tests/test_explain_endpoint.py— 69 passed (no regressions in existing playbook tests; the ignored file has a pre-existing unrelatedasyncpgimport drift on this branch)Made with Cursor