test(agentic-ai): add event handling regression e2e test #7072

Open
maff wants to merge 5 commits into main from agentic-ai-event-handling-regression-e2e-test

Conversation

@maff
Member

@maff maff commented Apr 28, 2026

Description

Adds an e2e regression test suite for the agentic-AI ad-hoc sub-process (AHSP) event-handling contract, designed to catch a bug introduced in Camunda 8.9.1 where inner-instance variables (notably toolCall / toolCallResult set by the AI Agent on tool activations) leak out of the inner-instance scope into the surrounding AHSP/root scope.

L4JAiAgentJobWorkerEventsTests runs on a single BPMN, agentic-ai-ahsp-connectors-event.bpmn, which mirrors the realistic connectors tool setup (SuperfluxProduct, Search_The_Web, etc.) and the user-feedback satisfaction loop. It adds a Pending_Tool service task whose job the tests intentionally leave uncompleted, keeping the AHSP open while events are published. Each test isolates its message correlation through a UUID-driven eventCorrelationKey.
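
The per-test isolation could look roughly like the following JDK-only sketch. Only the variable name eventCorrelationKey comes from this description; the class and method names are hypothetical, and the assumption is that the BPMN's message subscription reads its correlation key from that variable while the real test passes the same key when publishing messages.

```java
import java.util.Map;
import java.util.UUID;

/** Hypothetical helper, not part of this PR. */
final class EventCorrelationKeys {

  // Variable name taken from the PR description; the message subscription
  // is assumed to resolve its correlation key from this variable.
  static final String VARIABLE_NAME = "eventCorrelationKey";

  private EventCorrelationKeys() {}

  /** Returns a fresh key so one test's messages cannot correlate to another test's instance. */
  static String newKey() {
    return UUID.randomUUID().toString();
  }

  /** Builds the start variables for a process instance that subscribes with the given key. */
  static Map<String, Object> startVariables(String correlationKey) {
    return Map.of(VARIABLE_NAME, correlationKey);
  }

  public static void main(String[] args) {
    String key = newKey();
    // The same key would be used both as a start variable and when publishing the event message.
    System.out.println(startVariables(key));
  }
}
```

Generating the key per test (rather than per class) is what keeps a buffered message from a previous test from correlating to the next test's subscription.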

Scenarios:

  • Event message published before AHSP activation (buffered, correlates on entry).
  • Event during in-flight tool execution with WAIT_FOR_TOOL_CALL_RESULTS (default).
  • Event during tool execution with INTERRUPT_TOOL_CALLS ("Cancel tool calls").
  • Event with empty payload — agent inserts the synthetic placeholder UserMessage (one variant per behavior, since the synthetic message text differs).
  • Multiple events in a single AHSP iteration — published order preserved in the chat request.

Each scenario asserts the chat conversation, agent metrics and response text, the user-feedback worker firing exactly once, and the leak invariant (toolCall must not exist at the process-instance root scope).
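
The leak invariant can be pictured as follows. This is a simplified, JDK-only sketch and not the assertNoToolCallVariableLeak(...) helper added in this PR: the ScopedVariable record is a hypothetical stand-in for a variable search result, and the assumption is that each result exposes the variable name and the key of the scope that owns it.

```java
import java.util.List;

/** Hypothetical illustration of the leak invariant, not the PR's helper. */
final class ToolCallLeakCheck {

  /** Assumed minimal shape of a variable search result: name plus owning scope key. */
  record ScopedVariable(String name, long scopeKey) {}

  private ToolCallLeakCheck() {}

  /** Throws if a variable named toolCall is owned directly by the process-instance root scope. */
  static void assertNoToolCallAtRoot(List<ScopedVariable> variables, long processInstanceKey) {
    for (ScopedVariable variable : variables) {
      if ("toolCall".equals(variable.name()) && variable.scopeKey() == processInstanceKey) {
        throw new AssertionError("toolCall leaked to root scope " + processInstanceKey);
      }
    }
  }

  public static void main(String[] args) {
    long root = 1L;
    // toolCall owned by an inner scope (key 2) is fine; only the root scope is checked.
    assertNoToolCallAtRoot(List.of(new ScopedVariable("toolCall", 2L)), root);
    System.out.println("no leak detected");
  }
}
```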

In addition:

  • agentic-ai-ahsp-connectors.bpmn: SuperfluxProduct writes its script result to an intermediate variable, and a <zeebe:output> mapping projects it to toolCallResult. The presence of the output mapping is what triggers the regression code path on 8.9.1; without it, the buggy parent+local merge in BpmnVariableMappingBehavior is never reached. The same change is mirrored in agentic-ai-ahsp-connectors-event.bpmn.
  • L4JAiAgentJobWorkerToolCallingTests.executesAgentWithToolCallingAndUserFeedback already had a defense-in-depth leak check; it was previously a no-op (the connectors BPMN's tools had no output mappings), and is now a real regression detector thanks to the SuperfluxProduct output mapping.
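
As a rough illustration of what the output mapping is expected to do, here is a toy model in plain Java. It is not the engine's BpmnVariableMappingBehavior and does not reproduce the 8.9.1 bug; it only models the intended result that the mapped target reaches the parent scope while other local variables stay local. The intermediate variable name superfluxResult is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of output-mapping propagation; hypothetical, not engine code. */
final class OutputMappingToyModel {

  private OutputMappingToyModel() {}

  /**
   * Copies local[source] into a copy of the parent scope under the name target.
   * No other local variable is propagated.
   */
  static Map<String, Object> applyOutputMapping(
      Map<String, Object> parent, Map<String, Object> local, String source, String target) {
    Map<String, Object> result = new HashMap<>(parent);
    if (local.containsKey(source)) {
      result.put(target, local.get(source));
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, Object> local = new HashMap<>();
    local.put("toolCall", "call-1");      // set on the tool's own scope
    local.put("superfluxResult", 42);     // hypothetical intermediate variable
    Map<String, Object> parent =
        applyOutputMapping(new HashMap<>(), local, "superfluxResult", "toolCallResult");
    System.out.println(parent);
  }
}
```

In this model a leak would correspond to toolCall or superfluxResult showing up in the returned parent map, which is the kind of state the leak check above is meant to catch.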

Verification:

| Test | engine fix image | 8.9.1 |
| --- | --- | --- |
| L4JAiAgentJobWorkerEventsTests (6 tests) | ✅ all pass | ❌ all 6 fail (leak detected or process never completes) |
| L4JAiAgentJobWorkerToolCallingTests.executesAgentWithToolCallingAndUserFeedback | ✅ pass | ❌ fail (leak detected) |

Note on 8.9.0

8.9.0 is not a stable verification target for this suite when run as a class. Individual tests pass, but in suite mode 3/6 tests time out due to a known CPT/gateway interaction (camunda/camunda#45177, camunda/camunda#45667): when CPT recreates the CamundaClient between tests, REST long-poll job-activation requests aren't cleanly cancelled — the gateway delivers next-test jobs to the dead connection, the new test's worker polls and gets nothing, and the job sits locked until the activation timeout (~60s) expires.

8.9.0 shipped a partial workaround (camunda/camunda#49836 — force gRPC + long-polling). The proper fix landed in 8.9.1 (camunda/camunda#49424 — cancel pending REST long-polls on cluster purge) along with related cleanup of HTTP connection-pool shutdown. The engine fix image (built on 8.9.1+) doesn't exhibit the lag, so it remains the meaningful verification target alongside 8.9.1 (which fails fast on the AHSP regression assertion).

Related issues

Adds e2e coverage for the regression tracked in camunda/camunda#51939.

Checklist

  • Backport labels are added if these code changes should be backported. No backport label is added to the latest
    release, as this branch will be rebased onto main before the next release. Example backport labels:
    • backport stable/8.8: for changes that should be included in the next 8.8.x release.
    • or backport release-8.8.7: for changes that should be included in the specific release 8.8.7, and this
      release has already been created. The release branch will be merged back into stable/8.8 later, so the change
      will be included in future 8.8.x releases as well.
  • Tests/Integration tests for the changes have been added if applicable.
  • If the change requires a documentation update, it has been added to the appropriate section in the documentation.

@maff maff self-assigned this Apr 28, 2026
@maff maff changed the title from "Agentic AI: add event handling regression e2e test" to "test(agentic-ai): add event handling regression e2e test" Apr 28, 2026
@maff maff marked this pull request as ready for review April 28, 2026 18:35
Copilot AI review requested due to automatic review settings April 28, 2026 18:35
@maff maff requested review from a team as code owners April 28, 2026 18:35
@maff maff requested a review from ztefanie April 28, 2026 18:35
@maff
Member Author

maff commented Apr 28, 2026

ℹ️ Note: the e2e tests will fail until the build uses Camunda 8.9.2.

Contributor

Copilot AI left a comment

Pull request overview

Adds an e2e regression test suite in the Agentic-AI connector E2E module to validate the AHSP event-handling contract and detect the Camunda 8.9.1 variable-scope leak regression (inner-instance toolCall leaking to root). It also adjusts the shared connectors BPMN models to ensure the regression path is actually exercised (via <zeebe:output> mappings).

Changes:

  • Add L4JAiAgentJobWorkerEventsTests covering buffered events, events during tool execution (wait vs interrupt), empty payload behavior, and multi-event ordering.
  • Update the AHSP connectors BPMN models to include an output mapping on SuperfluxProduct (and revise the event BPMN to include a “Pending Tool” job + correlation-key-driven message subscription).
  • Add a shared assertNoToolCallVariableLeak(...) helper and wire it into existing tool-calling coverage.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| connectors-e2e-test/connectors-e2e-test-agentic-ai/src/test/resources/agentic-ai-ahsp-connectors.bpmn | Adds <zeebe:output> mapping for SuperfluxProduct to trigger the regression code path under test. |
| connectors-e2e-test/connectors-e2e-test-agentic-ai/src/test/resources/agentic-ai-ahsp-connectors-event.bpmn | Reworks the event-focused BPMN to support deterministic event publication/correlation and “in-flight tool” scenarios. |
| connectors-e2e-test/connectors-e2e-test-agentic-ai/src/test/java/.../L4JAiAgentJobWorkerToolCallingTests.java | Ensures existing tool-calling regression test asserts the variable leak invariant. |
| connectors-e2e-test/connectors-e2e-test-agentic-ai/src/test/java/.../L4JAiAgentJobWorkerEventsTests.java | Introduces the new event-handling regression E2E test suite. |
| connectors-e2e-test/connectors-e2e-test-agentic-ai/src/test/java/.../BaseL4JAiAgentJobWorkerTest.java | Adds assertNoToolCallVariableLeak(...) helper using variable search to detect root-scope leaks. |

@maff maff removed the request for review from ztefanie April 29, 2026 06:17
@maff maff force-pushed the agentic-ai-event-handling-regression-e2e-test branch from b9ac9be to 3e87c47 April 29, 2026 06:17
@maff maff added the e2e-tests label Apr 30, 2026
maff and others added 5 commits April 30, 2026 09:35
New `L4JAiAgentJobWorkerEventSubprocessTests` covers the agentic-AI
sub-process event-handling contract (event before activation, during
tool execution with WAIT_FOR_TOOL_CALL_RESULTS / INTERRUPT_TOOL_CALLS /
empty payload, plus a no-event control). Each scenario asserts the
expected chat conversation and that `toolCall` does not leak to the
root scope — the regression tracked by camunda/camunda#51939.

Tests pass on 8.9.0 and the engine fix image, fail on 8.9.1.

Merges `L4JAiAgentJobWorkerEventSubprocessTests` (regression-focused,
minimal BPMN) and the now-deleted `L4JAiAgentJobWorkerEventsTests`
(realistic, connectors BPMN) into a single `L4JAiAgentJobWorkerEventsTests`
running on the unified `agentic-ai-ahsp-connectors-event.bpmn`. Adds a
`Pending_Tool` service task as a deterministic in-AHSP park point, gives
the message subscription a variable-driven correlation key for per-test
isolation, and adds an output mapping to `SuperfluxProduct` (in both
`connectors.bpmn` and `connectors-event.bpmn`) so tool execution always
exercises the regression-sensitive code path.

Coverage: event before activation, event during execution
(WAIT_FOR_TOOL_CALL_RESULTS / INTERRUPT_TOOL_CALLS, with payload and
empty), and multi-event ordering. Each scenario asserts the chat
conversation, agent metrics and response text, the user-feedback worker
firing once, and that `toolCall` does not leak to the root scope —
camunda/camunda#51939. The defense-in-depth leak check in
`L4JAiAgentJobWorkerToolCallingTests` is now a real regression detector
thanks to the SuperfluxProduct output mapping.

All tests pass on the engine fix image and fail on 8.9.1.

Drop the hand-rolled `awaitPendingToolJobCreated` / `awaitEventSubprocessCompletions`
search-request loops in favor of `CamundaAssert.assertThat(...).hasActiveElements(...)`
and `.hasCompletedElement(elementId, times)`. Same semantics (the latter waits and
fails if not exactly the given count), shorter helpers, no Awaitility dependency in
the test class.
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
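
The "wait, then fail unless the count matches exactly" semantics mentioned in the commit message above can be pictured with a JDK-only polling sketch. This is not how CamundaAssert is implemented and not the removed helper code; the class, method, and parameter names are hypothetical.

```java
import java.time.Duration;
import java.util.function.IntSupplier;

/** Hypothetical polling helper illustrating "await an exact count or fail". */
final class ExactCountAwait {

  private ExactCountAwait() {}

  /**
   * Polls currentCount until it equals expected or the timeout elapses,
   * then throws AssertionError if the last observed value still differs.
   */
  static void awaitExactCount(
      IntSupplier currentCount, int expected, Duration timeout, Duration pollInterval) {
    long deadline = System.nanoTime() + timeout.toNanos();
    int observed = currentCount.getAsInt();
    while (observed != expected && System.nanoTime() < deadline) {
      try {
        Thread.sleep(pollInterval.toMillis());
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt flag for callers
        throw new AssertionError("interrupted while waiting for count " + expected, e);
      }
      observed = currentCount.getAsInt();
    }
    if (observed != expected) {
      throw new AssertionError("expected count " + expected + " but observed " + observed);
    }
  }

  public static void main(String[] args) {
    // A supplier that already reports the expected count returns immediately.
    awaitExactCount(() -> 1, 1, Duration.ofSeconds(1), Duration.ofMillis(10));
    System.out.println("count reached");
  }
}
```

Note that a count which overshoots the expected value is only reported once the timeout elapses, so a short timeout keeps failures fast in this sketch.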
@maff maff force-pushed the agentic-ai-event-handling-regression-e2e-test branch from 3e87c47 to 75f63f2 April 30, 2026 07:35