Deflake CI tests #2139
Open
cconstable wants to merge 4 commits into
chris-olszewski
requested changes
Jun 24, 2026
chris-olszewski
left a comment
Member
Test splitting is great, unsure about the other 2 changes.
  registeredActivityNames,
  logger,
}: ThreadedVMWorkflowCreatorOptions): Promise<ThreadedVMWorkflowCreator> {
  const maxHeapMb = Number(process.env.TEMPORAL_WORKER_THREAD_MAX_HEAP_MB);
Member
A little hesitant about having a public-facing escape hatch. People and AI tools can and will find and use this. I would prefer that we keep this PR focused on testing/CI and not alter published packages.
Comment on lines +50 to +55
 * The header line of a stack trace (`<ErrorName>: <message>`) is the most engine- and version-dependent
 * part (V8, JSC and Deno all render it differently; JSC may even drop the name and message), and it is
 * redundant with the callers' own `instanceof`/`message` assertions. We therefore match it loosely and
 * only assert the meaningful call frames. This applies only to multi-line stacks; single-line values
 * (e.g. a function name compared against `$CLASS.all`) are matched as-is.
 *
Member
Not super comfortable with making these assertions less strict. They have caught regressions in the past. We test against a set matrix of these engines; is there a specific one that was flaking?
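For readers following this thread, the matching rule described in the quoted comment can be sketched roughly as below. This is only an illustration under stated assumptions, not the helper from the PR: the names `splitStack`, `dropHeader` and `stacksMatch` are hypothetical, and frames are assumed to be V8-style lines starting with `at `.

```typescript
// Hypothetical sketch of "ignore the header line, compare the call frames".
// Assumption: frames are V8-style lines that start with "at ".

// Split a stack into trimmed, non-empty lines.
function splitStack(stack: string): string[] {
  return stack
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}

// Drop the first line unless it already looks like a call frame
// (some engines may omit the "<ErrorName>: <message>" header entirely).
function dropHeader(lines: string[]): string[] {
  const first = lines[0];
  return first !== undefined && !first.startsWith('at ') ? lines.slice(1) : lines;
}

function stacksMatch(actual: string, expected: string): boolean {
  const expectedLines = splitStack(expected);
  // Single-line values (e.g. a function name) are compared as-is.
  if (expectedLines.length <= 1) {
    return actual.trim() === expected.trim();
  }
  const actualFrames = dropHeader(splitStack(actual));
  const expectedFrames = dropHeader(expectedLines);
  return (
    actualFrames.length === expectedFrames.length &&
    actualFrames.every((frame, i) => frame === expectedFrames[i])
  );
}
```

Under this sketch a changed header still passes while a changed call frame still fails, which is the trade-off the review comment is questioning.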
tl;dr: the way we run tests concurrently can intermittently OOM CI runners, and some of our tests compare stack traces (which can change between engines and versions).
What changed
- Split `test-integration-workflows.ts` (55 tests) into 6 themed files, dropping peak within-file concurrency from 53 to ≤12:
  - `test-integration-cancellation-scopes.ts` (12)
  - `test-integration-activities.ts` (9)
  - `test-integration-workflow-start.ts` (9)
  - `test-integration-workflow-info.ts` (12)
  - `test-integration-replay-and-flags.ts` (8)
  - `test-integration-reserved-prefixes.ts` (5)
- Shared workflow/activity/interceptor definitions moved to `test-integration-workflows-common.ts`, which each file re-exports (`export *`) so every file's bundle stays identical to the original (`workflowsPath: __filename` + dynamic interceptor dispatch unchanged).
- Heap usage guardrail: `threaded-vm.ts` reads `TEMPORAL_WORKER_THREAD_MAX_HEAP_MB` and applies it as the worker thread's `resourceLimits.maxOldGenerationSizeMb`, so a single runaway Workflow fails fast with a clear error instead of dragging the process into memory pressure. CI sets it to `1024` for Node jobs; unset (no cap) in production.
- Stack trace tests now ignore the "header" line of the stack trace since different engines output different lines.
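As a rough illustration of the heap guardrail, the sketch below shows one way an environment value could be turned into worker-thread resource limits. It is an assumption-laden sketch, not the code in `threaded-vm.ts`: the helper names `parseMaxHeapMb` and `resourceLimitsFor` are hypothetical, and the validation rules (ignore empty, non-numeric or non-positive values) are a design choice made here, not something the PR states.

```typescript
// Hypothetical helpers; only the env var name and the
// `resourceLimits.maxOldGenerationSizeMb` option come from the PR description.

// Local shape mirroring the relevant field of Node's worker resource limits.
type HeapLimits = { maxOldGenerationSizeMb: number };

// Turn the raw env value into a cap in MB, or undefined for "no cap".
// Note that Number(undefined) is NaN and Number('') is 0, so both are
// rejected explicitly instead of being passed through.
function parseMaxHeapMb(raw: string | undefined): number | undefined {
  if (raw === undefined || raw.trim() === '') return undefined;
  const value = Number(raw);
  return Number.isFinite(value) && value > 0 ? value : undefined;
}

// Build the limits object, or undefined to keep Node's defaults.
function resourceLimitsFor(maxHeapMb: number | undefined): HeapLimits | undefined {
  return maxHeapMb === undefined ? undefined : { maxOldGenerationSizeMb: maxHeapMb };
}

// Usage sketch (not executed here; `workerPath` is a placeholder):
//   import { Worker } from 'node:worker_threads';
//   const limits = resourceLimitsFor(parseMaxHeapMb(process.env.TEMPORAL_WORKER_THREAD_MAX_HEAP_MB));
//   new Worker(workerPath, { resourceLimits: limits });
```

Returning `undefined` when the variable is unset keeps the production behavior described above (no cap), while a CI value of `1024` yields `{ maxOldGenerationSizeMb: 1024 }`.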
More details
Integration-test CI jobs intermittently failed with `ERR_WORKER_OUT_OF_MEMORY` / AVA timeouts (and a native Bun crash) even when assertions passed. Root cause: AVA's `concurrency: 1` only serializes test files; within a file, all non-serial tests run at once. `test-integration-workflows.ts` ran 53 tests concurrently, each spinning up its own Worker (worker thread + V8 context), exhausting memory.
Profiling revealed that peak RSS scales ~linearly with concurrently-live Workers (~47 MB each; ~334 MB at 1, ~1.8 GB at 32, with trivial workflows).
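The profiling numbers above suggest a simple linear model, sketched below as a back-of-the-envelope check. The function name `estimatePeakRssMb` is hypothetical, and the values for 53 and 12 Workers are extrapolations from the model rather than measurements; real workflows are heavier than the trivial ones profiled, so actual usage would likely be higher.

```typescript
// Rough linear model built from the figures in the PR description.
const BASE_RSS_MB = 334; // measured peak RSS with 1 live Worker
const PER_WORKER_MB = 47; // approximate cost of each additional live Worker

// Estimated peak RSS in MB for a given number of concurrently-live Workers.
function estimatePeakRssMb(liveWorkers: number): number {
  if (!Number.isInteger(liveWorkers) || liveWorkers < 1) {
    throw new RangeError('liveWorkers must be a positive integer');
  }
  return BASE_RSS_MB + PER_WORKER_MB * (liveWorkers - 1);
}

// Sanity check against the measured point: 32 Workers -> 1791 MB (~1.8 GB).
const at32 = estimatePeakRssMb(32);
// Extrapolated (not measured): the old file's 53 concurrent tests -> 2778 MB.
const at53 = estimatePeakRssMb(53);
// Extrapolated (not measured): the new ceiling of 12 concurrent tests -> 851 MB.
const at12 = estimatePeakRssMb(12);
```

Under this model, splitting the file would cut the estimated peak from roughly 2.8 GB to under 1 GB per test file, before accounting for anything the workflows themselves allocate.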
Verification
All 55 tests pass locally across the 6 files (including the time-skipping/interceptor test).
`eslint`/`prettier` clean; `@temporalio/worker` and `@temporalio/test` build.