
fix(batch-queue): speed up batch queue processing by disabling cooloff and fixing retry race#3079

Merged
ericallam merged 2 commits into main from fix/batch-queue-processing
Feb 25, 2026


@ericallam
Member

@ericallam ericallam commented Feb 17, 2026

Fix slow fair queue processing by removing spurious cooloff on concurrency blocks and fixing a race condition where retry attempt counts were not atomically updated during message re-queue.

Also removed cooloff entirely from the batch queue.
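The cooloff fix can be illustrated with a small model. The sketch below is a hypothetical TypeScript tracker, not the actual FairQueue code: it assumes a queue enters cooloff after `threshold` consecutive empty dequeue attempts and is then skipped for `periodMs`. The names `DequeueOutcome`, `CooloffTracker`, `threshold`, and `periodMs` are illustrative assumptions. The behavior that matters is that an attempt blocked only by concurrency (`availableCapacity === 0`) leaves the counter untouched, because the queue still has work and should be retried as soon as a slot frees up.

```typescript
// Hypothetical model of per-queue cooloff; not the actual FairQueue implementation.
type DequeueOutcome = "processed" | "empty" | "concurrencyBlocked";

type CooloffOptions = {
  threshold: number; // consecutive empty attempts before a queue cools off
  periodMs: number; // how long a cooling-off queue is skipped
};

class CooloffTracker {
  private readonly options: CooloffOptions;
  private readonly emptyStreak = new Map<string, number>();
  private readonly cooloffUntil = new Map<string, number>();

  constructor(options: CooloffOptions) {
    this.options = options;
  }

  isCoolingOff(queueId: string, now: number): boolean {
    return (this.cooloffUntil.get(queueId) ?? 0) > now;
  }

  record(queueId: string, outcome: DequeueOutcome, now: number): void {
    switch (outcome) {
      case "processed":
        // Work was done, so the queue is clearly not idle: reset the streak.
        this.emptyStreak.delete(queueId);
        return;
      case "concurrencyBlocked":
        // Messages are waiting but no slot is free. Counting this toward
        // cooloff would park a busy queue, so it is deliberately ignored.
        return;
      case "empty": {
        const streak = (this.emptyStreak.get(queueId) ?? 0) + 1;
        if (streak >= this.options.threshold) {
          // Enough idle polls in a row: skip this queue for periodMs.
          this.cooloffUntil.set(queueId, now + this.options.periodMs);
          this.emptyStreak.delete(queueId);
        } else {
          this.emptyStreak.set(queueId, streak);
        }
        return;
      }
    }
  }
}
```

With `threshold: 3` and `periodMs: 5000`, any number of concurrency-blocked attempts leaves the queue eligible, while three consecutive empty attempts park it for 5 seconds. The batch queue goes further and turns cooloff off altogether.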

@changeset-bot

changeset-bot bot commented Feb 17, 2026

🦋 Changeset detected

Latest commit: d8555e4

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 28 packages
Name Type
@trigger.dev/redis-worker Patch
@internal/run-engine Patch
@internal/schedule-engine Patch
@trigger.dev/build Patch
@trigger.dev/core Patch
@trigger.dev/python Patch
@trigger.dev/react-hooks Patch
@trigger.dev/rsc Patch
@trigger.dev/schema-to-json Patch
@trigger.dev/sdk Patch
@trigger.dev/database Patch
@trigger.dev/otlp-importer Patch
trigger.dev Patch
d3-chat Patch
references-d3-openai-agents Patch
@internal/cache Patch
@internal/clickhouse Patch
@internal/redis Patch
@internal/replication Patch
@internal/testcontainers Patch
@internal/tracing Patch
@internal/tsql Patch
@internal/zod-worker Patch
references-nextjs-realtime Patch
references-realtime-hooks-test Patch
references-realtime-streams Patch
@internal/sdk-compat-tests Patch
references-telemetry Patch


@ericallam ericallam force-pushed the fix/batch-queue-processing branch from 9ea0ae2 to d275304 on February 17, 2026 15:49
@coderabbitai
Contributor

coderabbitai bot commented Feb 17, 2026

Walkthrough

  • Removed the per-queue cooloff increment when availableCapacity === 0.
  • The retry path now passes the updated message JSON into the visibility/release flow instead of performing a separate HSET.
  • VisibilityManager.release and the underlying releaseMessage Redis/Lua command now accept an optional updatedData parameter that, when provided, replaces the in-flight payload.
  • Added tests asserting that concurrency blocks do not trigger cooloff.
  • Disabled cooloff in the batch queue's FairQueue configuration.
  • Increased BATCH_CONCURRENCY_LIMIT_DEFAULT from 1 to 5.

No public API removals.
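The retry race comes down to ordering. The in-memory TypeScript model below is a sketch under stated assumptions, not the real VisibilityManager: `InMemoryStore`, `retryInTwoSteps`, and `retryAtomically` are hypothetical, and a single synchronous method stands in for the Lua script that Redis runs atomically. Only the optional `updatedData` parameter name comes from the PR. In the two-step shape, the message becomes visible again before its attempt count is written, so a consumer that runs in the gap (modeled by the hypothetical `onGap` callback) reads the stale count. Passing the new payload into `release` closes that gap.

```typescript
// Hypothetical message shape and in-memory stand-in for the Redis keys.
type Message = { id: string; attempt: number };

class InMemoryStore {
  readonly ready: string[] = []; // ids visible to consumers
  readonly inflight = new Set<string>(); // ids currently claimed
  readonly payloads = new Map<string, string>(); // id -> JSON payload

  enqueue(msg: Message): void {
    this.payloads.set(msg.id, JSON.stringify(msg));
    this.ready.push(msg.id);
  }

  // Claim the next ready message and mark it in flight.
  dequeue(): Message | undefined {
    const id = this.ready.shift();
    if (id === undefined) return undefined;
    const raw = this.payloads.get(id);
    if (raw === undefined) return undefined;
    this.inflight.add(id);
    return JSON.parse(raw) as Message;
  }

  // Models a Lua release script: one uninterrupted step that optionally
  // replaces the payload before making the message visible again.
  release(id: string, updatedData?: string): boolean {
    if (!this.inflight.delete(id)) return false;
    if (updatedData !== undefined) this.payloads.set(id, updatedData);
    this.ready.push(id);
    return true;
  }
}

// Old shape (hypothetical): release first, then write the attempt count separately.
function retryInTwoSteps(store: InMemoryStore, msg: Message, onGap: () => void): void {
  store.release(msg.id);
  onGap(); // another consumer may run between the two round-trips
  store.payloads.set(msg.id, JSON.stringify({ ...msg, attempt: msg.attempt + 1 }));
}

// New shape (hypothetical): the updated payload travels with the release itself.
function retryAtomically(store: InMemoryStore, msg: Message): void {
  store.release(msg.id, JSON.stringify({ ...msg, attempt: msg.attempt + 1 }));
}
```

In Redis the equivalent guarantee comes from doing the in-flight removal, the optional payload write, and the re-queue inside one EVAL/EVALSHA script, since Redis does not run other commands while a script is executing.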

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name Status Explanation Resolution
Description check ❓ Inconclusive The description lacks required template sections including issue reference, testing details, and a proper changelog entry. Add the issue number (Closes #), describe testing steps performed, and provide a detailed changelog entry. Include the checklist completion status.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main changes: disabling cooloff and fixing a retry race condition in batch queue processing.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.


coderabbitai[bot]

This comment was marked as resolved.

devin-ai-integration[bot]

This comment was marked as resolved.

@ericallam ericallam force-pushed the fix/batch-queue-processing branch 2 times, most recently from f38f2f9 to ba3a29e on February 17, 2026 17:08
@ericallam ericallam changed the title fix: speed up batch queue processing by removing stalls and fixing retry race fix: speed up batch queue processing by disabling cooloff and fixing retry race Feb 17, 2026
@ericallam ericallam marked this pull request as ready for review February 17, 2026 18:17
@ericallam ericallam force-pushed the fix/batch-queue-processing branch from ba3a29e to b9dff7c on February 18, 2026 13:30
@ericallam ericallam changed the title fix: speed up batch queue processing by disabling cooloff and fixing retry race fix(batch-queue): speed up batch queue processing by disabling cooloff and fixing retry race Feb 25, 2026
@ericallam ericallam force-pushed the fix/batch-queue-processing branch from b9dff7c to d8555e4 on February 25, 2026 17:02
Contributor

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
packages/redis-worker/src/fair-queue/tests/fairQueue.test.ts (1)

1264-1275: Consider removing strict wall-clock assertion to reduce CI flakiness.

elapsed < 3000 can fail on loaded CI runners even when the behavior is correct. Prefer enforcing the bound through the vi.waitFor timeout, set below the 5s cooloff period.

♻️ Suggested test stabilization
-        const startTime = Date.now();
         await vi.waitFor(
           () => {
             expect(processed).toHaveLength(3);
           },
-          { timeout: 5000 }
+          { timeout: 4000 }
         );
-        const elapsed = Date.now() - startTime;
-
-        // Should complete well under the 5s cooloff period
-        expect(elapsed).toBeLessThan(3000);
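The point of the suggestion is that vi.waitFor already bounds the wait, so a separate wall-clock assertion only adds a second, noisier deadline. To show the pattern without depending on vitest, here is a hypothetical dependency-free `waitFor` helper in the same spirit: it retries a callback until it stops throwing and fails once the timeout passes. The helper name and its defaults are assumptions for this sketch, not vitest's implementation.

```typescript
// Hypothetical polling helper in the spirit of vi.waitFor; not vitest's code.
type WaitForOptions = { timeout?: number; interval?: number };

async function waitFor<T>(callback: () => T, options: WaitForOptions = {}): Promise<T> {
  const { timeout = 1000, interval = 50 } = options;
  const deadline = Date.now() + timeout;
  for (;;) {
    try {
      // Success as soon as the callback (e.g. an expect) stops throwing.
      return callback();
    } catch (error) {
      // The timeout is the only time bound: give up with the last error.
      if (Date.now() >= deadline) throw error;
      await new Promise<void>((resolve) => setTimeout(resolve, interval));
    }
  }
}
```

With this shape the test's only time limit is the one passed to the wait, e.g. `await waitFor(() => expect(processed).toHaveLength(3), { timeout: 4000 })`, which still fails if a 5s cooloff holds up processing.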
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/redis-worker/src/fair-queue/tests/fairQueue.test.ts` around lines
1264 - 1275, The wall-clock assertion using startTime/elapsed is flaky; remove
the Date.now measurement and the expect(elapsed).toBeLessThan(3000) check and
instead rely on vi.waitFor to enforce completion within the desired window by
reducing its timeout (e.g., change vi.waitFor(..., { timeout: 3000 }) or add a
separate vi.waitFor that asserts processed.length becomes 3 with a <3000ms
timeout); update the test around processed, vi.waitFor, startTime/elapsed to
only use vi.waitFor(timeout) and drop the elapsed variables and assertion.

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ba3a29e and d8555e4.

📒 Files selected for processing (7)
  • .changeset/fix-batch-queue-processing.md
  • .server-changes/batch-queue-perf-fixes.md
  • apps/webapp/app/env.server.ts
  • internal-packages/run-engine/src/batch-queue/index.ts
  • packages/redis-worker/src/fair-queue/index.ts
  • packages/redis-worker/src/fair-queue/tests/fairQueue.test.ts
  • packages/redis-worker/src/fair-queue/visibility.ts
🚧 Files skipped from review as they are similar to previous changes (3)
  • apps/webapp/app/env.server.ts
  • .changeset/fix-batch-queue-processing.md
  • packages/redis-worker/src/fair-queue/index.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (27)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (8, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (2, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (6, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (7, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (5, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (4, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (2, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (6, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (4, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (8, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (7, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (3, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (1, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (3, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (5, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (1, 8)
  • GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - npm)
  • GitHub Check: typecheck / typecheck
  • GitHub Check: sdk-compat / Bun Runtime
  • GitHub Check: sdk-compat / Node.js 20.20 (ubuntu-latest)
  • GitHub Check: units / packages / 🧪 Unit Tests: Packages (1, 1)
  • GitHub Check: sdk-compat / Node.js 22.12 (ubuntu-latest)
  • GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - pnpm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
  • GitHub Check: sdk-compat / Cloudflare Workers
  • GitHub Check: sdk-compat / Deno Runtime
🧰 Additional context used
📓 Path-based instructions (10)
**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.{ts,tsx}: Use types over interfaces for TypeScript
Avoid using enums; prefer string unions or const objects instead

Files:

  • packages/redis-worker/src/fair-queue/visibility.ts
  • packages/redis-worker/src/fair-queue/tests/fairQueue.test.ts
  • internal-packages/run-engine/src/batch-queue/index.ts
**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use function declarations instead of default exports

Files:

  • packages/redis-worker/src/fair-queue/visibility.ts
  • packages/redis-worker/src/fair-queue/tests/fairQueue.test.ts
  • internal-packages/run-engine/src/batch-queue/index.ts
**/*.ts

📄 CodeRabbit inference engine (.cursor/rules/otel-metrics.mdc)

**/*.ts: When creating or editing OTEL metrics (counters, histograms, gauges), ensure metric attributes have low cardinality by using only enums, booleans, bounded error codes, or bounded shard IDs
Do not use high-cardinality attributes in OTEL metrics such as UUIDs/IDs (envId, userId, runId, projectId, organizationId), unbounded integers (itemCount, batchSize, retryCount), timestamps (createdAt, startTime), or free-form strings (errorMessage, taskName, queueName)
When exporting OTEL metrics via OTLP to Prometheus, be aware that the exporter automatically adds unit suffixes to metric names (e.g., 'my_duration_ms' becomes 'my_duration_ms_milliseconds', 'my_counter' becomes 'my_counter_total'). Account for these transformations when writing Grafana dashboards or Prometheus queries

Files:

  • packages/redis-worker/src/fair-queue/visibility.ts
  • packages/redis-worker/src/fair-queue/tests/fairQueue.test.ts
  • internal-packages/run-engine/src/batch-queue/index.ts
**/*.{js,ts,jsx,tsx,json,md,yaml,yml}

📄 CodeRabbit inference engine (AGENTS.md)

Format code using Prettier before committing

Files:

  • packages/redis-worker/src/fair-queue/visibility.ts
  • packages/redis-worker/src/fair-queue/tests/fairQueue.test.ts
  • internal-packages/run-engine/src/batch-queue/index.ts
{packages,integrations}/**/*.{ts,tsx,js}

📄 CodeRabbit inference engine (CLAUDE.md)

When modifying public packages in packages/* or integrations/*, add a changeset using pnpm run changeset:add

Files:

  • packages/redis-worker/src/fair-queue/visibility.ts
  • packages/redis-worker/src/fair-queue/tests/fairQueue.test.ts
**/*.{ts,tsx,js}

📄 CodeRabbit inference engine (CLAUDE.md)

Import from @trigger.dev/core using subpaths only, never the root

Files:

  • packages/redis-worker/src/fair-queue/visibility.ts
  • packages/redis-worker/src/fair-queue/tests/fairQueue.test.ts
  • internal-packages/run-engine/src/batch-queue/index.ts
**/{src,app}/**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

**/{src,app}/**/*.{ts,tsx}: Always import Trigger.dev tasks from @trigger.dev/sdk. Never use @trigger.dev/sdk/v3 or deprecated client.defineJob pattern
Every Trigger.dev task must be exported and include a unique id string property

Files:

  • packages/redis-worker/src/fair-queue/visibility.ts
  • packages/redis-worker/src/fair-queue/tests/fairQueue.test.ts
  • internal-packages/run-engine/src/batch-queue/index.ts
**/*.{test,spec}.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use vitest for all tests in the Trigger.dev repository

Files:

  • packages/redis-worker/src/fair-queue/tests/fairQueue.test.ts
**/*.test.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (AGENTS.md)

**/*.test.{ts,tsx,js,jsx}: Test files should live beside the files under test and use descriptive describe and it blocks
Tests should avoid mocks or stubs and use the helpers from @internal/testcontainers when Redis or Postgres are needed
Use vitest for running unit tests

Files:

  • packages/redis-worker/src/fair-queue/tests/fairQueue.test.ts
**/*.test.{ts,tsx,js}

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.test.{ts,tsx,js}: Never mock anything in tests - use testcontainers instead for Redis and PostgreSQL
Test files should be placed next to source files (e.g., MyService.ts → MyService.test.ts)
Use vitest exclusively for testing

Files:

  • packages/redis-worker/src/fair-queue/tests/fairQueue.test.ts
🧠 Learnings (2)
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Control concurrency using the `queue` property with `concurrencyLimit` option

Applied to files:

  • packages/redis-worker/src/fair-queue/tests/fairQueue.test.ts
📚 Learning: 2026-01-12T17:18:09.451Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 2870
File: apps/webapp/app/services/redisConcurrencyLimiter.server.ts:56-66
Timestamp: 2026-01-12T17:18:09.451Z
Learning: In `apps/webapp/app/services/redisConcurrencyLimiter.server.ts`, the query concurrency limiter will not be deployed with Redis Cluster mode, so multi-key operations (keyKey and globalKey in different hash slots) are acceptable and will function correctly in standalone Redis mode.

Applied to files:

  • packages/redis-worker/src/fair-queue/tests/fairQueue.test.ts
🔇 Additional comments (4)
internal-packages/run-engine/src/batch-queue/index.ts (1)

155-157: Cooloff disablement is correctly applied.

This matches the PR intent and avoids unnecessary queue throttling on concurrency blocks.

packages/redis-worker/src/fair-queue/visibility.ts (1)

281-312: Atomic release payload update looks correct.

The updatedData plumbing through TypeScript + Lua is consistent and properly keeps release + data mutation in one atomic operation.

Also applies to: 686-699, 820-831

packages/redis-worker/src/fair-queue/tests/fairQueue.test.ts (1)

1186-1283: Great coverage for the concurrency-vs-cooloff regression.

This test directly exercises the failure mode fixed by the PR and is a strong guardrail.

.server-changes/batch-queue-perf-fixes.md (1)

1-10: Changelog entry is clear and aligned with the implementation.

The note accurately communicates the performance-oriented behavior change and new plan limits.

@ericallam ericallam merged commit bed3789 into main Feb 25, 2026
37 checks passed
@ericallam ericallam deleted the fix/batch-queue-processing branch February 25, 2026 17:33

Labels

None yet

Projects

None yet


2 participants