PRD: Scheduled Agent Triggers (Cron Jobs) #1612

amikofalvy · 2026-01-27T21:54:59Z

Summary

This PRD defines a new feature for time-based agent triggers, enabling:

Recurring schedules via standard 5-field cron expressions
One-time delayed execution for deferred tasks
Exactly-once execution guarantees using Postgres advisory locks
Retry with exponential backoff for transient failures
Full observability with trace linkage, execution history, and debugging

Key Design Decisions

Internal scheduler over Vercel Cron - Vercel Cron requires a static vercel.json config, which makes it unsuitable for dynamic multi-tenant schedules
Postgres-backed durable queues using existing @workflow/* infrastructure
UTC-only scheduling for v1 (no timezone/DST complexity)

Architecture Diagrams Included

The PRD contains 13 Mermaid diagrams covering:

High-level system overview
Scheduler worker architecture
Invocation state machine
Database entity relationships
Exactly-once execution guarantees
Retry flow with exponential backoff
Leader election
User journey flows

Surfaces Impacted

API (manage + run domains)
TypeScript SDK (scheduledTrigger() builder)
CLI (pull/push support)
Manage UI (trigger forms, invocation history)
Observability (new trace spans)

Open Questions for Review

Missed execution handling (catch-up vs skip)
Execution overlap policy (queue vs skip)
Alert webhook retry logic
Natural language parsing approach (LLM vs regex)

🤖 Generated with Claude Code

This PRD defines time-based agent triggers with:

- Recurring schedules via cron expressions
- One-time delayed execution
- Exactly-once execution guarantees
- Retry with exponential backoff
- Full observability integration
- Multi-surface support (API, UI, SDK, CLI)

Includes comprehensive architecture diagrams for review.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

changeset-bot · 2026-01-27T21:55:03Z

⚠️ No Changeset found

Latest commit: cc8a5c3

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.


vercel · 2026-01-27T21:55:04Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Review	Updated (UTC)
agents-api	Ready	Preview, Comment	Jan 27, 2026 10:00pm
agents-docs	Ready	Preview, Comment	Jan 27, 2026 10:00pm
agents-manage-ui	Ready	Preview, Comment	Jan 27, 2026 10:00pm

claude · 2026-01-27T21:55:26Z

Claude finished @amikofalvy's task

Technical Review: PRD - Scheduled Agent Triggers (Cron Jobs)

As a TypeScript Staff Engineer and System Architect, I've conducted a thorough technical review of this PRD. The design is well-thought-out and follows existing patterns, but there are several architectural considerations and potential improvements to address.

🏗️ Excellent Design Decisions

✅ Proper Architecture Separation

Two-database approach correctly separates configuration (manage DB) from runtime state (runtime DB)
Exactly-once semantics using Postgres advisory locks is the right choice for reliability
Reusing ExecutionHandler maintains consistency with existing webhook trigger execution

✅ Strong Observability

Comprehensive tracing attributes following existing invocation.type patterns
Trace linkage via traceId enables debugging across the stack
OpenTelemetry spans align with current architecture

✅ Sound Multi-tenancy

Proper scoping patterns consistent with existing triggers
Advisory lock keys include tenant context to prevent cross-tenant interference

⚠️ Critical Technical Concerns

🔴 Database Schema Issues

Problem 1: Advisory Lock Collision Risk

-- PRD suggests using just job ID for advisory locks
pg_try_advisory_xact_lock(hashtext(id))

Risk: Hash collisions across tenants could cause lock contention between unrelated jobs.

Recommendation:

-- Include tenant/project context in lock key
pg_try_advisory_xact_lock(hashtext(tenant_id || ':' || project_id || ':' || id))
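One caveat applies to both forms: `hashtext` returns a 32-bit integer, so adding tenant context changes which keys collide but not how many distinct keys exist. A minimal sketch of one alternative, assuming the application computes a 64-bit key itself and passes it to the single-`bigint` form of `pg_try_advisory_xact_lock`. The `advisoryLockKey` helper and the choice of FNV-1a are illustrative assumptions, not something the PRD specifies:

```typescript
// Hypothetical helper (not from the PRD): derive a signed 64-bit advisory
// lock key from the tenant/project/job scope using 64-bit FNV-1a.
const FNV_OFFSET_BASIS = 0xcbf29ce484222325n;
const FNV_PRIME = 0x100000001b3n;
const MASK_64 = (1n << 64n) - 1n;

function advisoryLockKey(tenantId: string, projectId: string, jobId: string): bigint {
  const scoped = `${tenantId}:${projectId}:${jobId}`;
  let hash = FNV_OFFSET_BASIS;
  for (let i = 0; i < scoped.length; i++) {
    const unit = scoped.charCodeAt(i);
    // Feed each UTF-16 code unit as two bytes (low byte, then high byte).
    for (const byte of [unit & 0xff, unit >> 8]) {
      hash ^= BigInt(byte);
      hash = (hash * FNV_PRIME) & MASK_64;
    }
  }
  // Postgres bigint is signed, so reinterpret the unsigned 64-bit value.
  return BigInt.asIntN(64, hash);
}
```

The resulting value would be bound as a `bigint` query parameter. Collisions remain possible in principle, so the lock should still be treated as a contention guard rather than a uniqueness guarantee.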

Problem 2: Missing Critical Indexes
The runtime schema is missing indexes for scheduler polling patterns:

Current in PRD:

index('sched_invocations_pending_idx').on(table.status, table.scheduledFor)

Missing indexes needed:

// For claim + lock operations in a single query
index('sched_invocations_claim_idx').on(table.status, table.scheduledFor, table.tenantId),
// For an efficient leader election pattern
index('sched_invocations_leader_idx').on(table.tenantId, table.projectId, table.status)

Problem 3: Idempotency Key Design

// PRD suggests: triggerId + scheduledFor
const idempotencyKey = `${triggerId}-${scheduledFor}`

Issue: This doesn't account for manual re-runs or retry attempts, potentially causing duplicates.

Recommendation:

// Include attempt context for better deduplication
const idempotencyKey = `${triggerId}-${scheduledFor.toISOString()}-${contextId || 'auto'}`
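A minimal sketch of how that key could be built so that retries of the same scheduled slot deduplicate while a manual re-run gets its own key. The `RunContext` type, the `requestId` field, and the `buildIdempotencyKey` name are assumptions for illustration only:

```typescript
// Hypothetical types: the PRD does not define how manual re-runs are identified.
type RunContext =
  | { kind: 'scheduled' }
  | { kind: 'manual'; requestId: string };

function buildIdempotencyKey(
  triggerId: string,
  scheduledFor: Date,
  context: RunContext,
): string {
  // Retries of one scheduled slot share the 'auto' suffix and therefore dedupe;
  // each manual re-run carries its own request ID and gets a distinct key.
  const suffix = context.kind === 'manual' ? `manual:${context.requestId}` : 'auto';
  return `${triggerId}-${scheduledFor.toISOString()}-${suffix}`;
}
```

Using `toISOString()` keeps the timestamp component stable across workers, whereas interpolating a `Date` directly produces a locale-style string.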

🟡 System Design Concerns

Scheduler Scalability Pattern
The leader election approach using pg_try_advisory_lock(scheduler_leader_key) is simple but has limitations:

Current Issues:

Single leader bottleneck for job creation across ALL tenants
No graceful failover mechanism described
Potential thundering herd on leader restart

Recommendation: Consider tenant-sharded leader election:

-- Per-tenant leader locks enable horizontal scaling
pg_try_advisory_lock(hashtext('scheduler_leader:' || tenant_id))

Polling Interval Trade-offs
A fixed 10-second polling interval is a poor fit at both extremes:

High database load with hundreds of active schedules
10s latency is significant for time-sensitive triggers

Suggestion: Implement adaptive polling with exponential backoff when no jobs are ready.
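A minimal sketch of that adaptive loop's delay calculation. The option names and the default values (1s minimum, 30s maximum, factor 2) are assumptions, not numbers from the PRD:

```typescript
interface PollingOptions {
  minDelayMs: number;
  maxDelayMs: number;
  factor: number;
}

// Assumed defaults for illustration only.
const DEFAULT_POLLING: PollingOptions = { minDelayMs: 1_000, maxDelayMs: 30_000, factor: 2 };

function nextPollDelay(
  previousDelayMs: number,
  claimedJobs: number,
  opts: PollingOptions = DEFAULT_POLLING,
): number {
  // Work was found: poll again quickly, since more is likely due.
  if (claimedJobs > 0) return opts.minDelayMs;
  // Idle: back off exponentially, capped at the maximum delay.
  const grown = Math.max(previousDelayMs, opts.minDelayMs) * opts.factor;
  return Math.min(grown, opts.maxDelayMs);
}
```

The cap bounds worst-case latency for a newly due job, so `maxDelayMs` should be chosen against the latency the product is willing to promise.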

🔧 Implementation Quality Issues

Missing Error Classifications

The PRD doesn't distinguish between error types for retry logic:

Current: Generic retry with exponential backoff
Needed: Error-specific retry strategies:

enum SchedulerError {
  TRANSIENT = 'transient',     // Retry with backoff
  AGENT_ERROR = 'agent_error', // Retry with limit
  CONFIG_ERROR = 'config',     // No retry
  RATE_LIMIT = 'rate_limit'    // Custom backoff
}
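A minimal sketch of how those classes could drive the retry decision. The attempt limits, base delay, and the `retryAfterMs` parameter are assumptions chosen for illustration, and the string values mirror the enum above:

```typescript
type SchedulerErrorClass = 'transient' | 'agent_error' | 'config' | 'rate_limit';

// Assumed tuning values, not taken from the PRD.
const BASE_DELAY_MS = 1_000;
const MAX_DELAY_MS = 60_000;
const MAX_TRANSIENT_ATTEMPTS = 5;
const MAX_AGENT_ATTEMPTS = 2;

// Exponential backoff: 1s, 2s, 4s, ... capped at MAX_DELAY_MS.
function backoffMs(attempt: number): number {
  return Math.min(BASE_DELAY_MS * 2 ** (attempt - 1), MAX_DELAY_MS);
}

// `attempt` is the number of attempts made so far (1-based).
// Returns the delay before the next attempt, or null when no retry should happen.
function retryDelayMs(
  errorClass: SchedulerErrorClass,
  attempt: number,
  retryAfterMs?: number,
): number | null {
  switch (errorClass) {
    case 'transient':
      return attempt < MAX_TRANSIENT_ATTEMPTS ? backoffMs(attempt) : null;
    case 'agent_error':
      return attempt < MAX_AGENT_ATTEMPTS ? backoffMs(attempt) : null;
    case 'rate_limit':
      // Prefer the delay the upstream service asked for, if it gave one.
      return retryAfterMs ?? backoffMs(attempt);
    case 'config':
      return null;
  }
}
```

A production version would likely add random jitter to the backoff so that many failed invocations do not retry in lockstep.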

Template Variable Security

The template interpolation in messageTemplate and payload lacks input sanitization:

Risk: Template injection if user input gets into template variables
Recommendation: Implement sandboxed template execution with whitelisted variables only.
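A minimal sketch of allowlist-based interpolation. The `{{name}}` placeholder syntax and the variable names in `ALLOWED_VARIABLES` are assumptions, since the review does not quote the PRD's template format:

```typescript
// Hypothetical allowlist: only these variables may appear in a template.
const ALLOWED_VARIABLES = new Set(['scheduledFor', 'triggerId', 'triggerName']);

function renderTemplate(template: string, values: Record<string, string>): string {
  // Single pass over the template: substituted values are never re-scanned,
  // so a value containing "{{...}}" is emitted as literal text.
  return template.replace(
    /\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}/g,
    (_match: string, name: string) => {
      if (!ALLOWED_VARIABLES.has(name)) {
        throw new Error(`Unknown template variable: ${name}`);
      }
      const value = values[name];
      if (value === undefined) {
        throw new Error(`Missing value for template variable: ${name}`);
      }
      return value;
    },
  );
}
```

Because substitution is plain string replacement with no expression evaluation, there is nothing to sandbox. Anything that does not match the simple placeholder pattern is left in the output untouched.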

Job Claiming Race Conditions

The current claiming algorithm has a subtle race condition:

PRD Algorithm:

-- Step 1: SELECT with lock
SELECT * FROM invocations WHERE ... AND pg_try_advisory_xact_lock(...)
-- Step 2: UPDATE status
UPDATE invocations SET status = 'running'

Issue: Between SELECT and UPDATE, another process could theoretically interfere.

Fix: Use single atomic operation:

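-- Claim due invocations and mark them running in one statement
UPDATE invocations
SET status = 'running', started_at = NOW()
WHERE id IN (
  SELECT id FROM invocations
  WHERE status = 'pending'
    AND scheduled_for <= NOW()
    -- Skip rows whose advisory lock is held by another worker
    AND pg_try_advisory_xact_lock(hashtext(tenant_id || ':' || id))
  LIMIT 100
)
RETURNING *;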

📊 Surface Area & Integration Analysis

API Design Quality

The REST API endpoints follow existing patterns well, but are missing key features:

Missing Endpoints:

GET /scheduled-triggers/{id}/next-runs - Preview next N execution times
POST /scheduled-triggers/{id}/test - Dry-run execution for debugging
GET /health/scheduler - Scheduler health check endpoint
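A minimal sketch of the computation a next-runs preview would need, evaluated in UTC to match the v1 scope. This is a simplified stand-in for a real cron library: it supports only `*`, `*/n`, `a-b`, `a-b/n`, and comma lists, with no month or day names and no `7` for Sunday. The `nextRuns` name and the one-year search window are assumptions:

```typescript
// Expand one cron field into the set of values it matches.
function parseField(field: string, min: number, max: number): Set<number> {
  const values = new Set<number>();
  for (const part of field.split(',')) {
    const m = /^(?:\*|(\d+)(?:-(\d+))?)(?:\/(\d+))?$/.exec(part);
    if (!m) throw new Error(`Invalid cron field: ${field}`);
    const step = m[3] === undefined ? 1 : Number(m[3]);
    let start = min;
    let end = max;
    if (m[1] !== undefined) {
      start = Number(m[1]);
      // "5" means just 5; "5/10" means from 5 to the field maximum in steps of 10.
      end = m[2] !== undefined ? Number(m[2]) : m[3] !== undefined ? max : start;
    }
    if (step < 1 || start < min || end > max || start > end) {
      throw new Error(`Cron field out of range: ${field}`);
    }
    for (let v = start; v <= end; v += step) values.add(v);
  }
  return values;
}

// Return the next `count` UTC run times strictly after `after`.
function nextRuns(expression: string, after: Date, count: number): Date[] {
  const fields = expression.trim().split(/\s+/);
  if (fields.length !== 5) throw new Error('Expected a 5-field cron expression');
  const minutes = parseField(fields[0], 0, 59);
  const hours = parseField(fields[1], 0, 23);
  const daysOfMonth = parseField(fields[2], 1, 31);
  const months = parseField(fields[3], 1, 12);
  const daysOfWeek = parseField(fields[4], 0, 6);
  // Standard cron rule: if both day fields are restricted, either may match.
  const bothDayFieldsRestricted = fields[2] !== '*' && fields[4] !== '*';

  const runs: Date[] = [];
  const minuteMs = 60_000;
  let cursor = Math.floor(after.getTime() / minuteMs) * minuteMs + minuteMs;
  const limit = cursor + 366 * 24 * 60 * minuteMs; // search at most about a year ahead
  while (runs.length < count && cursor <= limit) {
    const d = new Date(cursor);
    const domOk = daysOfMonth.has(d.getUTCDate());
    const dowOk = daysOfWeek.has(d.getUTCDay());
    const dayOk = bothDayFieldsRestricted ? domOk || dowOk : domOk && dowOk;
    if (minutes.has(d.getUTCMinutes()) && hours.has(d.getUTCHours()) &&
        months.has(d.getUTCMonth() + 1) && dayOk) {
      runs.push(d);
    }
    cursor += minuteMs;
  }
  return runs;
}
```

The minute-by-minute scan favors clarity over speed. A real endpoint would more likely delegate to the cron library the scheduler itself uses, so that the preview and the actual schedule cannot disagree.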

SDK Builder Pattern

The scheduledTrigger() builder needs refinement:

Current PRD:

export function scheduledTrigger(config: ScheduledTriggerConfig): ScheduledTrigger

Enhanced Design:

export function scheduledTrigger(config: ScheduledTriggerConfig): ScheduledTrigger {
  // Runtime validation for better DX
  if (config.cronExpression) {
    const nextRuns = cronParser.parseExpression(config.cronExpression)
      .iterate(5).map(d => d.toDate());
    console.log(`Next 5 runs: ${nextRuns.map(d => d.toISOString()).join(', ')}`);
  }
  return new ScheduledTrigger(config);
}

CLI Integration Gaps

The PRD mentions pull/push support but doesn't address:

Validation feedback: How does CLI show cron validation errors?
Status display: Should agents status show pending scheduled executions?
Time zone handling: How does CLI handle user's local time vs UTC?

🚀 Open Questions - Architectural Decisions

1. Execution Overlap Policy

Question: What should happen when a cron job is still running at its next scheduled time?

Options Analysis:

A. Queue: Simple but can cause resource exhaustion
B. Skip: Safe but users lose executions
C. Configurable: Best UX but adds complexity

Recommendation: Start with Skip (option B) for safety, add configuration later.

2. Missed Execution Handling

Question: How should executions missed during scheduler downtime be handled?

Technical Analysis:

Catch-up mode: Complex to implement correctly, risk of execution storms
Skip mode: Simple but users might miss important scheduled work

Recommendation: Implement limited catch-up - only catch up if downtime < 1 hour, skip older executions.
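A minimal sketch of that cutoff, reading the recommendation as "catch up on executions missed less than one hour ago and skip anything older". The `partitionMissedRuns` name and the return shape are assumptions for illustration:

```typescript
const MAX_CATCH_UP_MS = 60 * 60 * 1000; // the 1-hour window from the recommendation

interface MissedRunPlan {
  catchUp: Date[]; // recent enough to execute now
  skipped: Date[]; // too old; record as skipped instead of executing
}

function partitionMissedRuns(
  missed: Date[],
  now: Date,
  maxAgeMs: number = MAX_CATCH_UP_MS,
): MissedRunPlan {
  const plan: MissedRunPlan = { catchUp: [], skipped: [] };
  for (const scheduledFor of missed) {
    const ageMs = now.getTime() - scheduledFor.getTime();
    // Strictly less than the window, matching "downtime < 1 hour".
    if (ageMs < maxAgeMs) plan.catchUp.push(scheduledFor);
    else plan.skipped.push(scheduledFor);
  }
  return plan;
}
```

Recording skipped executions explicitly, rather than silently dropping them, keeps the invocation history honest about what the scheduler decided.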

3. Natural Language Parsing

Question: Should phrases like "every Monday at 9am" be parsed by an LLM or by regex?

Implementation Trade-offs:

LLM approach: Better UX, higher accuracy, but adds latency/cost
Regex approach: Fast, predictable, but limited patterns

Recommendation: Hybrid approach - regex for common patterns, LLM fallback for complex expressions.
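A minimal sketch of the regex half of that hybrid. It recognizes only the pattern "every <day|weekday> at <time>" and returns `null` for anything else, which is the signal to hand the phrase to the LLM fallback. The `parseSimpleSchedule` name and the supported phrasing are assumptions, and times are treated as UTC:

```typescript
const DAYS: Record<string, number> = {
  sunday: 0, monday: 1, tuesday: 2, wednesday: 3, thursday: 4, friday: 5, saturday: 6,
};

const SIMPLE_SCHEDULE =
  /^every\s+(day|sunday|monday|tuesday|wednesday|thursday|friday|saturday)\s+at\s+(\d{1,2})(?::(\d{2}))?\s*(am|pm)?$/i;

// Returns a 5-field cron expression, or null if the phrase is not recognized.
function parseSimpleSchedule(input: string): string | null {
  const m = SIMPLE_SCHEDULE.exec(input.trim());
  if (!m) return null;

  let hour = Number(m[2]);
  const minute = m[3] === undefined ? 0 : Number(m[3]);
  const meridiem = m[4]?.toLowerCase();
  if (meridiem) {
    if (hour < 1 || hour > 12) return null;
    // 12am -> 0, 12pm -> 12, 1pm -> 13, and so on.
    hour = (hour % 12) + (meridiem === 'pm' ? 12 : 0);
  }
  if (hour > 23 || minute > 59) return null;

  const dayWord = m[1].toLowerCase();
  const dayOfWeek = dayWord === 'day' ? '*' : String(DAYS[dayWord]);
  return `${minute} ${hour} * * ${dayOfWeek}`;
}
```

Whichever path produces the expression, showing the user the resulting cron string and its next few run times before saving would catch misreadings from either the regex or the LLM.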

🎯 High-Priority Recommendations

1. Database Schema Fixes (Critical)

Fix advisory lock key collision risk
Add missing indexes for scheduler queries
Improve idempotency key design

2. Error Handling Enhancement (High)

Implement error classification for smart retries
Add circuit breaker pattern for external dependencies
Sanitize template variable interpolation

3. Scalability Preparation (Medium)

Design tenant-sharded leader election
Implement adaptive polling intervals
Add scheduler health check endpoints

4. Developer Experience (Medium)

Add cron validation with next-run preview in SDK
Enhance CLI with time zone conversion helpers
Implement dry-run/test execution endpoints

✅ PRD Quality Assessment

Strengths:

✅ Comprehensive user stories with clear acceptance criteria
✅ Detailed Mermaid diagrams for system flows
✅ Proper consideration of exactly-once semantics
✅ Well-thought-out integration across all surfaces
✅ Good separation of concerns (manage vs runtime)

Areas for Improvement:

⚠️ Missing critical database index specifications
⚠️ Insufficient error classification for retry logic
⚠️ Limited scalability analysis beyond single-worker leader election
⚠️ Security considerations for template interpolation not addressed

Overall Assessment: This is a well-architected PRD that demonstrates strong system design thinking. The technical approach is sound and builds appropriately on existing patterns. With the critical database schema fixes and error handling improvements, this would be a robust foundation for scheduled triggers.

Recommendation: ✅ Approve with revisions - address the database schema issues and error classification before implementation.


amikofalvy · 2026-01-28T04:55:53Z

spec/prd-scheduled-triggers.md

+
+## Introduction
+
+Add time-based scheduling capabilities to agent triggers, enabling both recurring schedules (cron-style) and one-time delayed execution. Currently, agents can only be triggered via webhooks or direct API calls. Users need the ability to schedule agents to run automatically—daily reports, periodic data syncs, scheduled reminders, or deferred task execution.


Calling out the deferred task execution requirement I added. We might want to defer this work item to a later date, but it is worth considering now when designing the API for other types of scheduled workflows.

amikofalvy · 2026-01-28T04:57:44Z

spec/prd-scheduled-triggers.md

+
+2. **Exactly-Once Execution**: Using Postgres advisory locks and transactional job claiming to prevent duplicate executions, with idempotency keys for downstream operations.
+
+3. **UTC-Only Scheduling**: All schedules stored and executed in UTC. Users convert their local times; no DST complexity in v1.


We may want to rethink this requirement. I worry that if we don't handle this now, we will neglect to do it later. It shouldn't be too hard to add in the initial implementation.

vercel bot deployed to Preview – agents-api January 27, 2026 21:55 View deployment

vercel bot deployed to Preview – agents-docs January 27, 2026 21:55 View deployment

vercel bot deployed to Preview – agents-manage-ui January 27, 2026 21:55 View deployment

Fix Mermaid syntax error in debugging flow diagram

cc8a5c3

Remove quotes and special characters that were causing parse errors. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

vercel bot deployed to Preview – agents-api January 27, 2026 21:59 View deployment

vercel bot deployed to Preview – agents-manage-ui January 27, 2026 22:00 View deployment

vercel bot deployed to Preview – agents-docs January 27, 2026 22:00 View deployment
