Fix: fixes and prerequisites for v4 self-hosting #2150

Merged (32 commits) on Jun 4, 2025
Commits (changes shown below are from all commits)
4c4dbf6
remove pgadmin
nicktrn May 16, 2025
42d61fa
remove V3_ENABLED
nicktrn May 17, 2025
7b25885
v3 is always enabled
nicktrn May 17, 2025
0562485
Merge remote-tracking branch 'origin/main' into v4/self-hosting
nicktrn May 17, 2025
35aaaf6
enforce docker machine presets by default
nicktrn May 17, 2025
8f78d3d
rename autoremove env var
nicktrn May 17, 2025
4b5adbf
prefix more k8s-specific env vars
nicktrn May 17, 2025
a4e2515
same prefix for all docker settings
nicktrn May 17, 2025
0f0967d
improve profile switcher copy
nicktrn May 17, 2025
ceab383
supervisor can load token from file
nicktrn May 18, 2025
f142a5e
optional webapp worker group bootstrap
nicktrn May 18, 2025
45c1ea0
fix error message
nicktrn May 18, 2025
43df3b3
fix app origin fallback for otlp endpoint
nicktrn May 18, 2025
10ba001
use pnpm cache for webapp docker builds
nicktrn May 18, 2025
86cdea9
increase default org and env concurrency limit to 100
nicktrn May 19, 2025
141590f
optional machine preset overrides
nicktrn May 19, 2025
0ec2984
Merge remote-tracking branch 'origin/main' into v4/self-hosting
nicktrn May 19, 2025
57238b0
Merge remote-tracking branch 'origin/main' into v4/self-hosting
nicktrn May 21, 2025
8f6e8be
Merge remote-tracking branch 'origin/main' into v4/self-hosting
nicktrn May 21, 2025
54ac089
improve s3 pre-signing errors
nicktrn May 21, 2025
c69cb4b
fix DOCKER_ENFORCE_MACHINE_PRESETS bool coercion
nicktrn May 22, 2025
918542c
shard unit tests
nicktrn May 22, 2025
b803efb
Merge remote-tracking branch 'origin/main' into v4/self-hosting
nicktrn May 22, 2025
7a14cb5
fix for s3-compatible services
nicktrn May 22, 2025
0e82bc0
optional object store region
nicktrn May 22, 2025
4cf8a55
Merge remote-tracking branch 'origin/main' into v4/self-hosting
nicktrn Jun 4, 2025
1f19a8c
Merge remote-tracking branch 'origin/main' into v4/self-hosting
nicktrn Jun 4, 2025
5c5f117
Update apps/supervisor/src/workerToken.ts
nicktrn Jun 4, 2025
5b7c729
fix DEPLOY_REGISTRY_HOST example
nicktrn Jun 4, 2025
3851ffd
fix platform mock
nicktrn Jun 4, 2025
fb257fc
remove remaining v3Enabled refs
nicktrn Jun 4, 2025
49f63b7
fix error type.. bad bot
nicktrn Jun 4, 2025
5 changes: 5 additions & 0 deletions .changeset/smooth-planets-flow.md
@@ -0,0 +1,5 @@
---
"trigger.dev": patch
---

Update profile switcher
2 changes: 1 addition & 1 deletion .env.example
@@ -82,7 +82,7 @@ COORDINATOR_SECRET=coordinator-secret # generate the actual secret with `openssl

# DEPOT_ORG_ID=<Depot org id>
# DEPOT_TOKEN=<Depot org token>
DEPLOY_REGISTRY_HOST=${APP_ORIGIN} # This is the host that the deploy CLI will use to push images to the registry
DEPLOY_REGISTRY_HOST=localhost:5000 # This is the host that the deploy CLI will use to push images to the registry
# DEV_OTEL_EXPORTER_OTLP_ENDPOINT="http://0.0.0.0:4318"
# These are needed for the object store (for handling large payloads/outputs)
# OBJECT_STORE_BASE_URL="https://{bucket}.{accountId}.r2.cloudflarestorage.com"
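The new example value points the deploy CLI at a locally running registry instead of the app origin. For anyone testing self-hosted deploys, a throwaway registry on that port can be started like this (not part of this PR; the container name is arbitrary):

```sh
# Start a disposable local Docker registry on port 5000 to match the new example value
docker run -d --name local-registry -p 5000:5000 registry:2

# ...then DEPLOY_REGISTRY_HOST=localhost:5000 in .env points the deploy CLI at it
```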
12 changes: 4 additions & 8 deletions CONTRIBUTING.md
@@ -62,8 +62,6 @@ branch are tagged into a release periodically.
pnpm run docker
```

This will also start and run a local instance of [pgAdmin](https://www.pgadmin.org/) on [localhost:5480](http://localhost:5480), preconfigured with email `admin@example.com` and pwd `admin`. Then use `postgres` as the password to the Trigger.dev server.

9. Migrate the database
```
pnpm run db:migrate
@@ -94,13 +92,11 @@ We use the `<root>/references/v3-catalog` subdirectory as a staging ground for t

First, make sure you are running the webapp according to the instructions above. Then:

1. In Postgres go to the "Organizations" table and on your org set the `v3Enabled` column to `true`.

2. Visit http://localhost:3030 in your browser and create a new V3 project called "v3-catalog". If you don't see an option for V3, you haven't set the `v3Enabled` flag to true.
1. Visit http://localhost:3030 in your browser and create a new V3 project called "v3-catalog".

3. In Postgres go to the "Projects" table and for the project you create change the `externalRef` to `yubjwjsfkxnylobaqvqz`.
2. In Postgres go to the "Projects" table and for the project you create change the `externalRef` to `yubjwjsfkxnylobaqvqz`.

4. Build the CLI
3. Build the CLI

```sh
# Build the CLI
@@ -109,7 +105,7 @@ pnpm run build --filter trigger.dev
pnpm i
```

5. Change into the `<root>/references/v3-catalog` directory and authorize the CLI to the local server:
4. Change into the `<root>/references/v3-catalog` directory and authorize the CLI to the local server:

```sh
cd references/v3-catalog
1 change: 0 additions & 1 deletion apps/supervisor/.env.example
@@ -14,5 +14,4 @@ OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:3030/otel

# Optional settings
DEBUG=1
ENFORCE_MACHINE_PRESETS=1
TRIGGER_DEQUEUE_INTERVAL_MS=1000
56 changes: 27 additions & 29 deletions apps/supervisor/src/env.ts
@@ -10,8 +10,9 @@ const Env = z.object({

// Required settings
TRIGGER_API_URL: z.string().url(),
TRIGGER_WORKER_TOKEN: z.string(),
TRIGGER_WORKER_TOKEN: z.string(), // accepts file:// path to read from a file
MANAGED_WORKER_SECRET: z.string(),
OTEL_EXPORTER_OTLP_ENDPOINT: z.string().url(), // set on the runners

// Workload API settings (coordinator mode) - the workload API is what the run controller connects to
TRIGGER_WORKLOAD_API_ENABLED: BoolEnv.default("true"),
@@ -29,26 +30,6 @@ const Env = z.object({
RUNNER_SNAPSHOT_POLL_INTERVAL_SECONDS: z.coerce.number().optional(),
RUNNER_ADDITIONAL_ENV_VARS: AdditionalEnvVars, // optional (csv)
RUNNER_PRETTY_LOGS: BoolEnv.default(false),
RUNNER_DOCKER_AUTOREMOVE: BoolEnv.default(true),
/**
* Network mode to use for all runners. Supported standard values are: `bridge`, `host`, `none`, and `container:<name|id>`.
* Any other value is taken as a custom network's name to which all runners should connect to.
*
* Accepts a list of comma-separated values to attach to multiple networks. Additional networks are interpreted as network names and will be attached after container creation.
*
* **WARNING**: Specifying multiple networks will slightly increase startup times.
*
* @default "host"
*/
RUNNER_DOCKER_NETWORKS: z.string().default("host"),

// Docker settings
DOCKER_API_VERSION: z.string().default("v1.41"),
DOCKER_PLATFORM: z.string().optional(), // e.g. linux/amd64, linux/arm64
DOCKER_STRIP_IMAGE_DIGEST: BoolEnv.default(true),
DOCKER_REGISTRY_USERNAME: z.string().optional(),
DOCKER_REGISTRY_PASSWORD: z.string().optional(),
DOCKER_REGISTRY_URL: z.string().optional(), // e.g. https://index.docker.io/v1

// Dequeue settings (provider mode)
TRIGGER_DEQUEUE_ENABLED: BoolEnv.default("true"),
@@ -62,22 +43,39 @@ const Env = z.object({
TRIGGER_CHECKPOINT_URL: z.string().optional(),
TRIGGER_METADATA_URL: z.string().optional(),

// Used by the workload manager, e.g docker/k8s
OTEL_EXPORTER_OTLP_ENDPOINT: z.string().url(),
ENFORCE_MACHINE_PRESETS: z.coerce.boolean().default(false),
KUBERNETES_IMAGE_PULL_SECRETS: z.string().optional(), // csv

// Used by the resource monitor
RESOURCE_MONITOR_ENABLED: BoolEnv.default(false),
RESOURCE_MONITOR_OVERRIDE_CPU_TOTAL: z.coerce.number().optional(),
RESOURCE_MONITOR_OVERRIDE_MEMORY_TOTAL_GB: z.coerce.number().optional(),

// Kubernetes specific settings
// Docker settings
DOCKER_API_VERSION: z.string().default("v1.41"),
DOCKER_PLATFORM: z.string().optional(), // e.g. linux/amd64, linux/arm64
DOCKER_STRIP_IMAGE_DIGEST: BoolEnv.default(true),
DOCKER_REGISTRY_USERNAME: z.string().optional(),
DOCKER_REGISTRY_PASSWORD: z.string().optional(),
DOCKER_REGISTRY_URL: z.string().optional(), // e.g. https://index.docker.io/v1
DOCKER_ENFORCE_MACHINE_PRESETS: BoolEnv.default(true),
DOCKER_AUTOREMOVE_EXITED_CONTAINERS: BoolEnv.default(true),
/**
* Network mode to use for all runners. Supported standard values are: `bridge`, `host`, `none`, and `container:<name|id>`.
* Any other value is taken as a custom network's name to which all runners should connect to.
*
* Accepts a list of comma-separated values to attach to multiple networks. Additional networks are interpreted as network names and will be attached after container creation.
*
* **WARNING**: Specifying multiple networks will slightly increase startup times.
*
* @default "host"
*/
DOCKER_RUNNER_NETWORKS: z.string().default("host"),

// Kubernetes settings
KUBERNETES_FORCE_ENABLED: BoolEnv.default(false),
KUBERNETES_NAMESPACE: z.string().default("default"),
KUBERNETES_WORKER_NODETYPE_LABEL: z.string().default("v4-worker"),
EPHEMERAL_STORAGE_SIZE_LIMIT: z.string().default("10Gi"),
EPHEMERAL_STORAGE_SIZE_REQUEST: z.string().default("2Gi"),
KUBERNETES_IMAGE_PULL_SECRETS: z.string().optional(), // csv
KUBERNETES_EPHEMERAL_STORAGE_SIZE_LIMIT: z.string().default("10Gi"),
KUBERNETES_EPHEMERAL_STORAGE_SIZE_REQUEST: z.string().default("2Gi"),

// Metrics
METRICS_ENABLED: BoolEnv.default(true),
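Taken together, this hunk renames the workload-manager settings so every Docker option carries a `DOCKER_` prefix and every Kubernetes option a `KUBERNETES_` prefix. A minimal sketch of how an existing supervisor `.env` maps old names to new ones — the values shown are just the defaults from this file, not a recommendation:

```sh
# Old name (pre-PR)                New name (this PR)
# ENFORCE_MACHINE_PRESETS          DOCKER_ENFORCE_MACHINE_PRESETS (default flipped to true)
# RUNNER_DOCKER_AUTOREMOVE         DOCKER_AUTOREMOVE_EXITED_CONTAINERS
# RUNNER_DOCKER_NETWORKS           DOCKER_RUNNER_NETWORKS
# EPHEMERAL_STORAGE_SIZE_LIMIT     KUBERNETES_EPHEMERAL_STORAGE_SIZE_LIMIT
# EPHEMERAL_STORAGE_SIZE_REQUEST   KUBERNETES_EPHEMERAL_STORAGE_SIZE_REQUEST

DOCKER_ENFORCE_MACHINE_PRESETS=1
DOCKER_AUTOREMOVE_EXITED_CONTAINERS=1
DOCKER_RUNNER_NETWORKS=host
KUBERNETES_EPHEMERAL_STORAGE_SIZE_LIMIT=10Gi
KUBERNETES_EPHEMERAL_STORAGE_SIZE_REQUEST=2Gi
```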
5 changes: 3 additions & 2 deletions apps/supervisor/src/index.ts
@@ -24,6 +24,7 @@ import { collectDefaultMetrics } from "prom-client";
import { register } from "./metrics.js";
import { PodCleaner } from "./services/podCleaner.js";
import { FailedPodHandler } from "./services/failedPodHandler.js";
import { getWorkerToken } from "./workerToken.js";

if (env.METRICS_COLLECT_DEFAULTS) {
collectDefaultMetrics({ register });
@@ -67,7 +68,7 @@ class ManagedSupervisor {
heartbeatIntervalSeconds: env.RUNNER_HEARTBEAT_INTERVAL_SECONDS,
snapshotPollIntervalSeconds: env.RUNNER_SNAPSHOT_POLL_INTERVAL_SECONDS,
additionalEnvVars: env.RUNNER_ADDITIONAL_ENV_VARS,
dockerAutoremove: env.RUNNER_DOCKER_AUTOREMOVE,
dockerAutoremove: env.DOCKER_AUTOREMOVE_EXITED_CONTAINERS,
} satisfies WorkloadManagerOptions;

this.resourceMonitor = env.RESOURCE_MONITOR_ENABLED
@@ -119,7 +120,7 @@ class ManagedSupervisor {
}

this.workerSession = new SupervisorSession({
workerToken: env.TRIGGER_WORKER_TOKEN,
workerToken: getWorkerToken(),
apiUrl: env.TRIGGER_API_URL,
instanceName: env.TRIGGER_WORKER_INSTANCE_NAME,
managedWorkerSecret: env.MANAGED_WORKER_SECRET,
29 changes: 29 additions & 0 deletions apps/supervisor/src/workerToken.ts
@@ -0,0 +1,29 @@
import { readFileSync } from "fs";
import { env } from "./env.js";

export function getWorkerToken() {
if (!env.TRIGGER_WORKER_TOKEN.startsWith("file://")) {
return env.TRIGGER_WORKER_TOKEN;
}

const tokenPath = env.TRIGGER_WORKER_TOKEN.replace("file://", "");

console.debug(
JSON.stringify({
message: "🔑 Reading worker token from file",
tokenPath,
})
);

try {
const token = readFileSync(tokenPath, "utf8").trim();
return token;
} catch (error) {
console.error(`Failed to read worker token from file: ${tokenPath}`, error);
throw new Error(
`Unable to read worker token from file: ${
error instanceof Error ? error.message : "Unknown error"
}`
);
}
}
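Since `getWorkerToken()` treats any value starting with `file://` as a path and falls back to the raw value otherwise, the supervisor token can now come from a mounted secret instead of being pasted into the environment. A minimal sketch of the two forms, with a hypothetical mount path:

```sh
# Plaintext token, e.g. copied from the webapp bootstrap output
TRIGGER_WORKER_TOKEN=<worker-token>

# Or a file:// reference; the supervisor reads the file and trims whitespace at startup
TRIGGER_WORKER_TOKEN=file:///home/node/shared/worker_token
```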
11 changes: 5 additions & 6 deletions apps/supervisor/src/workloadManager/docker.ts
@@ -28,7 +28,7 @@ export class DockerWorkloadManager implements WorkloadManager {
});
}

this.runnerNetworks = env.RUNNER_DOCKER_NETWORKS.split(",");
this.runnerNetworks = env.DOCKER_RUNNER_NETWORKS.split(",");

this.platformOverride = env.DOCKER_PLATFORM;
if (this.platformOverride) {
@@ -61,6 +61,7 @@

// Build environment variables
const envVars: string[] = [
`OTEL_EXPORTER_OTLP_ENDPOINT=${env.OTEL_EXPORTER_OTLP_ENDPOINT}`,
`TRIGGER_DEQUEUED_AT_MS=${opts.dequeuedAt.getTime()}`,
`TRIGGER_POD_SCHEDULED_AT_MS=${Date.now()}`,
`TRIGGER_ENV_ID=${opts.envId}`,
@@ -70,8 +71,9 @@
`TRIGGER_SUPERVISOR_API_PORT=${this.opts.workloadApiPort}`,
`TRIGGER_SUPERVISOR_API_DOMAIN=${this.opts.workloadApiDomain ?? getDockerHostDomain()}`,
`TRIGGER_WORKER_INSTANCE_NAME=${env.TRIGGER_WORKER_INSTANCE_NAME}`,
`OTEL_EXPORTER_OTLP_ENDPOINT=${env.OTEL_EXPORTER_OTLP_ENDPOINT}`,
`TRIGGER_RUNNER_ID=${runnerId}`,
`TRIGGER_MACHINE_CPU=${opts.machine.cpu}`,
`TRIGGER_MACHINE_MEMORY=${opts.machine.memory}`,
`PRETTY_LOGS=${env.RUNNER_PRETTY_LOGS}`,
];

@@ -110,10 +112,7 @@
// - If there are multiple networks to attach, this will ensure the runner won't also be connected to the bridge network
hostConfig.NetworkMode = firstNetwork;

if (env.ENFORCE_MACHINE_PRESETS) {
envVars.push(`TRIGGER_MACHINE_CPU=${opts.machine.cpu}`);
envVars.push(`TRIGGER_MACHINE_MEMORY=${opts.machine.memory}`);

if (env.DOCKER_ENFORCE_MACHINE_PRESETS) {
hostConfig.NanoCpus = opts.machine.cpu * 1e9;
hostConfig.Memory = opts.machine.memory * 1024 * 1024 * 1024;
}
4 changes: 2 additions & 2 deletions apps/supervisor/src/workloadManager/kubernetes.ts
@@ -236,13 +236,13 @@ export class KubernetesWorkloadManager implements WorkloadManager {

get #defaultResourceRequests(): ResourceQuantities {
return {
"ephemeral-storage": env.EPHEMERAL_STORAGE_SIZE_REQUEST,
"ephemeral-storage": env.KUBERNETES_EPHEMERAL_STORAGE_SIZE_REQUEST,
};
}

get #defaultResourceLimits(): ResourceQuantities {
return {
"ephemeral-storage": env.EPHEMERAL_STORAGE_SIZE_LIMIT,
"ephemeral-storage": env.KUBERNETES_EPHEMERAL_STORAGE_SIZE_LIMIT,
};
}

75 changes: 75 additions & 0 deletions apps/webapp/app/bootstrap.ts
@@ -0,0 +1,75 @@
import { mkdir, writeFile } from "fs/promises";
import { prisma } from "./db.server";
import { env } from "./env.server";
import { WorkerGroupService } from "./v3/services/worker/workerGroupService.server";
import { dirname } from "path";
import { tryCatch } from "@trigger.dev/core";

export async function bootstrap() {
if (env.TRIGGER_BOOTSTRAP_ENABLED !== "1") {
return;
}

if (env.TRIGGER_BOOTSTRAP_WORKER_GROUP_NAME) {
const [error] = await tryCatch(createWorkerGroup());
if (error) {
console.error("Failed to create worker group", { error });
}
}
}

async function createWorkerGroup() {
const workerGroupName = env.TRIGGER_BOOTSTRAP_WORKER_GROUP_NAME;
const tokenPath = env.TRIGGER_BOOTSTRAP_WORKER_TOKEN_PATH;

const existingWorkerGroup = await prisma.workerInstanceGroup.findFirst({
where: {
name: workerGroupName,
},
});

if (existingWorkerGroup) {
console.warn(`[bootstrap] Worker group ${workerGroupName} already exists`);
return;
}

const service = new WorkerGroupService();
const { token, workerGroup } = await service.createWorkerGroup({
name: workerGroupName,
});

console.log(`
==========================
Trigger.dev Bootstrap - Worker Token

WARNING: This will only be shown once. Save it now!

Worker group:
${workerGroup.name}

Token:
${token.plaintext}

If using docker compose, set:
TRIGGER_WORKER_TOKEN=${token.plaintext}

${
tokenPath
? `Or, if using a file:
TRIGGER_WORKER_TOKEN=file://${tokenPath}`
: ""
}

==========================
`);

if (tokenPath) {
const dir = dirname(tokenPath);
await mkdir(dir, { recursive: true });
await writeFile(tokenPath, token.plaintext, {
mode: 0o600,
});

console.log(`[bootstrap] Worker token saved to ${tokenPath}`);
}
}
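The bootstrap only runs when the new `TRIGGER_BOOTSTRAP_*` variables (added to `env.server.ts` below) are set. A minimal sketch of a docker-compose style setup where the webapp writes the token to a shared volume and the supervisor reads it back — the group name and path are illustrative:

```sh
# Webapp: create a worker group on boot and persist its token
TRIGGER_BOOTSTRAP_ENABLED=1
TRIGGER_BOOTSTRAP_WORKER_GROUP_NAME=bootstrap
TRIGGER_BOOTSTRAP_WORKER_TOKEN_PATH=/home/node/shared/worker_token

# Supervisor: pick the token up from the same volume (see workerToken.ts above)
TRIGGER_WORKER_TOKEN=file:///home/node/shared/worker_token
```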
5 changes: 5 additions & 0 deletions apps/webapp/app/entry.server.tsx
@@ -15,6 +15,7 @@ import {
OperatingSystemPlatform,
} from "./components/primitives/OperatingSystemProvider";
import { singleton } from "./utils/singleton";
import { bootstrap } from "./bootstrap";

const ABORT_DELAY = 30000;

@@ -177,6 +178,10 @@ Worker.init().catch((error) => {
logError(error);
});

bootstrap().catch((error) => {
logError(error);
});

function logError(error: unknown, request?: Request) {
console.error(error);

15 changes: 12 additions & 3 deletions apps/webapp/app/env.server.ts
@@ -214,8 +214,8 @@ const EnvironmentSchema = z.object({
PUBSUB_REDIS_TLS_DISABLED: z.string().default(process.env.REDIS_TLS_DISABLED ?? "false"),
PUBSUB_REDIS_CLUSTER_MODE_ENABLED: z.string().default("0"),

DEFAULT_ENV_EXECUTION_CONCURRENCY_LIMIT: z.coerce.number().int().default(10),
DEFAULT_ORG_EXECUTION_CONCURRENCY_LIMIT: z.coerce.number().int().default(10),
DEFAULT_ENV_EXECUTION_CONCURRENCY_LIMIT: z.coerce.number().int().default(100),
DEFAULT_ORG_EXECUTION_CONCURRENCY_LIMIT: z.coerce.number().int().default(100),
DEFAULT_DEV_ENV_EXECUTION_ATTEMPTS: z.coerce.number().int().positive().default(1),

TUNNEL_HOST: z.string().optional(),
@@ -260,7 +260,6 @@ const EnvironmentSchema = z.object({
INGEST_EVENT_RATE_LIMIT_MAX: z.coerce.number().int().optional(),

//v3
V3_ENABLED: z.string().default("false"),
PROVIDER_SECRET: z.string().default("provider-secret"),
COORDINATOR_SECRET: z.string().default("coordinator-secret"),
DEPOT_TOKEN: z.string().optional(),
@@ -278,6 +277,8 @@ const EnvironmentSchema = z.object({
OBJECT_STORE_BASE_URL: z.string().optional(),
OBJECT_STORE_ACCESS_KEY_ID: z.string().optional(),
OBJECT_STORE_SECRET_ACCESS_KEY: z.string().optional(),
OBJECT_STORE_REGION: z.string().optional(),
OBJECT_STORE_SERVICE: z.string().default("s3"),
EVENTS_BATCH_SIZE: z.coerce.number().int().default(100),
EVENTS_BATCH_INTERVAL: z.coerce.number().int().default(1000),
EVENTS_DEFAULT_LOG_RETENTION: z.coerce.number().int().default(7),
@@ -778,6 +779,14 @@ const EnvironmentSchema = z.object({
RUN_REPLICATION_KEEP_ALIVE_ENABLED: z.string().default("1"),
RUN_REPLICATION_KEEP_ALIVE_IDLE_SOCKET_TTL_MS: z.coerce.number().int().optional(),
RUN_REPLICATION_MAX_OPEN_CONNECTIONS: z.coerce.number().int().default(10),

// Bootstrap
TRIGGER_BOOTSTRAP_ENABLED: z.string().default("0"),
TRIGGER_BOOTSTRAP_WORKER_GROUP_NAME: z.string().optional(),
TRIGGER_BOOTSTRAP_WORKER_TOKEN_PATH: z.string().optional(),

// Machine presets
MACHINE_PRESETS_OVERRIDE_PATH: z.string().optional(),
});

export type Environment = z.infer<typeof EnvironmentSchema>;
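With `OBJECT_STORE_REGION` and `OBJECT_STORE_SERVICE` now configurable, the large-payload object store should work against S3-compatible providers that are stricter about signing parameters. A minimal sketch of such a configuration — endpoint, credentials, and region are placeholders, and both new variables can usually be left at their defaults:

```sh
# Object store for large payloads/outputs, pointed at an S3-compatible provider
OBJECT_STORE_BASE_URL=https://my-bucket.s3-compatible.example.com
OBJECT_STORE_ACCESS_KEY_ID=<access-key-id>
OBJECT_STORE_SECRET_ACCESS_KEY=<secret-access-key>
OBJECT_STORE_REGION=us-east-1   # optional; omit if your provider ignores regions
OBJECT_STORE_SERVICE=s3         # optional; defaults to "s3"
```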