feat(supervisor): project-based scheduling affinity for image cache locality #2995

Merged
myftija merged 2 commits into main from project-scheduling-affinity
Feb 4, 2026
@myftija
Collaborator

@myftija myftija commented Feb 4, 2026

Adds optional pod affinity so pods from the same project prefer scheduling on the same node. This can help improve image cache hit rates; subsequent pods benefit from already-pulled image layers, reducing startup time.

Complements the built-in ImageLocality scheduler plugin by helping during burst scheduling scenarios. Pod affinity sees scheduled pods immediately, while ImageLocality only sees images after they're fully pulled.

Configuration:

  • KUBERNETES_PROJECT_AFFINITY_ENABLED - Enable/disable (default: false)
  • KUBERNETES_PROJECT_AFFINITY_WEIGHT - Scheduler weight 1-100 (default: 50)
  • KUBERNETES_PROJECT_AFFINITY_TOPOLOGY_KEY - Topology key (default: kubernetes.io/hostname)

Uses soft (preferred) affinity, so pods still schedule even if the preferred node is full.
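
For readers unfamiliar with the shape of a soft pod affinity rule, here is a minimal sketch of how the three settings above could map onto a Kubernetes `podAffinity` block. It is illustrative only: the label key `PROJECT_LABEL_KEY` and the function name `buildProjectPodAffinity` are hypothetical and are not taken from this PR; only the `preferredDuringSchedulingIgnoredDuringExecution` structure comes from the Kubernetes API.

```typescript
// Hypothetical label key carrying the project ID on each pod (an assumption, not from the PR).
const PROJECT_LABEL_KEY = "example.com/project-id";

type ProjectAffinityConfig = {
  enabled: boolean; // KUBERNETES_PROJECT_AFFINITY_ENABLED
  weight: number; // KUBERNETES_PROJECT_AFFINITY_WEIGHT (1-100)
  topologyKey: string; // KUBERNETES_PROJECT_AFFINITY_TOPOLOGY_KEY
};

type WeightedPodAffinityTerm = {
  weight: number;
  podAffinityTerm: {
    labelSelector: {
      matchExpressions: { key: string; operator: "In"; values: string[] }[];
    };
    topologyKey: string;
  };
};

type PodAffinity = {
  preferredDuringSchedulingIgnoredDuringExecution: WeightedPodAffinityTerm[];
};

// Returns a soft (preferred) pod affinity, or undefined when the feature is off
// or no project ID is known. A preferred rule only influences scoring, so it
// never prevents a pod from being scheduled.
function buildProjectPodAffinity(
  config: ProjectAffinityConfig,
  projectId: string | undefined
): PodAffinity | undefined {
  if (!config.enabled || !projectId) {
    return undefined;
  }

  return {
    preferredDuringSchedulingIgnoredDuringExecution: [
      {
        weight: config.weight,
        podAffinityTerm: {
          labelSelector: {
            matchExpressions: [
              { key: PROJECT_LABEL_KEY, operator: "In", values: [projectId] },
            ],
          },
          topologyKey: config.topologyKey,
        },
      },
    ],
  };
}
```

With the default topology key `kubernetes.io/hostname`, the preference is per node; a broader key such as a zone label would express a looser preference.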


@changeset-bot

changeset-bot bot commented Feb 4, 2026

⚠️ No Changeset found

Latest commit: 9c5be34

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


@coderabbitai
Contributor

coderabbitai bot commented Feb 4, 2026

Caution

Review failed

The pull request is closed.

Walkthrough

The pull request adds three new environment variables to the supervisor environment schema to configure Kubernetes project-based pod affinity: an enabled flag, a weight (1–100), and a topology key. It also refactors affinity construction in the Kubernetes workload manager, replacing a single node-affinity getter with a composable approach that builds the node affinity rules and the optional project-based pod affinity separately, then combines them in a new getAffinity(preset, projectId) method used during pod spec creation.
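
The composition described in this walkthrough could look roughly like the sketch below. This is not the code from `apps/supervisor/src/workloadManager/kubernetes.ts`; the names `combineAffinity` and `applyAffinity` and the simplified types are assumptions made for illustration. It only shows the general idea: each part is optional, and the `affinity` field is left off the pod spec entirely when neither part is present.

```typescript
// Simplified, hypothetical shapes; the real types in the PR may differ.
type NodeAffinity = Record<string, unknown>;
type PodAffinity = Record<string, unknown>;
type Affinity = { nodeAffinity?: NodeAffinity; podAffinity?: PodAffinity };
type PodSpec = { containers: { name: string }[]; affinity?: Affinity };

// Merge the two optional parts. Returns undefined when there is nothing to set,
// so callers can omit the field instead of sending an empty object.
function combineAffinity(
  nodeAffinity: NodeAffinity | undefined,
  podAffinity: PodAffinity | undefined
): Affinity | undefined {
  if (!nodeAffinity && !podAffinity) {
    return undefined;
  }

  const affinity: Affinity = {};
  if (nodeAffinity) {
    affinity.nodeAffinity = nodeAffinity;
  }
  if (podAffinity) {
    affinity.podAffinity = podAffinity;
  }
  return affinity;
}

// Attach the affinity to a pod spec only when one was produced.
function applyAffinity(spec: PodSpec, affinity: Affinity | undefined): PodSpec {
  return affinity ? { ...spec, affinity } : spec;
}
```

Returning `undefined` rather than an empty object keeps the generated pod spec identical to the pre-feature output when both parts are absent.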

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ⚠️ Warning | The description provides clear context about the feature, its purpose, and configuration options, but does not follow the repository's PR template structure with required sections like Testing and Changelog. | Fill out the template sections: add the issue number (Closes #...), document testing steps in the Testing section, and add a changelog entry. Update the checklist items as applicable. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately describes the main change: introducing project-based scheduling affinity for improving image cache locality in the Kubernetes supervisor. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

📜 Recent review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 306cbc8 and 9c5be34.

📒 Files selected for processing (1)
  • apps/supervisor/src/env.ts


@devin-ai-integration devin-ai-integration bot left a comment

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 4 additional flags.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@apps/supervisor/src/env.ts`:
- Around line 115-118: KUBERNETES_PROJECT_AFFINITY_TOPOLOGY_KEY currently allows
empty or whitespace-only values; update its Zod schema to enforce a non-empty,
trimmed string (e.g., use z.string().trim().min(1) or .nonempty() with .trim())
and keep the same default "kubernetes.io/hostname" so invalid inputs fail fast
at startup rather than producing invalid Kubernetes pod specs.
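
To make the intended behavior concrete, here is a dependency-free sketch of the validation rule this comment asks for. It is a plain TypeScript stand-in, not the Zod schema from `env.ts`; the function name `parseTopologyKey` is hypothetical. It assumes the same semantics the suggestion describes: an unset value falls back to the default, surrounding whitespace is trimmed, and an empty result is rejected.

```typescript
const DEFAULT_TOPOLOGY_KEY = "kubernetes.io/hostname";

// Hypothetical helper mirroring a trimmed, non-empty string with a default.
// Throwing here corresponds to failing fast at startup instead of emitting
// a pod spec with an empty topologyKey.
function parseTopologyKey(raw: string | undefined): string {
  if (raw === undefined) {
    return DEFAULT_TOPOLOGY_KEY;
  }

  const trimmed = raw.trim();
  if (trimmed.length === 0) {
    throw new Error(
      "KUBERNETES_PROJECT_AFFINITY_TOPOLOGY_KEY must be a non-empty string"
    );
  }

  return trimmed;
}
```

Note that only an unset variable falls back to the default; a variable explicitly set to an empty or whitespace-only string is treated as a configuration error.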
📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b72cacc and 306cbc8.

📒 Files selected for processing (2)
  • apps/supervisor/src/env.ts
  • apps/supervisor/src/workloadManager/kubernetes.ts
🧰 Additional context used
📓 Path-based instructions (4)
**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.{ts,tsx}: Use types over interfaces for TypeScript
Avoid using enums; prefer string unions or const objects instead

**/*.{ts,tsx}: Always import tasks from @trigger.dev/sdk, never use @trigger.dev/sdk/v3 or deprecated client.defineJob pattern
Every Trigger.dev task must be exported and have a unique id property with no timeouts in the run function

Files:

  • apps/supervisor/src/workloadManager/kubernetes.ts
  • apps/supervisor/src/env.ts
**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use function declarations instead of default exports

Import from @trigger.dev/core using subpaths only, never import from root

Files:

  • apps/supervisor/src/workloadManager/kubernetes.ts
  • apps/supervisor/src/env.ts
**/*.ts

📄 CodeRabbit inference engine (.cursor/rules/otel-metrics.mdc)

**/*.ts: When creating or editing OTEL metrics (counters, histograms, gauges), ensure metric attributes have low cardinality by using only enums, booleans, bounded error codes, or bounded shard IDs
Do not use high-cardinality attributes in OTEL metrics such as UUIDs/IDs (envId, userId, runId, projectId, organizationId), unbounded integers (itemCount, batchSize, retryCount), timestamps (createdAt, startTime), or free-form strings (errorMessage, taskName, queueName)
When exporting OTEL metrics via OTLP to Prometheus, be aware that the exporter automatically adds unit suffixes to metric names (e.g., 'my_duration_ms' becomes 'my_duration_ms_milliseconds', 'my_counter' becomes 'my_counter_total'). Account for these transformations when writing Grafana dashboards or Prometheus queries

Files:

  • apps/supervisor/src/workloadManager/kubernetes.ts
  • apps/supervisor/src/env.ts
**/*.{js,ts,jsx,tsx,json,md,yaml,yml}

📄 CodeRabbit inference engine (AGENTS.md)

Format code using Prettier before committing

Files:

  • apps/supervisor/src/workloadManager/kubernetes.ts
  • apps/supervisor/src/env.ts
🧬 Code graph analysis (1)
apps/supervisor/src/env.ts (2)
apps/webapp/app/utils/boolEnv.ts (1)
  • BoolEnv (12-14)
apps/supervisor/src/envUtil.ts (1)
  • BoolEnv (15-17)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (26)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (3, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (2, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (1, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (3, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (7, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (5, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (2, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (4, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (7, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (8, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (1, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (6, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (8, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (6, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (4, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (5, 8)
  • GitHub Check: units / packages / 🧪 Unit Tests: Packages (1, 1)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
  • GitHub Check: sdk-compat / Bun Runtime
  • GitHub Check: sdk-compat / Cloudflare Workers
  • GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - npm)
  • GitHub Check: typecheck / typecheck
  • GitHub Check: sdk-compat / Node.js 22.12 (ubuntu-latest)
  • GitHub Check: sdk-compat / Deno Runtime
  • GitHub Check: sdk-compat / Node.js 20.20 (ubuntu-latest)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
🔇 Additional comments (2)
apps/supervisor/src/workloadManager/kubernetes.ts (2)

122-124: LGTM: affinity is cleanly wired into the pod spec.

Nice and minimal integration; optional affinity stays absent when undefined.


393-474: LGTM: affinity composition is well-factored.

The separation of node vs. project pod affinity keeps the logic readable and makes the optionality explicit.

@myftija myftija merged commit 8e00344 into main Feb 4, 2026
32 of 33 checks passed
@myftija myftija deleted the project-scheduling-affinity branch February 4, 2026 13:39