Add multi-queue support for Buildkite CI #1306

Open
wants to merge 1 commit into
base: master
from

Conversation

ChrisRackauckas
Member

Summary

This PR adds the ability to run benchmarks on different Buildkite queues, enabling GPU-intensive benchmarks to use GPU-enabled compute resources while maintaining CPU-only benchmarks on the standard queue.

Key Features

  • Dynamic Queue Assignment: Benchmarks can specify their preferred queue through:

    • Project.toml metadata (highest priority)
    • Central configuration file mapping
    • Default fallback to juliaecosystem
  • GPU Support: GPU benchmarks automatically get:

    • queue: "gpu" assignment
    • CUDA environment variables (JULIA_CUDA_USE_BINARYBUILDER=false, JULIA_GPU_ALLOW_DEFAULT=true)
    • Extended timeout for longer GPU jobs
  • Backward Compatibility: Existing benchmarks continue using the CPU queue without changes

Implementation

Files Added

  • .buildkite/queue_config.yml: Central queue configuration mapping benchmarks to queues
  • .buildkite/generate_pipeline.jl: Dynamic pipeline generator that creates appropriate Buildkite steps
  • .buildkite/README.md: Complete documentation with usage examples

Files Modified

  • .buildkite/test_sciml.yml: Updated to use dynamic pipeline generation
  • benchmarks/PINNOptimizers/Project.toml: Example GPU queue specification

Usage Examples

GPU benchmark configuration in Project.toml:

[buildkite]
queue = "gpu"

Generated pipeline step for GPU benchmark:

- label: ":julia: PINNOptimizers on gpu"
  agents:
    queue: "gpu"
  env:
    JULIA_CUDA_USE_BINARYBUILDER: "false"
    JULIA_GPU_ALLOW_DEFAULT: "true"

CPU benchmark continues to use:

- label: ":julia: NonStiffODE on juliaecosystem"
  agents:
    queue: "juliaecosystem"
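
For illustration, steps shaped like the two examples above could be produced along these lines. This is a hedged sketch, not the generator in this PR: `build_step`, `build_pipeline`, the `command` argument, and the timeout parameter are hypothetical names, and the output is JSON (which Buildkite accepts alongside YAML) only to keep the example dependency-free.

```python
import json

# Environment variables listed in this PR for GPU benchmarks.
GPU_ENV = {
    "JULIA_CUDA_USE_BINARYBUILDER": "false",
    "JULIA_GPU_ALLOW_DEFAULT": "true",
}


def build_step(name, queue, command, gpu_timeout_minutes=None):
    """Build one Buildkite command step as a plain dict.

    `command` and `gpu_timeout_minutes` are placeholders; the real command
    and timeout values are not shown in this PR description.
    """
    step = {
        "label": f":julia: {name} on {queue}",
        "command": command,
        "agents": {"queue": queue},
    }
    if queue == "gpu":
        # Copy so callers cannot mutate the shared constant.
        step["env"] = dict(GPU_ENV)
        if gpu_timeout_minutes is not None:
            step["timeout_in_minutes"] = gpu_timeout_minutes
    return step


def build_pipeline(steps):
    """Serialize a list of steps as a JSON pipeline document."""
    return json.dumps({"steps": steps}, indent=2)


if __name__ == "__main__":
    steps = [
        build_step("PINNOptimizers", "gpu", "echo 'run benchmark'"),
        build_step("NonStiffODE", "juliaecosystem", "echo 'run benchmark'"),
    ]
    print(build_pipeline(steps))
```

Only the GPU step receives the `env` block, so CPU steps stay identical in shape to what the pipeline produced before.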

Testing

The system has been tested with:

  • GPU benchmark (PINNOptimizers) → correctly assigned to gpu queue
  • CPU benchmark (NonStiffODE) → correctly assigned to juliaecosystem queue
  • Dynamic pipeline generation works with both individual files and folder targets
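
One way the "individual files and folder targets" case could be handled is to reduce either kind of target to the benchmark folder name before looking up its queue. The sketch below assumes the benchmarks/<Name>/ layout implied by benchmarks/PINNOptimizers/Project.toml; the function name `benchmark_name` and the example file name are hypothetical and not taken from this PR.

```python
from pathlib import PurePosixPath


def benchmark_name(target, root="benchmarks"):
    """Map a file or folder target to its benchmark folder name.

    Assumes targets look like `benchmarks/<Name>` or `benchmarks/<Name>/<file>`.
    """
    parts = PurePosixPath(target).parts
    if len(parts) < 2 or parts[0] != root:
        raise ValueError(f"target is not under {root}/: {target!r}")
    return parts[1]


if __name__ == "__main__":
    print(benchmark_name("benchmarks/PINNOptimizers"))
    # "example.jmd" is a made-up file name used only for illustration.
    print(benchmark_name("benchmarks/PINNOptimizers/example.jmd"))
```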

Infrastructure Requirements

To fully utilize this system, the Buildkite infrastructure needs:

  1. GPU-enabled Buildkite agents tagged with queue: "gpu"
  2. Standard CPU agents that continue to use queue: "juliaecosystem"

Test plan

  • Test pipeline generation for GPU benchmarks
  • Test pipeline generation for CPU benchmarks
  • Verify backward compatibility
  • Test queue assignment priorities (Project.toml > config file > default)
  • Deploy and verify on actual Buildkite infrastructure
  • Monitor GPU queue utilization and performance

🤖 Generated with Claude Code

This adds the ability to run benchmarks on different Buildkite queues,
enabling GPU-intensive benchmarks to use GPU-enabled compute resources
while maintaining CPU-only benchmarks on the standard queue.

Key features:
- Dynamic queue assignment based on benchmark requirements
- Support for GPU-specific environment variables and timeouts
- Backward compatibility with existing CPU-only workflow
- Flexible configuration through Project.toml or central config

Files added:
- .buildkite/queue_config.yml: Central queue configuration
- .buildkite/generate_pipeline.jl: Dynamic pipeline generator
- .buildkite/README.md: Complete documentation

Files modified:
- .buildkite/test_sciml.yml: Updated to use dynamic pipeline
- benchmarks/PINNOptimizers/Project.toml: Example GPU queue specification

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
@ChrisRackauckas
Member Author

This looks like it's probably just wrong 😅

@ChrisRackauckas
Member Author

We now have an exclusive gpu queue for this: JuliaGPU/buildkite@7cbf182
