Vector Institute Compute Playbook

A starter repository for researchers at the Vector Institute who are getting set up with high-performance computing on the Bon Echo and Killarney clusters. The playbook covers what you need to run machine learning experiments at scale, from basic cluster usage to distributed training workflows.

🚀 What's Inside

This repository provides two main components:

📚 Getting Started Documentation

Cluster Introduction: Complete guide to connecting to and using Vector compute resources
Slurm Examples: Real-world examples showing how to submit jobs, run distributed training, and use cluster services
Migration Guide: Instructions for moving from the legacy Bon Echo cluster to the new Killarney cluster

🧪 ML Training Templates

Ready-to-run examples for different ML domains (LLM, VLM, MLP, RL)
Hydra + Submitit integration for configurable experiments and hyperparameter sweeps
Cluster-optimized configs for different hardware setups (A40, A100, H100, L40S)
Checkpointing & requeue support for long-running jobs
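
The templates' own checkpointing code is not reproduced in this README. As a rough illustration of the general checkpoint-and-resume pattern that requeue support relies on, here is a minimal sketch using only the Python standard library. The function names (`save_checkpoint`, `load_checkpoint`, `train`), the JSON state format, and the `stop_after` argument that simulates a preemption are all hypothetical and are not taken from the repository.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Optional


def save_checkpoint(path: Path, state: dict) -> None:
    """Write state atomically so an interrupted job never leaves a half-written file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_name, path)  # rename within one directory replaces the old file in one step


def load_checkpoint(path: Path) -> dict:
    """Return the saved state, or a fresh state if no checkpoint exists yet."""
    if path.exists():
        return json.loads(path.read_text())
    return {"step": 0, "loss": None}


def train(path: Path, total_steps: int, stop_after: Optional[int] = None) -> dict:
    """Toy loop that resumes from the checkpoint and saves after every step.

    stop_after (hypothetical) simulates a preemption by returning early.
    """
    state = load_checkpoint(path)
    steps_this_run = 0
    while state["step"] < total_steps:
        state["step"] += 1
        state["loss"] = 1.0 / state["step"]  # placeholder for a real training step
        save_checkpoint(path, state)
        steps_this_run += 1
        if stop_after is not None and steps_this_run >= stop_after:
            break
    return state
```

Calling `train` a second time with the same checkpoint path continues from the last saved step instead of starting over, which is the behaviour a requeued job needs.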

🏃‍♂️ Quick Start

1. Prerequisites

Access to Vector Institute compute clusters (Bon Echo or Killarney)
uv package manager installed

2. Clone and Setup

# Clone the repository
git clone https://github.com/VectorInstitute/vec-playbook.git
cd vec-playbook

# Install dependencies
uv sync

3. Configure Your Account

Edit templates/configs/user.yaml with your Slurm account details:

user:
  slurm:
    account: YOUR_ACCOUNT
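
Leaving the placeholder in place is an easy mistake to make before a first submission. The following is a minimal sanity-check sketch, assuming the file has the shape shown above. It uses a regular expression rather than a YAML parser to stay dependency-free, so it is not a general YAML reader, and the helper names and the example account `my-lab` are hypothetical.

```python
import re
from typing import Optional

PLACEHOLDER = "YOUR_ACCOUNT"  # placeholder value shown in the snippet above


def find_slurm_account(yaml_text: str) -> Optional[str]:
    """Return the value of the first `account:` key, or None if it is absent or empty."""
    match = re.search(r"^[ \t]*account:[ \t]*(\S+)", yaml_text, flags=re.MULTILINE)
    if match is None:
        return None
    return match.group(1).strip("\"'")


def account_is_configured(yaml_text: str) -> bool:
    """True only if an account is set and it is not the template placeholder."""
    account = find_slurm_account(yaml_text)
    return account is not None and account != PLACEHOLDER


template = "user:\n  slurm:\n    account: YOUR_ACCOUNT\n"
print(account_is_configured(template))                                   # False
print(account_is_configured(template.replace(PLACEHOLDER, "my-lab")))    # True
```

In practice you would pass in the contents of templates/configs/user.yaml, for example via `Path("templates/configs/user.yaml").read_text()`.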

4. Run Your First Job

# Simple MLP training on Killarney L40S
uv run python -m mlp.single.launch compute=killarney/l40s_1x requeue=off --multirun
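
The command above is made of a module path followed by `key=value` overrides and the `--multirun` flag. If you launch several variants from a script, a small helper can assemble the same command line. This is a sketch only: `build_launch_command` is a hypothetical helper, not part of the repository, and the only module and override values used are the ones from the command above.

```python
import shlex
from typing import List, Mapping


def build_launch_command(
    module: str, overrides: Mapping[str, str], multirun: bool = True
) -> List[str]:
    """Assemble a `uv run python -m <module> key=value ...` argument list."""
    cmd = ["uv", "run", "python", "-m", module]
    cmd += [f"{key}={value}" for key, value in overrides.items()]
    if multirun:
        cmd.append("--multirun")
    return cmd


cmd = build_launch_command(
    "mlp.single.launch",
    {"compute": "killarney/l40s_1x", "requeue": "off"},
)
print(shlex.join(cmd))
# uv run python -m mlp.single.launch compute=killarney/l40s_1x requeue=off --multirun
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)` from the repository root, which avoids shell quoting issues.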

📖 Navigation Guide

For New Users

Start here: Getting Started Documentation - Learn the basics of Vector compute
Try examples: Slurm Examples - Run simple jobs to get familiar
Use templates: Templates - Run ML training experiments

For Experienced Users

Templates: templates/ - Training workflows
Configs: templates/configs/ - Cluster and experiment configurations
Advanced: templates/README.md - Detailed usage instructions

🖥️ Supported Hardware

Bon Echo Cluster

A40 GPUs: 1x, 4x configurations
A100 GPUs: 1x, 4x configurations

Killarney Cluster

H100 GPUs: 1x, 8x configurations
L40S GPUs: 1x, 2x configurations
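
The lists above can be encoded as a small lookup table, which is handy for checking a requested cluster, GPU type, and GPU count before submitting. This is an illustrative sketch: the dictionary keys (`bon_echo`, `killarney`, lowercase GPU names) and the function `is_supported` are assumptions made here, not identifiers from the repository's configs.

```python
# Cluster -> GPU type -> GPU counts, transcribed from the lists above.
# The key spellings are assumptions chosen for this sketch.
SUPPORTED = {
    "bon_echo": {"a40": (1, 4), "a100": (1, 4)},
    "killarney": {"h100": (1, 8), "l40s": (1, 2)},
}


def is_supported(cluster: str, gpu: str, count: int) -> bool:
    """Return True if the cluster lists this GPU type with this GPU count."""
    return count in SUPPORTED.get(cluster.lower(), {}).get(gpu.lower(), ())


print(is_supported("killarney", "L40S", 1))  # True
print(is_supported("killarney", "L40S", 4))  # False
print(is_supported("bon_echo", "H100", 1))   # False
```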

📚 Documentation Structure

vec-playbook/
├── getting-started/           # 📖 Learning resources
│   ├── introduction-to-vector-compute/  # Cluster basics
│   └── slurm-examples/        # 🧪 Hands-on examples
├── templates/                # 🧬 ML training templates
│   ├── src/                  # Template source code
│   └── configs/              # Cluster & experiment configs
└── README.md                 # This file

🤝 Contributing

We welcome contributions, including:

New training templates
Additional cluster configurations
Documentation improvements
Bug fixes
