Kubernetes-native auto-scaling and load balancing for OpenDroneMap.
ScaleODM is a Kubernetes-native orchestration layer for OpenDroneMap, designed to automatically scale processing workloads using native Kubernetes primitives such as Jobs, Deployments, and Horizontal Pod Autoscalers.
It aims to provide the same API surface as NodeODM, while replacing both NodeODM and ClusterODM with a single, cloud-native control plane.
Note

ScaleODM has no authentication mechanism and should not be exposed publicly.
Instead, your frontend should connect to a backend, and the backend then uses
PyODM or similar to reach the ScaleODM instance on the internal network.
To federate multiple ScaleODM instances, establish a secure network mesh with a tool such as Tailscale.
- ClusterODM --> NodeODM --> ODM are all fantastic tools, well tested and with a big community behind them.
- However, running these tools inside a Kubernetes cluster poses a few challenges:
  - Scaling relies on provisioning or deprovisioning VMs, not container replicas.
  - Kubernetes-native scaling (Deployments, Jobs, KEDA) doesn't map neatly onto this model.
  - Data ingestion depends on `zip_url` or uploading via HTTP.
  - S3 integration covers outputs only, not input data. Ideally we need a data 'pull' approach instead of data 'push'.
  - Built-in file-based queues are not distributed or Kubernetes-aware.
Our initial goal was to deploy ClusterODM and NodeODM as-is inside Kubernetes, scaling NodeODM instances dynamically via KEDA.
ScaleODM was introduced as a lightweight queueing API, backed by PostgreSQL (`SKIP LOCKED`), acting as a mediator for job scheduling and scaling triggers.
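The `SKIP LOCKED` pattern lets many workers pull from a shared PostgreSQL queue without blocking each other: each worker atomically claims the oldest queued row, skipping rows already locked by a concurrent worker. A minimal sketch of such a claim query, assuming a hypothetical `jobs` table — the actual ScaleODM schema may differ:

```python
# Hypothetical claim query in the style of a PostgreSQL-backed job queue.
# Table and column names (jobs, id, status, created_at) are illustrative,
# not ScaleODM's actual schema.
CLAIM_JOB_SQL = """
UPDATE jobs
SET status = 'RUNNING'
WHERE id = (
    SELECT id
    FROM jobs
    WHERE status = 'QUEUED'
    ORDER BY created_at
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING id;
"""

def claim_job(cursor):
    """Atomically claim the oldest QUEUED job and mark it RUNNING.

    Returns the claimed job id, or None if the queue is empty.
    Concurrent workers skip each other's locked rows instead of blocking,
    so no two workers can claim the same job.
    """
    cursor.execute(CLAIM_JOB_SQL)
    row = cursor.fetchone()
    return row[0] if row else None
```

The statuses used here (`QUEUED`, `RUNNING`) match the NodeODM-compatible statuses listed in the roadmap below.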
However, two main challenges emerged:
- NodeODM's internal queueing is file-based and not easily abstracted for distributed scaling.
- Data ingestion still required either HTTP uploads or `zip_url` packaging, adding unnecessary I/O overhead.
NodeODM wasn't really designed for ephemeral or autoscaled container environments, and that's fine.
Rethinking the architecture: instead of orchestrating NodeODM instances, it makes more sense to orchestrate ODM workloads directly, as Kubernetes Jobs or Argo Workflows.
Key concepts:
- NodeODM-compatible API: ScaleODM exposes the same REST endpoints as NodeODM, ensuring compatibility with existing tools (e.g. PyODM).
- Kubernetes Jobs: Each processing task is executed in an ephemeral container that can be distributed by the control plane as needed.
- S3-native workflow: Each job downloads inputs, performs processing, uploads outputs, and exits cleanly - no persistent volumes required. (i.e. jobs include the S3 params / credentials).
- Federation: ScaleODM instances can be federated across clusters, enabling global load balancing and community resource sharing.
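Because the API mirrors NodeODM, existing clients only need to build the familiar NodeODM endpoint paths. A minimal sketch, assuming NodeODM-style routes (`/task/<uuid>/info`, `/task/<uuid>/download/<asset>`) — verify against the actual ScaleODM routes before relying on them:

```python
from urllib.parse import urljoin

class ScaleODMClient:
    """Sketch of a NodeODM-style client pointed at a ScaleODM instance.

    Only builds endpoint URLs; a real client (e.g. PyODM) would issue the
    HTTP requests. Paths follow NodeODM's public REST API and are assumed
    to match ScaleODM's compatible surface.
    """

    def __init__(self, base_url: str):
        self.base_url = base_url.rstrip("/") + "/"

    def url(self, path: str) -> str:
        return urljoin(self.base_url, path.lstrip("/"))

    def task_info_url(self, uuid: str) -> str:
        return self.url(f"task/{uuid}/info")

    def download_url(self, uuid: str, asset: str = "all.zip") -> str:
        return self.url(f"task/{uuid}/download/{asset}")

# Usage: point at the internal-network instance (never a public one).
client = ScaleODMClient("http://scaleodm.internal:3000")
print(client.task_info_url("abc-123"))
# http://scaleodm.internal:3000/task/abc-123/info
```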
The decision to take this approach was not taken lightly, as we are strong supporters of contributing to existing open-source projects.
Long term, hopefully the ODM community can steward this project as an alternative processing API (with different requirements).
For more details, see the decisions section in this repo.
| Feature | Release |
|---|---|
| NodeODM-compatible API (submit, status, download) | v1 |
| Processing pipeline using Argo workflows + ODM containers | v1 |
| Using the same job statuses as NodeODM (QUEUED, RUNNING, FAILED, COMPLETED, CANCELED) | v1 |
| Env var config for API / pipeline | v1 |
| Pre-processing to determine the required resource usage for the workflow (CPU / RAM allocated) | v1 |
| Accept both zipped and unzipped imagery via S3 dir | v1 |
| Progress monitoring via API by hooking into the ODM container logs | v2 |
| Split-merge workflow | v2 |
| Accept GCPs as part of job submission | v2 |
| Federation of ScaleODM instances and task distribution | v3 |
| Webhook triggering - send a notification to an external system when complete | v3 |
| Post-processing of the final artifacts - capability present in NodeODM | v4 |
| Consider a load balancing service across all ScaleODM instances in DB | v4 |
| Adding extra missing things from NodeODM implementation, if required* | v4 |
*missing NodeODM functionality
- Exposing all of the config options possible in ODM.
- Multi-step project creation endpoints, with direct file upload.
Details to come once API is stabilised.
ScaleODM supports two modes for S3 access:
- Set `SCALEODM_S3_ACCESS_KEY` and `SCALEODM_S3_SECRET_KEY` environment variables.
- These credentials are passed directly to all workflow jobs.
- Note: This is less secure, as long-lived credentials are stored in the cluster.
For better security, use AWS STS to generate temporary credentials per job:
- Set environment variables:

  ```shell
  SCALEODM_S3_ACCESS_KEY=<your-iam-user-access-key>
  SCALEODM_S3_SECRET_KEY=<your-iam-user-secret-key>
  SCALEODM_S3_STS_ROLE_ARN=arn:aws:iam::ACCOUNT_ID:role/scaleodm-workflow-role
  SCALEODM_S3_STS_ENDPOINT= # Optional: defaults to https://sts.{region}.amazonaws.com
  ```
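A sketch of how a service might read this configuration, applying the documented STS endpoint default when the variable is unset. The `SCALEODM_S3_REGION` variable and the `us-east-1` fallback are assumptions for illustration, not documented ScaleODM settings:

```python
import os

def load_s3_config(env=None):
    """Read ScaleODM S3/STS settings from environment variables.

    SCALEODM_S3_STS_ENDPOINT defaults to the regional STS endpoint, as
    documented above. The region source is an assumption here.
    """
    env = os.environ if env is None else env
    region = env.get("SCALEODM_S3_REGION", "us-east-1")  # assumed variable
    return {
        "access_key": env["SCALEODM_S3_ACCESS_KEY"],
        "secret_key": env["SCALEODM_S3_SECRET_KEY"],
        "sts_role_arn": env.get("SCALEODM_S3_STS_ROLE_ARN"),
        # An empty value also falls back to the regional default.
        "sts_endpoint": env.get("SCALEODM_S3_STS_ENDPOINT")
            or f"https://sts.{region}.amazonaws.com",
    }
```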
- IAM user permissions (for the user specified in `SCALEODM_S3_ACCESS_KEY`):

  The IAM user must have permission to assume the STS role:

  ```json
  {
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Action": "sts:AssumeRole",
        "Resource": "arn:aws:iam::ACCOUNT_ID:role/scaleodm-workflow-role"
      }
    ]
  }
  ```

  Important: The `Resource` must match the exact role ARN specified in `SCALEODM_S3_STS_ROLE_ARN`. Using `"Resource": "*"` is less secure, as it allows assuming any role.

- IAM role trust policy (for the role specified in `SCALEODM_S3_STS_ROLE_ARN`):

  The role must trust the IAM user:

  ```json
  {
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Principal": {
          "AWS": "arn:aws:iam::ACCOUNT_ID:user/your-scaleodm-user"
        },
        "Action": "sts:AssumeRole"
      }
    ]
  }
  ```

- IAM role permissions (for the role):

  The role must have permissions to read/write to your S3 buckets:

  ```json
  {
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Action": [
          "s3:GetObject",
          "s3:PutObject",
          "s3:DeleteObject",
          "s3:ListBucket"
        ],
        "Resource": [
          "arn:aws:s3:::your-bucket-name/*",
          "arn:aws:s3:::your-bucket-name"
        ]
      }
    ]
  }
  ```
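The `Resource` mismatch noted above is easy to catch before deployment. A minimal illustrative check that a user policy grants `sts:AssumeRole` on exactly the configured role ARN; it handles only the simple policy shape shown here:

```python
import json

def allows_assume_role(policy_json: str, role_arn: str) -> bool:
    """Return True if the policy grants sts:AssumeRole on the given role ARN.

    Illustrative pre-deployment sanity check; does not evaluate wildcards,
    conditions, or Deny statements the way IAM does.
    """
    policy = json.loads(policy_json)
    for stmt in policy.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        resources = stmt.get("Resource", [])
        if isinstance(resources, str):
            resources = [resources]
        if "sts:AssumeRole" in actions and role_arn in resources:
            return True
    return False
```

Running this against the user policy with the value of `SCALEODM_S3_STS_ROLE_ARN` catches the most common misconfiguration (a user ARN or wrong role ARN in `Resource`) before the first job fails.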
How it works:
- When a job is submitted, ScaleODM uses the IAM user credentials to call `sts:AssumeRole` on the specified role.
- Temporary credentials (valid for 24 hours) are generated and injected into the workflow.
- Each workflow job uses these temporary credentials to access S3.
- Credentials expire automatically, reducing security risk.
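The steps above can be sketched as the parameters an AssumeRole call would receive (e.g. via boto3, `boto3.client("sts").assume_role(**params)`). The per-job session-name scheme is a hypothetical illustration, and the credential lifetime actually granted is capped by the role's `MaxSessionDuration`:

```python
def assume_role_params(role_arn: str, job_id: str,
                       duration_seconds: int = 3600) -> dict:
    """Build the parameters for an STS AssumeRole call for one job.

    The session name tags the temporary credentials with the job that
    requested them (naming scheme assumed, not ScaleODM's actual one),
    which makes CloudTrail entries attributable to individual jobs.
    """
    return {
        "RoleArn": role_arn,
        "RoleSessionName": f"scaleodm-job-{job_id}",
        "DurationSeconds": duration_seconds,
    }
```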
Troubleshooting:
If you see errors like:
```
User: arn:aws:iam::ACCOUNT_ID:user/your-user is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::ACCOUNT_ID:role/your-role
```
Check:
- The IAM user has `sts:AssumeRole` permission for the role ARN.
- The role's trust policy allows the IAM user to assume it.
- `SCALEODM_S3_STS_ROLE_ARN` is set to a role ARN (not a user ARN).
- Binary and container image distribution is automated on new release.
For local development and testing, ScaleODM uses a Talos Kubernetes cluster
created via talosctl cluster create. This provides a real Kubernetes
environment for testing Argo Workflows integration.
Quick start:
```shell
# Setup Talos cluster and start all services
just dev
```

This will:
- Create a local Talos Kubernetes cluster
- Install Argo Workflows
- Start PostgreSQL, MinIO, and the ScaleODM API
Manual setup:
```shell
# 1. Setup Talos cluster (one-time)
just test-cluster-init

# 2. Start compose services
just start
```

Testing workflow:
```shell
just test-cluster-init    # Setup cluster
just test                 # Run tests
just test-cluster-destroy # Clean up
```

See compose.README.md for detailed setup instructions.
Prerequisites:
- `talosctl` installed (installation guide)
- Docker running
- At least 8GB free memory
The test suite depends on a database and Kubernetes cluster:
```shell
# With Talos cluster already running
just test

# Or manually
docker compose run --rm api go test -timeout=2m -v ./...
```