Kubernetes pipeline: Sentinel Zarr → Cloud-Optimized GeoZarr + STAC Registration
Automated pipeline for converting Sentinel-1/2 Zarr datasets to cloud-optimized GeoZarr format with STAC catalog integration and interactive visualization.
Transforms Sentinel-1/2 satellite data into web-ready visualizations:
Input: STAC item URL → Output: Interactive web map (~15-20 min)
Pipeline: Convert → Register
Supported Missions:
- Sentinel-2 L2A (Multi-spectral optical)
- Sentinel-1 GRD (SAR backscatter)
The data pipeline is deployed in two Kubernetes namespaces:
- `devseed-staging` - Testing and validation environment
- `devseed` - Production data pipeline
This documentation uses devseed-staging in examples. For production, replace with devseed.
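For example, any staging command in this guide can be pointed at production by swapping the namespace flag:

```bash
kubectl get wf -n devseed   # production equivalent of the devseed-staging examples
```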
- Kubernetes cluster with platform-deploy (Argo Workflows, STAC API, TiTiler)
- Python 3.13+ with `uv`
- GDAL installed (on macOS: `brew install gdal`)
- `kubectl` installed
Download kubeconfig from OVH Manager → Kubernetes (Access and Security tab).
```bash
mv ~/Downloads/kubeconfig.yml .work/kubeconfig
export KUBECONFIG=$(pwd)/.work/kubeconfig
kubectl get nodes                   # Verify: should list several nodes
kubectl get wf -n devseed-staging
```

Make sure you have a HARBOR_USERNAME and HARBOR_PASSWORD for the OVH container registry added to the `.env` file.
See operator-tools/README.md for webhook port forwarding setup.
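For orientation, the port forward typically looks like the sketch below; the service name and port here are assumptions, so treat operator-tools/README.md as the source of truth.

```bash
# Hypothetical service name and port - check operator-tools/README.md
# for the actual webhook port-forwarding command.
kubectl port-forward -n devseed-staging svc/webhook-eventsource-svc 12000:12000
```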
Make sure all dependencies are installed by running `make setup`.

- Authenticate with the Harbor registry:

```bash
source .env
echo $HARBOR_PASSWORD | docker login w9mllyot.c1.de1.container-registry.ovh.net -u $HARBOR_USERNAME --password-stdin
```

- Build the new version of the code. On macOS, the Linux architecture needs to be specified with the `--platform linux/amd64` flag:

```bash
docker build -f docker/Dockerfile --network host -t w9mllyot.c1.de1.container-registry.ovh.net/eopf-sentinel-zarr-explorer/data-pipeline:v1-staging --platform linux/amd64 .
```

On Linux:

```bash
docker build -f docker/Dockerfile --network host -t w9mllyot.c1.de1.container-registry.ovh.net/eopf-sentinel-zarr-explorer/data-pipeline:v1-staging .
```

- Push to the container registry:

```bash
docker push w9mllyot.c1.de1.container-registry.ovh.net/eopf-sentinel-zarr-explorer/data-pipeline:v1-staging
```

- Once the new image is pushed, run the example notebook and verify that workflows are running in Argo Workflows.
Use the operator tools to submit STAC items via HTTP webhook. See operator-tools/README.md for:
- Interactive notebook for batch submissions
- Python script for single item testing
- Port forwarding setup
- Common actions and target collections
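As a rough single-item sketch, the webhook submission amounts to an HTTP POST against the port-forwarded endpoint; the port and JSON field names below are assumptions, with submit_test_workflow_wh.py as the reference implementation.

```bash
# Assumes the webhook is port-forwarded locally; the port and payload
# field names are assumptions, not the confirmed webhook schema.
curl -X POST http://localhost:12000/ \
  -H "Content-Type: application/json" \
  -d '{
        "source_url": "https://stac.core.eopf.eodc.eu/collections/sentinel-2-l2a/items/S2A_MSIL2A_20251022T094121_N0511_R036_T34TDT_20251022T114817",
        "collection": "sentinel-2-l2a-dp-test"
      }'
```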
Direct workflow submission:
```bash
kubectl create -n devseed-staging -f - <<'EOF'
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: geozarr-
spec:
  workflowTemplateRef:
    name: geozarr-pipeline
  arguments:
    parameters:
      - name: source_url
        value: "https://stac.core.eopf.eodc.eu/collections/sentinel-2-l2a/items/S2A_MSIL2A_20251022T094121_N0511_R036_T34TDT_20251022T114817"
      - name: register_collection
        value: "sentinel-2-l2a-dp-test"
EOF
```

```bash
kubectl get wf -n devseed-staging --watch
```

Monitor: Argo Workflows UI
View Results:
- STAC Browser - Browse catalog
- TiTiler Viewer - View maps
💡 Tip: Log in to the EOxHub workspace for seamless authentication.
Flow: STAC item URL → Extract zarr → Convert to GeoZarr → Upload to S3 → Register STAC item → Optimize storage → Add visualization links
Processing Steps:
V0 Pipeline (2 steps):
- Convert - Fetch STAC item, extract zarr URL, convert to cloud-optimized GeoZarr, upload to S3
- Register - Create STAC item with asset hrefs, add projection metadata and TiTiler links, register to catalog
V1 Pipeline (3 steps):
- Convert - S2-optimized conversion with enhanced performance
- Register - Enhanced registration with alternate extension and consolidated assets
- Change Storage Tier - Optimize storage costs by moving data to the appropriate S3 storage class (default: `EXPRESS_ONEZONE`); see the sketch after this list
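For intuition, retiering boils down to an in-place S3 copy under a new storage class. A minimal sketch with the AWS CLI follows, where the bucket, prefix, and endpoint are placeholders and scripts/change_storage_tier.py remains the authoritative implementation:

```bash
# In-place copy rewrites objects under the target storage class.
# Bucket, prefix, and endpoint are placeholders, not the pipeline's values.
aws s3 cp s3://my-bucket/path/product.zarr/ s3://my-bucket/path/product.zarr/ \
  --recursive \
  --storage-class EXPRESS_ONEZONE \
  --endpoint-url https://s3.gra.io.cloud.ovh.net
```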
Runtime: ~15-20 minutes per item
Stack:
- Processing: eopf-geozarr, Dask, Python 3.13
- Storage: S3 (OVH)
- Catalog: pgSTAC, TiTiler
Infrastructure & Workflow Details: For complete workflow architecture, event flow, and deployment configuration, see platform-deploy data-pipeline README
```yaml
# Sentinel-2
source_url: "https://stac.core.eopf.eodc.eu/collections/sentinel-2-l2a/items/S2A_MSIL2A_..."

# Sentinel-1
source_url: "https://stac.core.eopf.eodc.eu/collections/sentinel-1-l1-grd/items/S1A_IW_GRDH_..."
```

Not supported:

```yaml
source_url: "https://objectstore.eodc.eu/.../product.zarr"  # Direct zarr URLs not supported
```

Why? The pipeline extracts the zarr URL from the STAC item assets automatically.
Find valid URLs:
```bash
kubectl get wf -n devseed-staging --sort-by=.metadata.creationTimestamp \
  -o jsonpath='{range .items[?(@.status.phase=="Succeeded")]}{.spec.arguments.parameters[?(@.name=="source_url")].value}{"\n"}{end}' \
  | tail -n 5
```
```
scripts/
├── convert_v0.py               # Generic Zarr → GeoZarr converter (V0 pipeline)
├── convert_v1_s2.py            # S2-optimized GeoZarr converter (V1 pipeline)
├── register_v0.py              # Basic STAC registration (V0 pipeline)
├── register_v1.py              # Enhanced STAC registration (V1 pipeline)
├── change_storage_tier.py      # S3 storage tier optimization (V1 pipeline step 3)
├── test_complete_workflow.py   # Workflow testing script
├── test_gateway_format.py      # Gateway format testing
└── README_storage_tier.md      # Storage tier management documentation

operator-tools/
├── manage_collections.py             # STAC collection management (create/clean/update)
├── submit_test_workflow_wh.py        # HTTP webhook submission script
├── submit_stac_items_notebook.ipynb  # Batch submission notebook
├── README.md                         # Operator tools documentation
└── README_collections.md             # Collection management guide

docker/Dockerfile   # Container image
tests/              # Unit and integration tests
```
Deployment Configuration: Kubernetes manifests and infrastructure are maintained in platform-deploy
```bash
# Watch workflows
kubectl get wf -n devseed-staging --watch

# View workflow logs
kubectl logs -n devseed-staging -l workflows.argoproj.io/workflow=<name> --tail=100

# Running workflows only
kubectl get wf -n devseed-staging --field-selector status.phase=Running
```

Web UI: Argo Workflows
| Problem | Solution |
|---|---|
| "No group found in store" | Using direct zarr URL instead of STAC item URL |
| "Webhook not responding" | See operator-tools troubleshooting |
| Workflow not starting | Check webhook submission returned success, verify port-forward |
| S3 access denied | Contact infrastructure team to verify S3 credentials |
| Workflow stuck/failed | Check workflow logs: `kubectl logs -n devseed-staging -l workflows.argoproj.io/workflow=<name>` |
For infrastructure issues, see platform-deploy troubleshooting: staging | production
- data-model - `eopf-geozarr` conversion library
- platform-deploy - Infrastructure deployment and configuration
- Operator Tools: operator-tools/README.md - Workflow submission and collection management
- Storage Management: scripts/README_storage_tier.md - S3 storage tier optimization
- Tests: `tests/` - pytest unit and integration tests
- Deployment: platform-deploy/workspaces/devseed-staging/data-pipeline
MIT