- UUID: b3703368-7e7e-4e8e-9e0e-6d0f0d5e8e8e
- Name: stac:
- Schema URL: https://raw.githubusercontent.com/zarr-conventions/stac/refs/tags/v1/schema.json
- Spec URL: https://github.com/zarr-conventions/stac/blob/v1/README.md
- Scope: Group
- Extension Maturity Classification: Proposal
- Owner: @emmanuelmathot
This convention defines a standard way to embed complete STAC (SpatioTemporal Asset Catalog) objects (Items or Collections) directly in Zarr group metadata. This enables self-describing Zarr stores where the spatial, temporal, and asset metadata travels with the data itself.
- Overview
- Motivation
- Convention Attributes
- Relative URL Resolution
- Examples
- Validation
- Known Implementations
The STAC Zarr Convention allows a Zarr group to carry a complete STAC Item or Collection object. The result is a self-describing Zarr store that standard STAC tools can discover and interpret, and that serves as the source of truth for its own spatial, temporal, and asset metadata.
The convention supports two complementary use cases:
- Metadata Sidecar: Embed STAC metadata alongside data arrays as the authoritative source of truth for individual datasets (attribute and key encodings).
- Catalog Storage: Store entire STAC catalogs as multidimensional sparse arrays indexed by space and time (array encoding).
Zarr conventions offer an ideal place to embed STAC metadata alongside array data. By embedding STAC objects directly in Zarr metadata, we enable two patterns:
For individual datasets and data archives, embedding STAC metadata creates:
- Self-Describing Data: The Zarr store contains all necessary metadata for discovery and description
- Simplified Distribution: A single Zarr store contains both data and metadata
- Offline Capability: No external catalog service needed to understand the data
- STAC Compliance: Full compatibility with STAC tools and validation
- Relative References: Asset paths are relative to the embedding group, maintaining portability
For large-scale catalogs and federated discovery systems, storing STAC catalogs as arrays enables:
- Scalable Storage: Handle large collections of STAC items through sparse multidimensional arrays
- Spatiotemporal Indexing: Native space-time dimensions for efficient querying
- Unified Architecture: Single system for both catalog storage and data access
- Analytical Integration: Direct compatibility with array-based analysis workflows
- Performance: Leverage Zarr's optimized multidimensional slicing capabilities
This convention defines attributes that appear at the group level of the Zarr hierarchy. The convention uses the key-prefixed pattern to avoid attribute name collisions with other conventions.
Convention metadata name: stac:
| Field Name | Type | Description |
|---|---|---|
| stac:encoding | string | REQUIRED. Encoding type for STAC objects. |
| stac:item | STAC Item | A STAC Item JSON object or a reference to one. |
| stac:collection | STAC Collection | A STAC Collection JSON object or a reference to one. |
Specifies how the STAC object is encoded in the Zarr store. Valid values are:
- attribute: The STAC object is embedded directly as a JSON object in the Zarr group attributes under stac:item or stac:collection.
- key: The STAC object is stored as a separate JSON value within the Zarr store, referenced by a key in the Zarr group under stac:item or stac:collection.
- array: [UNDER DEVELOPMENT] The STAC object is stored as a data array within the Zarr store, referenced by a relative path to the array node in the Zarr group under stac:item or stac:collection. The section Encoding as Data Array provides more details.
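The three encodings can be read with a small dispatch on stac:encoding. The following is a minimal sketch, not a reference implementation: `load_stac_object` and the `read_key` callback are hypothetical names, and it assumes that a key-encoded value can be passed as-is to whatever function fetches raw bytes from the store.

```python
import json
from typing import Any, Callable, Mapping

VALID_ENCODINGS = {"attribute", "key", "array"}


def load_stac_object(
    attrs: Mapping[str, Any],
    read_key: Callable[[str], bytes],
) -> Any:
    """Return the STAC object (or array path) described by group attributes.

    `read_key` is a hypothetical callback that returns the raw bytes stored
    under a key; how keys map onto the store is an assumption of this sketch.
    """
    encoding = attrs.get("stac:encoding")
    if encoding not in VALID_ENCODINGS:
        raise ValueError(f"unsupported stac:encoding: {encoding!r}")

    # Pick whichever of the two attributes the group carries.
    name = "stac:item" if "stac:item" in attrs else "stac:collection"
    value = attrs[name]

    if encoding == "attribute":
        return value  # the embedded JSON object itself
    if encoding == "key":
        return json.loads(read_key(value))  # value names a JSON document in the store
    # "array" is under development: hand back the relative path to the array node.
    return value
```

For example, calling it with `{"stac:encoding": "key", "stac:item": "stac/item.json"}` and a dictionary lookup as `read_key` returns the parsed JSON stored under that key.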
The convention is identified in the zarr_conventions array with the following metadata:
{
"zarr_conventions": [
{
"name": "stac:",
"spec_url": "https://github.com/zarr-conventions/stac/blob/v1/README.md",
"schema_url": "https://raw.githubusercontent.com/zarr-conventions/stac/refs/tags/v1/schema.json",
"uuid": "b3703368-7e7e-4e8e-9e0e-6d0f0d5e8e8e"
}
]
}

At minimum, one of spec_url, schema_url, or uuid must be present to identify the convention.
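A reader can detect the convention by scanning zarr_conventions for an entry that matches on any of the three identifying fields. This is a minimal sketch under the assumption that `attrs` is the mapping that holds the zarr_conventions array; `declares_stac_convention` is a hypothetical helper name.

```python
from typing import Any, Mapping

STAC_CONVENTION = {
    "name": "stac:",
    "spec_url": "https://github.com/zarr-conventions/stac/blob/v1/README.md",
    "schema_url": "https://raw.githubusercontent.com/zarr-conventions/stac/refs/tags/v1/schema.json",
    "uuid": "b3703368-7e7e-4e8e-9e0e-6d0f0d5e8e8e",
}
# Only these fields identify the convention; "name" alone is not sufficient.
IDENTIFYING_FIELDS = ("spec_url", "schema_url", "uuid")


def declares_stac_convention(attrs: Mapping[str, Any]) -> bool:
    """Return True if any zarr_conventions entry identifies the STAC convention."""
    for entry in attrs.get("zarr_conventions", []):
        for field in IDENTIFYING_FIELDS:
            if field in entry and entry[field] == STAC_CONVENTION[field]:
                return True
    return False
```

Matching on a single field is a deliberate choice here, since the text above requires only one of the three to be present.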
All asset href values in embedded STAC objects MUST be relative to the Zarr group containing the STAC metadata. This ensures:
- Portability: The Zarr store can be moved without breaking references
- Scope: STAC objects can only reference assets within their hierarchy
- Simplicity: Path resolution is straightforward and predictable
- Asset href paths are resolved relative to the group containing the stac:item or stac:collection attribute
- Paths use forward slashes (/) as separators, following POSIX conventions
- Paths should not use .. to reference parent groups (STAC objects should only describe their own hierarchy)
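The resolution rules above can be sketched with the standard library's `posixpath` module. `resolve_asset_href` is a hypothetical helper, and rejecting `..` outright is a stricter choice than the "should not" wording requires.

```python
import posixpath


def resolve_asset_href(group_path: str, href: str) -> str:
    """Resolve a relative asset href against the path of the embedding group."""
    if href.startswith("/") or "://" in href:
        raise ValueError(f"asset href must be relative: {href!r}")
    if ".." in href.split("/"):
        # Stricter than the convention text: parent references are refused.
        raise ValueError(f"asset href must not reference parent groups: {href!r}")
    # Anchor at the store root so group paths with or without a leading
    # slash produce the same absolute result.
    return posixpath.normpath(posixpath.join("/", group_path, href))
```

Called with the group path `/products/s2/` and the href `data/b01`, this yields `/products/s2/data/b01`.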
If STAC metadata is embedded at the root group (/):
{
"assets": {
"reflectance": {
"href": "measurements/reflectance" // → /measurements/reflectance
},
"quality": {
"href": "quality/flags" // → /quality/flags
}
}
}

If STAC metadata is embedded in a subgroup (/products/s2/):
{
"assets": {
"data": {
"href": "data/b01" // → /products/s2/data/b01
}
}
}

The STAC store link relationship MUST be omitted from embedded STAC objects. Since the STAC object is embedded within the Zarr store itself, a store link would be self-referential and redundant.
Other link relationships (e.g., collection, parent, self, license) may be included as needed, typically pointing to external resources.
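When an existing STAC object is embedded, its links can be filtered before writing. This sketch assumes the relationship is expressed as a link whose `rel` value is `"store"`; `strip_store_links` is a hypothetical helper name.

```python
import copy
from typing import Any, Dict


def strip_store_links(stac_object: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a STAC object without links whose rel is "store"."""
    cleaned = copy.deepcopy(stac_object)  # leave the caller's object untouched
    if "links" in cleaned:
        cleaned["links"] = [
            link for link in cleaned["links"] if link.get("rel") != "store"
        ]
    return cleaned
```

All other relationships, such as license or parent, pass through unchanged.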
When stac:encoding is set to array, the STAC objects are stored in one or more data arrays within the Zarr store.
This encoding makes it possible to store a large set of STAC objects efficiently, such as an entire collection of items.
There is no real value in storing a single STAC object as a data array, but it is supported for completeness.
With this encoding, the stac:item or stac:collection attribute contains a relative path to the array node.
This section is under development and open to community feedback. The array encoding leverages Zarr's multidimensional array capabilities to store STAC metadata as sparse arrays with labeled space-time dimensions. This approach aligns naturally with core STAC metadata and provides several key benefits:
- Scalability: Supports millions of STAC items through chunked storage and spatial indexing
- Natural Indexing: Space-time dimensions provide native spatial and temporal query capabilities
- Consistency: Single source of truth for both data and metadata within the same Zarr store
- Performance: Efficient multidimensional slicing for spatial and temporal queries
STAC metadata is organized as a sparse multidimensional array with the following dimensions:
- Time dimension: Indexed by STAC Item datetime (or time range for multi-temporal items)
- Spatial dimensions: Indexed by spatial coordinates in any CRS (lat/lon, UTM, etc.)
- Cell content: Each non-empty cell contains the complete STAC Item metadata as structured data
Temporal Coordinate
- Primary temporal index using STAC Item datetime
- Can use any time resolution (milliseconds to years) depending on data density
- Supports labeled dimensions for non-uniform time intervals
Spatial Coordinates
- Flexible spatial reference system (any CRS supported)
- Could use geographic coordinates (lat/lon) for global collections
- Could use projected coordinates (UTM, etc.) for regional collections
- Could use grid references (MGRS, tile indices) for regular grids
- Supports labeled dimensions for irregular spatial sampling
For a Sentinel-2 collection organized by MGRS tiles:
/stac_items/
├── zarr.json # Array metadata defining dimensions and coordinates
├── datetime/ # Temporal coordinate array (1D)
├── mgrs_tile/ # Spatial coordinate array (1D)
├── geometry/ # Geometry data per item (2D: time × space)
├── assets/ # Asset references per item (2D: time × space)
├── properties/ # Flattened properties per item (2D: time × space)
└── links/ # Flattened links per item (2D: time × space)
Coordinate Arrays:
- datetime: ["2024-01-01T10:30:00Z", "2024-01-02T10:30:00Z", ...]
- mgrs_tile: ["32TQQ", "32TQR", "32TQL", ...]

Data Arrays:
- metadata: 2D sparse array where metadata[t, s] contains the STAC Item at time t and location s
- Most cells are empty (sparse), but where data exists, it contains complete STAC metadata
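The lookup pattern behind metadata[t, s] can be illustrated with plain Python structures. This is an in-memory stand-in, not the on-disk Zarr layout: the dictionary plays the role of the sparse array, and the item shown is hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple

# Coordinate arrays: position in each list is the index along that dimension.
datetimes: List[str] = ["2024-01-01T10:30:00Z", "2024-01-02T10:30:00Z"]
mgrs_tiles: List[str] = ["32TQQ", "32TQR", "32TQL"]

# Sparse cells: only (time index, tile index) pairs that hold an item are stored.
metadata: Dict[Tuple[int, int], Dict[str, Any]] = {
    (0, 1): {"type": "Feature", "id": "hypothetical-item-a"},
}


def lookup(datetime: str, tile: str) -> Optional[Dict[str, Any]]:
    """Return the STAC Item at a time/tile label pair, or None for an empty cell."""
    try:
        t = datetimes.index(datetime)
        s = mgrs_tiles.index(tile)
    except ValueError:
        return None  # label not present on one of the coordinate axes
    return metadata.get((t, s))


def items_for_tile(tile: str) -> List[Dict[str, Any]]:
    """Return all items along the time axis for one tile, in time order."""
    if tile not in mgrs_tiles:
        return []
    s = mgrs_tiles.index(tile)
    return [metadata[key] for key in sorted(metadata) if key[1] == s]
```

A real store would translate labels to indices through the coordinate arrays in the same way, then read only the chunks covering the requested slice.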
- Minimal STAC Item - A minimal example showing the required fields
- STAC Collection - Example of embedding a STAC Collection
- Sentinel-2 Scene - Sentinel-2 L2A data with multiple assets, bands, and extensions
The convention includes a JSON Schema that validates:
- Convention Structure: Ensures proper zarr_conventions metadata
- Encoding Field: Validates the stac:encoding value
- Mutual Exclusivity: Ensures only one of stac:item or stac:collection is present
- STAC Compliance: References official STAC schemas for Item and Collection validation
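The first three checks can be approximated in a few lines of Python for a quick pre-flight test. This sketch does not replace the JSON Schema: `check_stac_attributes` is a hypothetical helper, it assumes zarr_conventions sits in the same mapping as the stac: attributes, and it does not validate the embedded object against the official STAC schemas. It also takes no position on a group that has neither stac:item nor stac:collection, since that case is not spelled out here.

```python
from typing import Any, List, Mapping

VALID_ENCODINGS = ("attribute", "key", "array")


def check_stac_attributes(attrs: Mapping[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    errors: List[str] = []

    # Convention structure: a non-empty zarr_conventions array is expected.
    if not attrs.get("zarr_conventions"):
        errors.append("zarr_conventions is missing or empty")

    # Encoding field: required, and limited to the three defined values.
    encoding = attrs.get("stac:encoding")
    if encoding is None:
        errors.append("stac:encoding is required")
    elif encoding not in VALID_ENCODINGS:
        errors.append(f"stac:encoding has an invalid value: {encoding!r}")

    # Mutual exclusivity: both attributes on the same group is an error.
    if "stac:item" in attrs and "stac:collection" in attrs:
        errors.append("stac:item and stac:collection are mutually exclusive")

    return errors
```

Returning a list rather than raising on the first failure lets a caller report every problem in one pass.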
You can validate examples using the included validation script:
npm install
npm test

Or validate a specific file:

node validate.js examples/minimal_item_example.json

Since embedded objects are complete STAC Items or Collections, they can be validated using standard STAC validation tools:
# Extract the STAC object from Zarr metadata
jq '.attributes["stac:item"]' examples/minimal_item_example.json > item.json
# Validate with stac-validator (Python)
stac-validator item.json

Follow STAC Zarr Best Practices for:
- Asset hierarchy and organization
- Band representation patterns
- Multi-resolution data (multiscales)
- Variable and dimension metadata
This section helps potential implementers assess the convention's maturity and adoption.
If you implement or use this convention, please add your implementation by submitting a pull request.
If your dataset uses this convention, please add it here by submitting a pull request.
Related specifications: