STAC Zarr Convention

This convention defines a standard way to embed complete STAC (SpatioTemporal Asset Catalog) objects (Items or Collections) directly in Zarr group metadata. This enables self-describing Zarr stores where the spatial, temporal, and asset metadata travels with the data itself.

Overview

The STAC Zarr Convention allows a Zarr group to carry a complete STAC Item or Collection object, either in its attributes or elsewhere in the same store. The resulting store is self-describing: it can be discovered and understood with standard STAC tools, and the embedded object serves as the source of truth for spatial, temporal, and asset metadata.

The convention supports two complementary use cases:

  1. Metadata Sidecar: Embed STAC metadata alongside data arrays as the authoritative source of truth for individual datasets (attribute and key encodings).

  2. Catalog Storage: Store entire STAC catalogs as multidimensional sparse arrays indexed by space and time (array encoding).

Motivation

Zarr conventions offer an ideal place to embed STAC metadata alongside array data. Embedding STAC objects directly in Zarr metadata enables two patterns:

Metadata Sidecar

For individual datasets and data archives, embedding STAC metadata creates:

  1. Self-Describing Data: The Zarr store contains all necessary metadata for discovery and description
  2. Simplified Distribution: A single Zarr store contains both data and metadata
  3. Offline Capability: No external catalog service needed to understand the data
  4. STAC Compliance: Full compatibility with STAC tools and validation
  5. Relative References: Asset paths are relative to the embedding group, maintaining portability

Catalog Storage

For large-scale catalogs and federated discovery systems, storing STAC catalogs as arrays enables:

  1. Scalable Storage: Handle large collections of STAC items through sparse multidimensional arrays
  2. Spatiotemporal Indexing: Native space-time dimensions for efficient querying
  3. Unified Architecture: Single system for both catalog storage and data access
  4. Analytical Integration: Direct compatibility with array-based analysis workflows
  5. Performance: Leverage Zarr's optimized multidimensional slicing capabilities

Convention Attributes

This convention defines attributes that appear at the group level of the Zarr hierarchy. The convention uses the key-prefixed pattern to avoid attribute name collisions with other conventions.

Convention metadata name: stac:

Fields

| Field Name | Type | Description |
| --- | --- | --- |
| stac:encoding | string | REQUIRED. Encoding type for STAC objects. |
| stac:item | STAC Item | A STAC Item JSON object or a reference to one. |
| stac:collection | STAC Collection | A STAC Collection JSON object or a reference to one. |

Field Details

stac:encoding

Specifies how the STAC object is encoded in the Zarr store. Valid values are:

  • attribute: The STAC object is embedded directly as a JSON object in the Zarr group attributes under stac:item or stac:collection.
  • key: The STAC object is stored as a separate JSON value within the Zarr store, referenced by a key in the Zarr group under stac:item or stac:collection.
  • array: [UNDER DEVELOPMENT] The STAC object is stored as a data array within the Zarr store, referenced by a relative path to the array node in the Zarr group under stac:item or stac:collection. See the Encoding as Data Array section for details.
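
As a rough illustration of the first two encodings, the sketch below builds the group attributes for each. The helper name `build_stac_attributes`, the abbreviated Item, and the key `stac/item.json` are hypothetical, and the exact form of a key reference is an assumption rather than something defined above.

```python
import json

VALID_ENCODINGS = {"attribute", "key", "array"}


def build_stac_attributes(encoding, field, value):
    """Return group attributes carrying one STAC object or a reference to it.

    `field` is "stac:item" or "stac:collection". For the "attribute"
    encoding `value` is the STAC object itself; for "key" and "array" it
    is a string reference (the reference format is an assumption).
    """
    if encoding not in VALID_ENCODINGS:
        raise ValueError(f"unknown stac:encoding: {encoding!r}")
    if field not in ("stac:item", "stac:collection"):
        raise ValueError(f"unknown field: {field!r}")
    if encoding == "attribute" and not isinstance(value, dict):
        raise TypeError("attribute encoding expects a STAC JSON object")
    if encoding != "attribute" and not isinstance(value, str):
        raise TypeError(f"{encoding} encoding expects a string reference")
    return {"stac:encoding": encoding, field: value}


# Hypothetical, heavily abbreviated Item (not a complete, valid STAC Item).
item = {"type": "Feature", "stac_version": "1.1.0", "id": "example-item"}

embedded = build_stac_attributes("attribute", "stac:item", item)
referenced = build_stac_attributes("key", "stac:item", "stac/item.json")
print(json.dumps(embedded, indent=2))
print(json.dumps(referenced, indent=2))
```

In both cases the consumer reads stac:encoding first and only then decides how to interpret the value under stac:item or stac:collection.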

Convention Metadata

The convention is identified in the zarr_conventions array with the following metadata:

{
  "zarr_conventions": [
    {
      "name": "stac:",
      "spec_url": "https://github.com/zarr-conventions/stac/blob/v1/README.md",
      "schema_url": "https://raw.githubusercontent.com/zarr-conventions/stac/refs/tags/v1/schema.json",
      "uuid": "b3703368-7e7e-4e8e-9e0e-6d0f0d5e8e8e"
    }
  ]
}

At least one of spec_url, schema_url, or uuid must be present to identify the convention.
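
A reader of a Zarr group might detect the convention along the following lines. This is a minimal sketch: the function name `declares_stac_convention` is hypothetical, it assumes `zarr_conventions` has already been loaded into a Python dict, and matching the identifiers by exact string equality (including the version-pinned URLs) is a simplification.

```python
# Identifier values copied from the convention metadata shown above.
STAC_CONVENTION = {
    "spec_url": "https://github.com/zarr-conventions/stac/blob/v1/README.md",
    "schema_url": "https://raw.githubusercontent.com/zarr-conventions/stac/refs/tags/v1/schema.json",
    "uuid": "b3703368-7e7e-4e8e-9e0e-6d0f0d5e8e8e",
}


def declares_stac_convention(attributes):
    """Return True if any zarr_conventions entry matches a known identifier."""
    for entry in attributes.get("zarr_conventions", []):
        # One matching identifier is enough; the other fields may be absent.
        if any(entry.get(key) == value for key, value in STAC_CONVENTION.items()):
            return True
    return False


only_uuid = {"zarr_conventions": [{"uuid": STAC_CONVENTION["uuid"]}]}
other = {"zarr_conventions": [{"name": "stac:"}]}  # name alone does not identify it
print(declares_stac_convention(only_uuid))
print(declares_stac_convention(other))
```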

STAC URL Resolution

Asset Href Resolution

All asset href values in embedded STAC objects MUST be relative to the Zarr group containing the STAC metadata. This ensures:

  • Portability: The Zarr store can be moved without breaking references
  • Scope: STAC objects can only reference assets within their hierarchy
  • Simplicity: Path resolution is straightforward and predictable

Resolution Rules

  1. Asset href paths are resolved relative to the group containing the stac:item or stac:collection attribute
  2. Paths use forward slashes (/) as separators, following POSIX conventions
  3. Paths should not use .. to reference parent groups (STAC objects should only describe their own hierarchy)

Asset Examples

If STAC metadata is embedded at the root group (/):

{
  "assets": {
    "reflectance": {
      "href": "measurements/reflectance" // → /measurements/reflectance
    },
    "quality": {
      "href": "quality/flags" // → /quality/flags
    }
  }
}

If STAC metadata is embedded in a subgroup (/products/s2/):

{
  "assets": {
    "data": {
      "href": "data/b01" // → /products/s2/data/b01
    }
  }
}
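
The resolution rules can be sketched in a few lines of Python. The function name `resolve_asset_href` is hypothetical, and raising an error on `..` is a deliberately strict reading of rule 3, which only says such paths should not be used.

```python
import posixpath


def resolve_asset_href(group_path, href):
    """Resolve a relative asset href against the group holding the STAC object."""
    if href.startswith("/") or "://" in href:
        raise ValueError(f"asset href must be relative: {href!r}")
    if ".." in href.split("/"):
        raise ValueError(f"asset href must not reference parent groups: {href!r}")
    # Anchor at the store root so the result is always an absolute store path.
    return posixpath.normpath(posixpath.join("/", group_path, href))


print(resolve_asset_href("/", "measurements/reflectance"))  # → /measurements/reflectance
print(resolve_asset_href("/", "quality/flags"))             # → /quality/flags
print(resolve_asset_href("/products/s2/", "data/b01"))      # → /products/s2/data/b01
```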

Store Link Omission

The STAC store link relationship MUST be omitted from embedded STAC objects. Since the STAC object is embedded within the Zarr store itself, a store link would be self-referential and redundant.

Other link relationships (e.g., collection, parent, self, license) may be included as needed, typically pointing to external resources.
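
When converting an existing STAC object for embedding, the store link can be dropped with a small filter such as the one below. This is a sketch under assumptions: the helper name `without_store_links` and the example links are hypothetical.

```python
import copy


def without_store_links(stac_object):
    """Return a copy of the STAC object with every rel="store" link removed."""
    cleaned = copy.deepcopy(stac_object)
    links = cleaned.get("links")
    if isinstance(links, list):
        cleaned["links"] = [link for link in links if link.get("rel") != "store"]
    return cleaned


# Hypothetical, abbreviated Item with placeholder URLs.
original = {
    "id": "example-item",
    "links": [
        {"rel": "store", "href": "https://example.com/data.zarr"},
        {"rel": "license", "href": "https://example.com/license"},
    ],
}
embedded = without_store_links(original)
print([link["rel"] for link in embedded["links"]])  # → ['license']
```

The input object is left unchanged, so the same source Item can still be published to an external catalog with its store link intact.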

Encoding as Data Array

When stac:encoding is array, the STAC objects are stored in one or more data arrays within the Zarr store. This encoding stores large sets of STAC objects efficiently, such as an entire collection of Items. Storing a single STAC object as a data array offers little practical benefit, but it is supported for completeness. With this encoding, the stac:item or stac:collection attribute contains a relative path to the array node.

STAC Array Structure

This section is under development and open to community feedback. The array encoding uses Zarr's multidimensional array capabilities to store STAC metadata as sparse arrays with labeled space-time dimensions. This approach aligns naturally with core STAC metadata and provides several key benefits:

  • Scalability: Supports millions of STAC items through chunked storage and spatial indexing
  • Natural Indexing: Space-time dimensions provide native spatial and temporal query capabilities
  • Consistency: Single source of truth for both data and metadata within the same Zarr store
  • Performance: Efficient multidimensional slicing for spatial and temporal queries

Dimensional Structure

STAC metadata is organized as a sparse multidimensional array with the following dimensions:

  • Time dimension: Indexed by STAC Item datetime (or time range for multi-temporal items)
  • Spatial dimensions: Indexed by spatial coordinates in any CRS (lat/lon, UTM, etc.)
  • Cell content: Each non-empty cell contains the complete STAC Item metadata as structured data

Coordinate Systems and Indexing

Temporal Coordinate

  • Primary temporal index using STAC Item datetime
  • Can use any time resolution (milliseconds to years) depending on data density
  • Supports labeled dimensions for non-uniform time intervals

Spatial Coordinates

  • Flexible spatial reference system (any CRS supported)
  • Could use geographic coordinates (lat/lon) for global collections
  • Could use projected coordinates (UTM, etc.) for regional collections
  • Could use grid references (MGRS, tile indices) for regular grids
  • Supports labeled dimensions for irregular spatial sampling

Example Structure

For a Sentinel-2 collection organized by MGRS tiles:

/stac_items/
├── zarr.json          # Array metadata defining dimensions and coordinates
├── datetime/          # Temporal coordinate array (1D)
├── mgrs_tile/         # Spatial coordinate array (1D) 
├── geometry/          # Geometry data per item (2D: time × space)
├── assets/            # Asset references per item (2D: time × space) 
├── properties/        # Flattened properties per item (2D: time × space)
└── links/             # Flattened links per item (2D: time × space)

Coordinate Arrays:

  • datetime: ["2024-01-01T10:30:00Z", "2024-01-02T10:30:00Z", ...]
  • mgrs_tile: ["32TQQ", "32TQR", "32TQL", ...]

Data Arrays:

  • Conceptually, the data arrays above (geometry, assets, properties, links) together form a 2D sparse metadata array, where metadata[t, s] holds the STAC Item at time t and location s
  • Most cells are empty (sparse), but where data exists, it contains complete STAC metadata
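
To make the indexing model concrete, the following sketch mimics it in memory. It is not the Zarr storage layout: a plain dict stands in for the sparse 2D array, the item contents are placeholders, and the names `cells`, `lookup`, and `items_for_tile` are hypothetical.

```python
# Coordinate arrays, using the example values from above.
datetimes = ["2024-01-01T10:30:00Z", "2024-01-02T10:30:00Z"]
mgrs_tiles = ["32TQQ", "32TQR", "32TQL"]

# Sparse cells keyed by (time index, space index); missing keys are empty cells.
cells = {
    (0, 0): {"id": "item-a"},
    (1, 2): {"id": "item-b"},
}


def lookup(when, tile):
    """Return the item stored at (when, tile), or None for an empty cell."""
    try:
        t = datetimes.index(when)
        s = mgrs_tiles.index(tile)
    except ValueError:
        return None  # coordinate label not present in the index
    return cells.get((t, s))


def items_for_tile(tile):
    """Slice along time: all items recorded for one spatial label."""
    s = mgrs_tiles.index(tile)
    return [cells[(t, s)] for t in range(len(datetimes)) if (t, s) in cells]


print(lookup("2024-01-02T10:30:00Z", "32TQL"))  # → {'id': 'item-b'}
print(lookup("2024-01-01T10:30:00Z", "32TQR"))  # → None
print(items_for_tile("32TQQ"))                  # → [{'id': 'item-a'}]
```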

Examples

Validation

Schema Validation

The convention includes a JSON Schema that validates:

  1. Convention Structure: Ensures proper zarr_conventions metadata
  2. Encoding Field: Validates the stac:encoding value
  3. Mutual Exclusivity: Ensures only one of stac:item or stac:collection is present
  4. STAC Compliance: References official STAC schemas for Item and Collection validation
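
Checks 2 and 3 are simple enough to sketch without a schema library. The snippet below is illustrative only and does not replace the JSON Schema: the function name `check_stac_attributes` is hypothetical, and it does not validate the STAC object itself.

```python
VALID_ENCODINGS = ("attribute", "key", "array")


def check_stac_attributes(attributes):
    """Return a list of problems found in a group's stac: attributes."""
    problems = []
    encoding = attributes.get("stac:encoding")
    if encoding is None:
        problems.append("stac:encoding is required")
    elif encoding not in VALID_ENCODINGS:
        problems.append(f"invalid stac:encoding: {encoding!r}")
    if "stac:item" in attributes and "stac:collection" in attributes:
        problems.append("only one of stac:item or stac:collection may be present")
    return problems


good = {"stac:encoding": "attribute", "stac:item": {"id": "example-item"}}
bad = {"stac:item": {}, "stac:collection": {}}
print(check_stac_attributes(good))  # → []
print(check_stac_attributes(bad))
```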

Validation Tools

You can validate examples using the included validation script:

npm install
npm test

Or validate a specific file:

node validate.js examples/minimal_item_example.json

STAC Validation

Since embedded objects are complete STAC Items or Collections, they can be validated using standard STAC validation tools:

# Extract the STAC object from Zarr metadata
jq '.attributes["stac:item"]' examples/minimal_item_example.json > item.json

# Validate with stac-validator (Python)
stac-validator item.json

Asset Organization

Follow STAC Zarr Best Practices for:

  • Asset hierarchy and organization
  • Band representation patterns
  • Multi-resolution data (multiscales)
  • Variable and dimension metadata

Known Implementations

This section helps potential implementers assess the convention's maturity and adoption.

Libraries and Tools

If you implement or use this convention, please add your implementation by submitting a pull request.

Datasets Using This Convention

If your dataset uses this convention, please add it here by submitting a pull request.

References

Related specifications:
