Merged
2 changes: 2 additions & 0 deletions .gitignore
@@ -23,3 +23,5 @@ dist

# indexer when run locally
packages/stac-index/src/stac_index/indexer/index_data/
# indexer when run by `run-with-remote-source.sh`
.remote-source-index
39 changes: 34 additions & 5 deletions README.md
@@ -16,13 +16,42 @@ See [Development](./docs/development.md) for detailed guidance on how to work on

### Quickstart

To get a quick demo up and running execute any of the following scripts and navigate to http://localhost:8123/api.html
To get a quick demo up and running with a test dataset, execute any of the following scripts and navigate to http://localhost:8123/api.html

```sh
scripts/run-with-local-s3.sh # loads a sample dataset into minio, indexes it, loads the index into minio, and runs the API
scripts/run-with-local-file.sh # indexes a sample dataset on the filesystem and runs the API
scripts/run-with-local-http.sh # loads a sample dataset into a HTTP fileserver, indexes it, and runs the API
scripts/run-with-remote-source.sh https://capella-open-data.s3.us-west-2.amazonaws.com/stac/catalog.json # indexes a public static STAC catalog over HTTPS and runs the API
# loads a sample dataset into minio, indexes it, loads the index into minio, and runs the API
scripts/run-with-local-s3.sh
# indexes a sample dataset on the filesystem and runs the API
scripts/run-with-local-file.sh
# loads a sample dataset into a HTTP fileserver, indexes it, and runs the API
scripts/run-with-local-http.sh
```

### Index Remote STAC Catalog

This project includes a convenience script to index and serve a remote STAC catalog. This script will fully index the remote STAC catalog each time it is run. This may not be the most efficient way to meet your needs, but it does help demonstrate some of this project's capabilities.

```sh
# indexes a public static STAC catalog over HTTPS and runs the API
scripts/run-with-remote-source.sh https://esa.pages.eox.at/cubes-and-clouds-catalog/MOOC_Cubes_and_clouds/catalog.json
```

The output includes the following information about the index:
```sh
* Indexing may take some time, depending on the size of the catalog
* Indexing to /.../source/sparkgeo/STAC-API-Serverless/.remote-source-index/httpsesapageseoxatcubesandcloudscatalogMOOCCubesandcloudscatalogjson
```

The generated index files can be inspected at `.../.remote-source-index/httpsesapageseoxatcubesandcloudscatalogMOOCCubesandcloudscatalogjson` if necessary. To run the API against this same index later, without re-indexing the remote STAC catalog, use the following:

```sh
docker run \
--rm \
-it \
-v $PWD/.remote-source-index/httpsesapageseoxatcubesandcloudscatalogMOOCCubesandcloudscatalogjson:/index:ro \
-e stac_api_indexed_index_manifest_uri=/index/manifest.json \
-p 8123:80 \
sparkgeo/stac_fastapi_indexed
```
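
The directory name in the path above appears to be the catalog URI with every non-alphanumeric character removed (the script uses `tr -cd '[:alnum:]'`). The following Python sketch reproduces that derivation and assembles the same `docker run` arguments. `index_dir_name` and `docker_run_args` are hypothetical helper names that are not part of this project, and the sketch assumes an ASCII-only URI.

```python
import string
from pathlib import PurePosixPath

# ASCII letters and digits, matching `[:alnum:]` in the C locale (assumption).
_ALNUM = set(string.ascii_letters + string.digits)


def index_dir_name(root_catalog_uri: str) -> str:
    """Drop every character that is not an ASCII letter or digit."""
    return "".join(ch for ch in root_catalog_uri if ch in _ALNUM)


def docker_run_args(repo_root: PurePosixPath, root_catalog_uri: str) -> list[str]:
    """Build the argument list for serving a previously generated index."""
    index_dir = repo_root / ".remote-source-index" / index_dir_name(root_catalog_uri)
    return [
        "docker", "run", "--rm", "-it",
        "-v", f"{index_dir}:/index:ro",
        "-e", "stac_api_indexed_index_manifest_uri=/index/manifest.json",
        "-p", "8123:80",
        "sparkgeo/stac_fastapi_indexed",
    ]
```

The list can be passed to `subprocess.run` once the index directory exists.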

## Overview
9 changes: 5 additions & 4 deletions docker-compose.remote-source.yml
@@ -8,18 +8,18 @@ services:
INDEX_ROOT_CATALOG_URI: ${root_catalog_uri}
INDEX_CONFIG_PATH: /index-config.json
INDEX_PUBLISH_PATH: /output
INDEX_MANIFEST_JSON_URI: ${index_manifest_json_uri}
AWS_ACCESS_KEY_ID:
AWS_REGION:
AWS_SECRET_ACCESS_KEY:
AWS_SESSION_TOKEN:
volumes:
- "indexer-output:/output:rw"
# this compose file will not work without $tmp_index_config_path being set (this is intentional, it is set by scripts/run-with-remote-source.sh)
- "${tmp_index_config_path}:/index-config.json:ro"
- "${tmp_index_path:-indexer-output}:/output:rw"
- "${tmp_index_config_path:-indexer-config-fallback}:/index-config.json:ro"

api:
volumes:
- indexer-output:/index:ro
- "${tmp_index_path:-indexer-output}:/index:ro"
environment:
stac_api_indexed_index_manifest_uri: /index/manifest.json
AWS_ACCESS_KEY_ID:
@@ -32,3 +32,4 @@ services:

volumes:
indexer-output:
indexer-config-fallback:
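
Compose's `${VAR:-default}` interpolation falls back to the default when the variable is unset or empty. The sketch below mimics that rule in Python to show how the two indexer volume sources would resolve. `resolve` and `volume_specs` are hypothetical helpers written for illustration, and the host path in the usage note is invented.

```python
from typing import Mapping


def resolve(env: Mapping[str, str], name: str, default: str) -> str:
    """Mimic `${name:-default}`: use the default when unset or empty."""
    value = env.get(name, "")
    return value if value else default


def volume_specs(env: Mapping[str, str]) -> list[str]:
    """Return the indexer volume entries after interpolation."""
    output_source = resolve(env, "tmp_index_path", "indexer-output")
    config_source = resolve(env, "tmp_index_config_path", "indexer-config-fallback")
    return [
        f"{output_source}:/output:rw",
        f"{config_source}:/index-config.json:ro",
    ]
```

With an empty environment both entries fall back to the named volumes. With `tmp_index_path` set to a host directory such as `/work/.remote-source-index/abc`, the first entry becomes a bind mount of that directory.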
4 changes: 2 additions & 2 deletions docker/indexer/command.sh
@@ -12,10 +12,10 @@ if [ -n "$INDEX_ROOT_CATALOG_URI" ]; then
fi

if [ -n "$INDEX_MANIFEST_JSON_URI" ]; then
manifest_json_uri="--manifest_json_uri $INDEX_MANIFEST_JSON_URI"
manifest_json_uri_argument="--manifest_json_uri $INDEX_MANIFEST_JSON_URI"
fi

if [ -n "$INDEX_CONFIG_PATH" ]; then
if [ -n "$INDEX_CONFIG_PATH" ] && [ -f "$INDEX_CONFIG_PATH" ]; then
index_config_argument="--index_config $INDEX_CONFIG_PATH"
fi
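
The added `-f` test means the config flag is only passed when the path is a regular file. That plausibly matters because a named volume mounted at `/index-config.json` would appear as a directory, though that reading is an assumption. Below is a Python sketch of the same argument-building logic, limited to the two flags visible in this hunk; `indexer_arguments` is a hypothetical name.

```python
import os
from typing import Mapping


def indexer_arguments(env: Mapping[str, str]) -> list[str]:
    """Translate environment variables into indexer CLI arguments."""
    args: list[str] = []

    manifest_uri = env.get("INDEX_MANIFEST_JSON_URI", "")
    if manifest_uri:
        args += ["--manifest_json_uri", manifest_uri]

    config_path = env.get("INDEX_CONFIG_PATH", "")
    # Equivalent of `[ -n "$p" ] && [ -f "$p" ]`: set and a regular file.
    if config_path and os.path.isfile(config_path):
        args += ["--index_config", config_path]

    return args
```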

Original file line number Diff line number Diff line change
@@ -471,7 +471,10 @@ def _log_index_event(self: Self, root_catalog_uri: str) -> None:

def _insert_errors(self: Self, errors: list[IndexingError]) -> None:
for error in errors:
save_error(self._conn, error)
try:
save_error(self._conn, error)
except Exception as e:
_logger.exception("failed to insert indexing error: {}".format(e))

async def _load_existing_index(self: Self, manifest_json_uri: str) -> IndexManifest:
source_reader = get_reader_for_uri(uri=manifest_json_uri)
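
The change to `_insert_errors` follows a best-effort pattern: a failure to record one error is logged and the loop continues. A self-contained sketch of that pattern is below. `insert_all` and its `save` callback are hypothetical stand-ins for the project's method and `save_error`, and errors are plain strings here for brevity.

```python
import logging
from typing import Callable, Iterable

_logger = logging.getLogger(__name__)


def insert_all(errors: Iterable[str], save: Callable[[str], None]) -> int:
    """Save each error, logging failures instead of raising.

    Returns the number of errors saved successfully.
    """
    saved = 0
    for error in errors:
        try:
            save(error)
            saved += 1
        except Exception:
            # logging.exception records the traceback along with the message.
            _logger.exception("failed to insert indexing error: %s", error)
    return saved
```

Returning a count is an addition for the sketch only; it makes partial failure visible to the caller.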
26 changes: 17 additions & 9 deletions scripts/run-with-remote-source.sh
@@ -4,27 +4,35 @@ set -e

pushd $(dirname $0)/..

if [ "$#" -ne 1 ]; then
if [ "$#" -lt 1 ]; then
echo "Usage: $0 <root-catalog-uri>"
exit 1
fi

export root_catalog_uri="$1"

if [[ $root_catalog_uri == s3://* ]]; then
echo; echo "* Assumes \$AWS_ACCESS_KEY_ID, \$AWS_REGION, \$AWS_SECRET_ACCESS_KEY, and (optionally) \$AWS_SESSION_TOKEN are set for obstore *"; echo
fi

export tmp_index_config_path=$(mktemp)
if [ -z "${FIXES_TO_APPLY}" ]; then
echo "{}" > $tmp_index_config_path
export tmp_index_path=$PWD/.remote-source-index/$(echo "$root_catalog_uri" | tr -cd '[:alnum:]')
echo; echo "* Indexing may take some time, depending on the size of the catalog";
echo "* Indexing to $tmp_index_path"; echo
# Persist generated index and manifest files locally to support faster repeat runs against the same remote source.
if [ -f "$tmp_index_path/manifest.json" ]; then
# Tell the indexer there's already an existing index to update.
export index_manifest_json_uri="/output/manifest.json"
unset root_catalog_uri
else
fixes_json=$(echo "${FIXES_TO_APPLY}" | sed "s/,\s*/\", \"/g")
echo "{\"fixes_to_apply\": [\"${fixes_json}\"]}" > $tmp_index_config_path
# No point evaluating this if updating an existing index as it will be ignored.
if [ -n "${FIXES_TO_APPLY}" ]; then
export tmp_index_config_path=$(mktemp)
fixes_json=$(echo "${FIXES_TO_APPLY}" | sed "s/,\s*/\", \"/g")
echo "{\"fixes_to_apply\": [\"${fixes_json}\"]}" > $tmp_index_config_path
fi
fi

dco="docker compose -f docker-compose.base.yml -f docker-compose.remote-source.yml"

dco="docker compose -f docker-compose.base.yml -f docker-compose.remote-source.yml"
$dco build
echo; echo "* Indexing may take some time, depending on the size of the catalog *"; echo
sleep 1
$dco up --force-recreate
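
The script builds the index config by turning the comma-separated `FIXES_TO_APPLY` value into a JSON array with `sed`. The Python sketch below does the equivalent with `json.dumps`, which is a swapped-in technique and not what the script does. Unlike the `sed` expression, it also trims whitespace before commas and drops empty entries. `index_config_json` is a hypothetical name, and the fix names in the usage note are made up.

```python
import json


def index_config_json(fixes_to_apply: str) -> str:
    """Build the index config JSON from a comma-separated list of fixes."""
    fixes = [part.strip() for part in fixes_to_apply.split(",")]
    fixes = [fix for fix in fixes if fix]
    if not fixes:
        # Nothing to apply: an empty config object.
        return "{}"
    return json.dumps({"fixes_to_apply": fixes})
```

For example, `index_config_json("fix-a, fix-b")` yields a document whose `fixes_to_apply` array holds the two names. Using a JSON serializer also escapes quotes and backslashes, which plain string substitution does not.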
3 changes: 2 additions & 1 deletion src/stac_fastapi/indexed/settings.py
@@ -1,5 +1,6 @@
from functools import lru_cache
from typing import Optional
from uuid import uuid4

from stac_fastapi.types.config import ApiSettings, SettingsConfigDict

@@ -10,7 +11,7 @@ class _Settings(ApiSettings):
)
log_level: str = "info"
index_manifest_uri: str = "/index/manifest.json"
token_jwt_secret: str
token_jwt_secret: str = uuid4().hex
duckdb_threads: Optional[int] = None
deployment_root_path: Optional[str] = None
install_duckdb_extensions: bool = (
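
A note on the new `token_jwt_secret` default: in Python, the expression `uuid4().hex` in a class body runs once, when the class is defined. Every settings instance in one process would therefore share the same secret, while a separate process or a restart would get a different one. If that holds here, anything signed by one worker may not validate in another. The sketch below shows the behaviour with plain dataclasses as stand-ins, not the project's `_Settings` class.

```python
from dataclasses import dataclass, field
from uuid import uuid4


@dataclass
class SharedDefault:
    # Evaluated once at class definition; shared by all instances.
    token_jwt_secret: str = uuid4().hex


@dataclass
class PerInstanceDefault:
    # Evaluated on every instantiation; each instance differs.
    token_jwt_secret: str = field(default_factory=lambda: uuid4().hex)


if __name__ == "__main__":
    print(SharedDefault().token_jwt_secret == SharedDefault().token_jwt_secret)
    print(PerInstanceDefault().token_jwt_secret == PerInstanceDefault().token_jwt_secret)
```

For a quick demo the shared per-process default is convenient. A deployment with several workers would likely still want the secret set explicitly.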