Binary file added docs/images/sample-paging-token-jwt.png
43 changes: 42 additions & 1 deletion docs/suitability.md
@@ -6,7 +6,7 @@ This document explores questions that may determine whether this project is suit

### Transactions

This stac-fastapi backend does not support transactions and never will. If you need the ability to modify data through the API consider using [stac-fastapi-pgstac](https://github.com/stac-utils/stac-fastapi-pgstac) or several other projects that support transactions.
This stac-fastapi backend does not support transactions and never will. If you need the ability to modify data through the API consider using [stac-fastapi-pgstac](https://github.com/stac-utils/stac-fastapi-pgstac) or other projects that support transactions.

### Data Duplication

@@ -65,3 +65,44 @@ Since 1.1.0 STAC-GeoParquet does not _require_ each collection to exist in a dif
#### stac-fastapi-geoparquet

The [stac-fastapi-geoparquet](https://pypi.org/project/stac-fastapi-geoparquet/) project aims to augment STAC-GeoParquet with a STAC API interface, however this project does not currently appear to offer a production-ready solution.

## Other Considerations

### Paging Tokens

Paging tokens in search and collection items responses that span multiple pages work differently in this project than in some other projects. The approach is considered safe and reasonable, and may be of interest to users evaluating the suitability of this project for a use case.

To support a serverless deployment the API holds no paging state: no state or token information is stored within the API or associated storage. Paging still requires knowledge of state _somewhere_, so that state is carried in the paging tokens that the client receives and submits.

Paging tokens are included in the `next` and `prev` links in a multi-page response. Each token is a [JSON Web Token (JWT)](https://jwt.io/introduction) that provides all information required by the API to progress or regress across pages.
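As a rough illustration of how state can travel inside a link, the sketch below builds a compact HS256 JWT with the Python standard library and places it in a `next` link. This is not the project's implementation: the helper names, the `{"page": 2}` payload, and the `token` query parameter are all assumptions made for the example.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as the JWT compact form requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def encode_paging_token(payload: dict, secret: bytes) -> str:
    """Build a compact HS256 JWT of the form header.payload.signature."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(
        secret, signing_input.encode("ascii"), hashlib.sha256
    ).digest()
    return f"{signing_input}.{b64url(signature)}"


def next_link(base_url: str, token: str) -> dict:
    """Hypothetical `next` link; the `token` query parameter name is assumed."""
    return {"rel": "next", "href": f"{base_url}/search?token={token}"}


# The payload and secret are placeholders for illustration only.
token = encode_paging_token({"page": 2}, secret=b"example-secret")
link = next_link("https://host", token)
```

Because everything needed to continue paging sits in `link["href"]`, the server in this sketch has nothing to remember between requests.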

#### SQL

The payload of the JWT includes a parameterised SQL query and parameters that will be used to fetch the next or previous page of results. It also includes the ID of the most recent data load which the API uses to determine if it is paging across a data change, and which supports the behaviour described in the main README's [pagination section](../README.md#pagination).
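The payload described above could be modelled along the following lines. The field names, the SQL text, and the load ID format are hypothetical and chosen only to make the idea concrete; the real payload may be structured differently.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PagingPayload:
    """Hypothetical shape of a paging token payload; field names are assumed."""

    sql: str  # parameterised query, with values left as placeholders
    params: list = field(default_factory=list)  # values bound at execution time
    data_load_id: str = ""  # ID of the data load current when the token was issued


def is_paging_across_data_change(payload: PagingPayload, current_load_id: str) -> bool:
    """Return True when the most recent data load differs from the token's."""
    return payload.data_load_id != current_load_id


# Illustrative values only; the query is not taken from the project.
payload = PagingPayload(
    sql="SELECT * FROM items WHERE collection = ? ORDER BY id LIMIT ? OFFSET ?",
    params=["joplin", 10, 10],
    data_load_id="load-001",
)
```

Keeping the values in `params` rather than interpolating them into `sql` mirrors the parameterised approach the text describes.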

With a paging token, the client provides the API with the SQL query it should execute. In many scenarios this might raise security concerns. See [SQL Safety](#sql-safety) for how such concerns are addressed.

#### SQL Safety

The JWT is signed using the HS256 algorithm (HMAC with SHA-256) and a secret key that must be provided at deployment time, so a client cannot alter the payload without invalidating the signature. Signed JWTs are widely relied upon for verifying identity claims, and should therefore be capable of preventing SQL query tampering. Integration tests verify that the payload cannot be modified by a client between page requests ([example](https://github.com/sparkgeo/STAC-API-Serverless/blob/212f1a97f091efe19bd6f9edb6084b7f3d508d20/tests/with_environment/integration_tests/test_get_search.py#L201)).

If it becomes possible, such as through credential theft, for a malicious actor to modify and re-sign a paging token JWT, this is still not considered a significant concern. In a standard deployment the API reads Parquet index files with [read-only](https://github.com/sparkgeo/STAC-API-Serverless/blob/212f1a97f091efe19bd6f9edb6084b7f3d508d20/iac/cdk_deployment/cdk_deployment_stack.py#L68) access to an S3 bucket, which should prevent tampering with the index via SQL. If a malicious user is somehow able to modify files within the API container via a modified SQL query, any changes will be destroyed by the next Lambda invocation.
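The tamper check can be sketched with the standard library alone. The example below signs a payload, swaps in a forged payload while reusing the original signature, and shows that verification rejects it. The function names and payload contents are hypothetical, and a production verifier would also validate the header's `alg` value, which this sketch omits.

```python
import base64
import hashlib
import hmac
import json


def b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    # Restore the padding that the compact JWT form strips.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign(payload: dict, secret: bytes) -> str:
    """Produce a compact HS256 token for the given payload."""
    header = b64url_encode(b'{"alg":"HS256","typ":"JWT"}')
    body = b64url_encode(json.dumps(payload).encode("utf-8"))
    mac = hmac.new(secret, f"{header}.{body}".encode("ascii"), hashlib.sha256).digest()
    return f"{header}.{body}.{b64url_encode(mac)}"


def verify(token: str, secret: bytes) -> dict:
    """Return the payload, or raise ValueError if the signature does not match."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header, body, signature = parts
    expected = hmac.new(secret, f"{header}.{body}".encode("ascii"), hashlib.sha256).digest()
    # compare_digest avoids leaking the mismatch position through timing.
    if not hmac.compare_digest(expected, b64url_decode(signature)):
        raise ValueError("signature mismatch")
    return json.loads(b64url_decode(body))


secret = b"example-secret"  # placeholder; a real key is supplied at deployment
token = sign({"sql": "SELECT * FROM items LIMIT ?", "params": [10]}, secret)

# A client swaps the payload but cannot recompute the signature without the key.
header, _, signature = token.split(".")
forged_body = b64url_encode(json.dumps({"sql": "DROP TABLE items", "params": []}).encode("utf-8"))
forged = f"{header}.{forged_body}.{signature}"
```

Calling `verify(token, secret)` returns the original payload, while `verify(forged, secret)` raises `ValueError`.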

The JWT signing key can be rotated at any time to limit the impact of credential theft. Rotation invalidates outstanding tokens and so interrupts any clients actively paging across results, at which point affected clients can reissue their queries.

#### SQL Visibility

This approach exposes the API's SQL query content to clients; however, no privileged information can be exposed in this way. The content of the SQL query consists entirely of:
1. information that can be gleaned from a review of this repository, and
2. parameters provided by the client.

The API uses placeholders to represent the location of the Parquet index files it queries (e.g. an S3 URI) and replaces these immediately prior to SQL execution, so clients gain no additional visibility of a deployment's storage infrastructure via a paging token.
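The placeholder substitution might look something like the sketch below. The placeholder name `{index_uri}`, the SQL text, and the bucket URI are all invented for illustration and are not taken from the project.

```python
INDEX_PLACEHOLDER = "{index_uri}"  # hypothetical placeholder name


def resolve_placeholders(sql: str, index_uri: str) -> str:
    """Swap the storage placeholder for the real location just before execution."""
    return sql.replace(INDEX_PLACEHOLDER, index_uri)


# What a client could see inside a token: no bucket name or path.
token_sql = "SELECT * FROM '{index_uri}' WHERE collection = ?"

# What the API would execute; the URI here is made up for the example.
executed_sql = resolve_placeholders(token_sql, "s3://example-bucket/index/items.parquet")
```

Only `token_sql` would ever be serialised into a token in this sketch; `executed_sql` exists solely inside the API at execution time.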

The image below shows the content of a sample paging token returned in response to the following search query:

```sh
curl -X 'POST' \
'https://host/search' \
-H 'Content-Type: application/json' \
-d '{"collections": ["joplin"]}'
```

![sample paging token jwt](./images/sample-paging-token-jwt.png "Sample Paging Token JWT")