REST API for:
- Creating and managing event streams
- Sending data to event streams
Tests are run using tox: make test
For testing and linting we use pytest, flake8 and black.
Set up the local environment:
make init
Log in to AWS:
make login-dev
Set necessary environment variables:
export AWS_PROFILE=okdata-dev
export AWS_REGION=eu-west-1
export OKDATA_ENVIRONMENT=dev
Start the Flask app locally (it binds to port 5000 by default):
make run
Note: make init will not install the boto3 library, since this dependency is already installed on the server. Therefore you must either run make test (which installs boto3) or run .build_venv/bin/python -m pip install boto3 before make run.
Change port/environment:
export FLASK_ENV=development
export FLASK_RUN_PORT=8080
Deployment to both dev and prod is automatic via GitHub Actions on push to main. Alternatively, you can deploy from your local machine (requires saml2aws) with make deploy or make deploy-prod.
Create a new event stream: curl -H "Authorization: bearer $TOKEN" -H "Content-Type: application/json" --data '{}' -XPOST http://127.0.0.1:8080/{dataset-id}/{version}
Enable an event sink: curl -H "Authorization: bearer $TOKEN" -H "Content-Type: application/json" --data '{"type":"s3"}' -XPOST http://127.0.0.1:8080/{dataset-id}/{version}/sinks
Get all sinks: curl -H "Authorization: bearer $TOKEN" -XGET http://127.0.0.1:8080/{dataset-id}/{version}/sinks
Get a single sink: curl -H "Authorization: bearer $TOKEN" -XGET http://127.0.0.1:8080/{dataset-id}/{version}/sinks/{sink_type}
Disable an event sink: curl -H "Authorization: bearer $TOKEN" -H "Content-Type: application/json" -XDELETE http://127.0.0.1:8080/{dataset-id}/{version}/sinks/{sink_type}
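The same workflow can be scripted. Below is a minimal sketch using Python's requests library (an assumption, not a project dependency); BASE_URL, TOKEN, the dataset id and the version are placeholders you must supply yourself:

```python
import os

import requests  # assumed helper library, not a project dependency

BASE_URL = "http://127.0.0.1:8080"  # local Flask app from the steps above
TOKEN = os.environ["TOKEN"]         # bearer token, obtained elsewhere
DATASET_ID = "my-dataset"           # placeholder dataset id
VERSION = "1"                       # placeholder version

headers = {
    "Authorization": f"bearer {TOKEN}",
    "Content-Type": "application/json",
}

# Create a new event stream
r = requests.post(f"{BASE_URL}/{DATASET_ID}/{VERSION}", json={}, headers=headers)
r.raise_for_status()

# Enable an S3 event sink
r = requests.post(
    f"{BASE_URL}/{DATASET_ID}/{VERSION}/sinks", json={"type": "s3"}, headers=headers
)
r.raise_for_status()

# List all sinks on the stream
r = requests.get(f"{BASE_URL}/{DATASET_ID}/{VERSION}/sinks", headers=headers)
print(r.json())
```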
This is the base resource of your event stream. In other words, the Stream resource can be regarded as the event stream itself, while the Subscribable and Sink resources can be regarded as features on the event stream. The Stream's CloudFormation stack contains the following resources (a small write sketch follows the list):
- RawDataStream: Kinesis data stream dp.{confidentiality}.{dataset_id}.raw.{version}.json.
- RawPipelineTrigger: Lambda event source mapping from RawDataStream to pipeline-router.
- ProcessedDataStream: Kinesis data stream dp.{confidentiality}.{dataset_id}.processed.{version}.json.
- ProcessedPipelineTrigger: Lambda event source mapping from ProcessedDataStream to pipeline-router.
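As an illustration of writing to the raw stream directly, here is a minimal boto3 sketch. The confidentiality, dataset id, version and partition key below are hypothetical placeholders; the actual stream name must follow the pattern shown in the list above:

```python
import json

import boto3

# Stream name follows the pattern dp.{confidentiality}.{dataset_id}.raw.{version}.json;
# the values used here are placeholders.
stream_name = "dp.green.my-dataset.raw.1.json"

kinesis = boto3.client("kinesis", region_name="eu-west-1")

# Put a single event onto the raw data stream; pipeline-router is then invoked
# via the RawPipelineTrigger event source mapping.
kinesis.put_record(
    StreamName=stream_name,
    Data=json.dumps({"hello": "world"}).encode("utf-8"),
    PartitionKey="example-partition-key",
)
```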
The Subscribable resource can be regarded as a feature on the event stream that can be either enabled or disabled. If enabled, you can connect to the event-data-subscription websocket API and listen to events on your event stream (see the sketch below the resource list). The subscribable's CloudFormation stack consists of the following AWS resources:
- SubscriptionSource: Lambda event source mapping from ProcessedDataStream to event-data-subscription.
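For illustration only, here is a minimal sketch of listening to a subscribable stream with the Python websockets library. The endpoint URL and message format are assumptions; consult the event-data-subscription documentation for the actual API:

```python
import asyncio

import websockets  # assumed helper library, not a project dependency

# Hypothetical websocket endpoint; the real URL is defined by event-data-subscription.
WS_URL = "wss://example-websocket-endpoint/?dataset_id=my-dataset&version=1"


async def listen():
    async with websockets.connect(WS_URL) as ws:
        async for message in ws:
            # Each message is assumed to be an event from the ProcessedDataStream.
            print(message)


asyncio.run(listen())
```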
The Sink resources can be regarded as destinations that your event stream writes to, and that entities with access can read from. So far there are two different event sinks.
The S3 sink's CloudFormation stack contains the following AWS resources:
- SinkS3Resource: Kinesis Firehose delivery stream with source=ProcessedDataStream and destination=S3.
- SinkS3ResourceIAM: IAM role for consuming data from ProcessedDataStream and writing objects to S3. The role is used by SinkS3Resource.
The Elasticsearch sink's CloudFormation stack contains the following AWS resources:
- SinkElasticsearchResource: Kinesis Firehose delivery stream with source=ProcessedDataStream, destination=Elasticsearch, and S3 backup for failed documents.
- SinkElasticsearchResourceIAM: IAM role for consuming data from ProcessedDataStream and posting documents to Elasticsearch. The role is used by SinkElasticsearchResource.
- SinkElasticsearchS3BackupResourceIAM: IAM role for writing objects to S3. The role is used by SinkElasticsearchResource.
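To check that a sink's Firehose delivery stream was actually created, you can list the delivery streams with boto3. This is only a minimal sketch: the delivery stream naming convention is not documented here, so the substring filter on the dataset id is an assumption:

```python
import boto3

firehose = boto3.client("firehose", region_name="eu-west-1")

# List Firehose delivery streams in the account and keep those that mention
# the dataset id; the naming convention is assumed, not documented here.
response = firehose.list_delivery_streams(Limit=100)
matching = [name for name in response["DeliveryStreamNames"] if "my-dataset" in name]
print(matching)
```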
When an Elasticsearch sink is enabled and data has been stored (not backward compatible), you can access the data for a given date range through: {url}/streams/{dataset-id}/{version}/events?from_date={from_date}&to_date={to_date}
- Example (prod): https://api.data.oslo.systems/streams/renovasjonsbiler-status/1/events?from_date=2020-10-18&to_date=2020-10-19
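A minimal sketch of fetching events for a date range with Python's requests library (an assumption, not a project dependency), assuming the same bearer-token auth as the other endpoints:

```python
import os

import requests  # assumed helper library, not a project dependency

TOKEN = os.environ["TOKEN"]  # bearer token, obtained elsewhere

# Prod example from above: dataset renovasjonsbiler-status, version 1.
url = "https://api.data.oslo.systems/streams/renovasjonsbiler-status/1/events"
params = {"from_date": "2020-10-18", "to_date": "2020-10-19"}

r = requests.get(url, params=params, headers={"Authorization": f"bearer {TOKEN}"})
r.raise_for_status()
print(r.json())
```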