WycliffeAssociates/d43-catalog

d43-catalog

These are the AWS Lambda functions for generating the API catalog endpoint from the Door43 Catalog organization in our Door43 Git Service.

Requirements

How it Works

When a new repository is added or forked into the Door43 Catalog organization, a chain reaction starts that eventually adds the content to the API, assuming all the checks pass. Here is an overview:

  1. Someone creates a new repository or forks a repository into the Door43 Catalog organization.
  2. The organization triggers the webhook function, which queues the latest Git commit for processing.

The next few functions run on a fixed schedule. If errors occur, they are reported and the process resumes at the next scheduled run.

If a function produces errors 4 times in a row, an email is sent to the administrators.

  3. The signing function looks for and signs new items in the queue.
  4. The catalog function takes everything in the queue and generates a new API catalog file. The content is now in the API!
  5. The ts_v2_catalog function converts the API catalog file into the legacy translationStudio API.
  6. The uw_v2_catalog function converts the API catalog file into the legacy unfoldingWord App Catalog.
  7. The fork function checks whether new repositories exist in the organization and executes the webhook function if necessary.

The content in step (1) is now available in all three API endpoints.

  • The acceptance function runs whenever the catalog function saves a new catalog file, and performs acceptance tests on the file to ensure it was generated correctly.
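The rule that administrators are emailed after 4 consecutive errors can be pictured as a per-function streak counter. This is a minimal in-memory sketch, not the project's code: `ErrorTracker`, `record_failure`, `record_success` and the `notify` callback are hypothetical names, the dictionary stands in for the d43-catalog-errors table, and the sketch assumes that a successful run resets the streak and that the email is sent once per streak.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# Consecutive failures before administrators are emailed (per the rule above).
ERROR_THRESHOLD = 4


@dataclass
class ErrorTracker:
    """Hypothetical in-memory stand-in for the d43-catalog-errors table."""

    notify: Callable[[str, int], None]  # e.g. a function that sends the email
    counts: Dict[str, int] = field(default_factory=dict)  # keyed by lambda name

    def record_failure(self, lambda_name: str) -> int:
        count = self.counts.get(lambda_name, 0) + 1
        self.counts[lambda_name] = count
        # Assumption: notify once, when the streak first reaches the threshold.
        if count == ERROR_THRESHOLD:
            self.notify(lambda_name, count)
        return count

    def record_success(self, lambda_name: str) -> None:
        # Assumption: a clean run breaks the streak.
        self.counts.pop(lambda_name, None)


sent = []
tracker = ErrorTracker(notify=lambda name, n: sent.append((name, n)))
for _ in range(3):
    tracker.record_failure("signing")
tracker.record_success("signing")  # streak resets, nothing sent
for _ in range(4):
    tracker.record_failure("signing")
print(sent)  # [('signing', 4)]
```

Whether the real pipeline emails once or on every failure past the fourth is not stated here; the `count == ERROR_THRESHOLD` test is the one line to change if it is the latter.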

Function Description

The following provides a functional description of the functions in this repository.

webhook

Runs when a change is made in the Door43 Catalog organization and does the following:

  • Accepts the webhook from the organization.
  • Reads the manifest from the repository (via HTTPS)
  • Performs some initial manifest validation. See Manifest Specification
  • Uploads files and adds or updates an entry in the queue
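The webhook steps boil down to "validate, then upsert by repository name". The sketch below is illustrative only: it assumes a Gogs-style push payload exposing `repository.name` and the latest commit id as `after`, treats the manifest as already fetched and parsed, and uses `dublin_core` and `projects` as placeholder required keys rather than the actual Manifest Specification. The dict stands in for the d43-catalog-in-progress table (keyed by repo_name), and file uploading is left out.

```python
from typing import Any, Dict

# Hypothetical minimum manifest keys; the real rules are in the Manifest Specification.
REQUIRED_MANIFEST_KEYS = ("dublin_core", "projects")


def validate_manifest(manifest: Dict[str, Any]) -> None:
    missing = [key for key in REQUIRED_MANIFEST_KEYS if key not in manifest]
    if missing:
        raise ValueError("manifest is missing: " + ", ".join(missing))


def handle_webhook(payload: Dict[str, Any], manifest: Dict[str, Any],
                   queue: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Validate the manifest, then add or update the queue entry for the repo."""
    repo_name = payload["repository"]["name"]
    commit_id = payload["after"]  # assumed: latest commit id in a push payload
    validate_manifest(manifest)
    entry = {"repo_name": repo_name, "commit_id": commit_id, "manifest": manifest}
    queue[repo_name] = entry  # keyed by repo_name, so a re-push replaces the entry
    return entry


queue: Dict[str, Dict[str, Any]] = {}
manifest = {"dublin_core": {"identifier": "ulb"}, "projects": []}
handle_webhook({"repository": {"name": "en_ulb"}, "after": "abc123"}, manifest, queue)
handle_webhook({"repository": {"name": "en_ulb"}, "after": "def456"}, manifest, queue)
print(queue["en_ulb"]["commit_id"])  # def456
```

Keying the queue by repository name means a second push simply overwrites the pending entry, so only the latest commit is ever processed.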

signing

This function is run on a schedule and does the following:

  • Identifies items in the queue that require signing.
  • Signs files as necessary
  • Verifies that the signature is valid
  • Copies files to the proper location on the CDN as necessary.
  • Uploads the signature file to the CDN
  • Updates the queued item with the appropriate URLs and file metadata as necessary.
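The "find unsigned items, sign, verify" loop can be sketched with the standard library. This README does not say which signature scheme the signing function uses, so HMAC-SHA256 appears below purely as a stand-in; a real asymmetric scheme would verify with a public key rather than by re-signing. The `file_url` and `signature` fields, the key, and the helper names are hypothetical, and the CDN copy and upload steps are omitted.

```python
import hashlib
import hmac
from typing import Dict, List

# Stand-in secret; a real deployment would use a private key from secure storage.
SECRET_KEY = b"hypothetical-signing-key"


def sign(data: bytes) -> str:
    # HMAC-SHA256 is a stdlib stand-in, not the pipeline's actual signature scheme.
    return hmac.new(SECRET_KEY, data, hashlib.sha256).hexdigest()


def verify(data: bytes, signature: str) -> bool:
    # compare_digest compares in constant time.
    return hmac.compare_digest(sign(data), signature)


def sign_pending(queue: Dict[str, dict], files: Dict[str, bytes]) -> List[str]:
    """Sign every queued item that lacks a signature; return the repos signed."""
    signed = []
    for repo_name, item in queue.items():
        if item.get("signature"):
            continue  # already signed on an earlier run
        data = files[item["file_url"]]
        signature = sign(data)
        if not verify(data, signature):
            raise RuntimeError(f"signature check failed for {repo_name}")
        item["signature"] = signature
        signed.append(repo_name)
    return signed


queue = {
    "en_ulb": {"file_url": "en_ulb.zip"},
    "en_tw": {"file_url": "en_tw.zip", "signature": "previously-signed"},
}
files = {"en_ulb.zip": b"zip bytes"}
print(sign_pending(queue, files))  # ['en_ulb']
```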

catalog

This function is run on a schedule and does the following:

  • Performs a consistency check on queued items
  • Generates the new catalog file
  • Uploads the catalog file to the API.
  • Records the catalog status in the status table.
  • Reports any errors or consistency failures.
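A consistency check followed by catalog generation might look like the sketch below. It is a simplification under stated assumptions: `REQUIRED_FIELDS` and the flat `{"resources": [...]}` layout are placeholders, not the real v3 catalog schema, and uploading to the API bucket and writing the status table are left out. The design choice it illustrates is reporting and skipping an inconsistent item rather than aborting the whole run.

```python
import json
from typing import Any, Dict, List, Tuple

# Hypothetical fields an item needs before it may enter the catalog.
REQUIRED_FIELDS = ("repo_name", "commit_id", "signature")


def build_catalog(queue: Dict[str, Dict[str, Any]]) -> Tuple[str, List[str]]:
    """Return (catalog JSON, errors). Inconsistent items are reported and skipped."""
    resources: List[Dict[str, Any]] = []
    errors: List[str] = []
    for repo_name in sorted(queue):  # sorted for a stable, diff-friendly file
        item = queue[repo_name]
        missing = [name for name in REQUIRED_FIELDS if not item.get(name)]
        if missing:
            errors.append(f"{repo_name}: missing {', '.join(missing)}")
            continue
        resources.append({"identifier": repo_name, "commit": item["commit_id"]})
    return json.dumps({"resources": resources}, sort_keys=True), errors


queue = {
    "en_ulb": {"repo_name": "en_ulb", "commit_id": "abc", "signature": "s1"},
    "en_tw": {"repo_name": "en_tw", "commit_id": "def"},
}
catalog_json, errors = build_catalog(queue)
print(errors)  # ['en_tw: missing signature']
```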

acceptance

After a new catalog file is written to S3, this function does the following:

  • Makes sure the structure of the catalog file is correct
  • Makes a HEAD request for each resource (every URL) in the catalog to verify it exists
  • Reports any errors

Technically, this duplicates testing that is already done elsewhere in the pipeline. This function is the "oops" catcher.
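The acceptance checks can be sketched with the standard library's `urllib.request`, which accepts `method="HEAD"`. Two assumptions are labelled in the code: every resource URL sits under a key named `url`, and a well-formed catalog has a top-level `languages` list. The `head` parameter is a hypothetical seam so the check can run offline, as the usage below does.

```python
import urllib.error
import urllib.request
from typing import Any, Callable, Iterator, List


def iter_urls(node: Any) -> Iterator[str]:
    """Yield every string stored under a "url" key, at any depth (assumed layout)."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "url" and isinstance(value, str):
                yield value
            else:
                yield from iter_urls(value)
    elif isinstance(node, list):
        for child in node:
            yield from iter_urls(child)


def head_status(url: str) -> int:
    """Send a HEAD request and return the HTTP status (0 on network failure)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:  # 4xx/5xx responses
        return err.code
    except OSError:  # DNS failure, refused connection, timeout
        return 0


def check_catalog(catalog: dict,
                  head: Callable[[str], int] = head_status) -> List[str]:
    errors = []
    # Hypothetical structural rule: the top level must contain a "languages" list.
    if not isinstance(catalog.get("languages"), list):
        errors.append('catalog is missing a "languages" list')
    for url in iter_urls(catalog):
        status = head(url)
        if status != 200:
            errors.append(f"{url} returned HTTP {status}")
    return errors


sample = {"languages": [{"resources": [
    {"url": "https://cdn.example.org/ok.zip"},
    {"formats": [{"url": "https://cdn.example.org/missing.zip"}]},
]}]}
fake_head = lambda url: 404 if url.endswith("missing.zip") else 200
print(check_catalog(sample, head=fake_head))
# ['https://cdn.example.org/missing.zip returned HTTP 404']
```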

fork

This function is run on a schedule and does the following:

  • Checks if there are new repositories in the Door43 Catalog organization
  • Triggers the webhook function for each new repository found.
  • Triggers the webhook function for queued items that are flagged as dirty.
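The fork function's selection logic is essentially a set difference plus a flag check. In this sketch the `dirty` field name, `repos_to_process`, `run_fork` and the `trigger_webhook` callback are hypothetical; listing the organization's repositories (presumably through the Git service configured by the gogs_url, gogs_org and gogs_token stage variables) is not shown.

```python
from typing import Callable, Dict, Iterable, List


def repos_to_process(org_repos: Iterable[str], queue: Dict[str, dict]) -> List[str]:
    """Repos needing the webhook: not yet queued, or queued but flagged dirty."""
    new = [name for name in org_repos if name not in queue]
    dirty = [name for name, item in queue.items() if item.get("dirty")]
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return list(dict.fromkeys(new + dirty))


def run_fork(org_repos: Iterable[str], queue: Dict[str, dict],
             trigger_webhook: Callable[[str], None]) -> List[str]:
    pending = repos_to_process(org_repos, queue)
    for repo_name in pending:
        trigger_webhook(repo_name)
    return pending


triggered = []
queue = {"en_ulb": {}, "en_tw": {"dirty": True}}
run_fork(["en_ulb", "en_tw", "fr_ulb"], queue, triggered.append)
print(triggered)  # ['fr_ulb', 'en_tw']
```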

ts_v2_catalog

This function is run on a schedule and does the following:

  • Checks for a new v3 API catalog in the status table
  • Builds a v2 tS API from the new/updated v3 catalog.

uw_v2_catalog

This function is run on a schedule and does the following:

  • Checks for a new v3 API catalog in the status table
  • Builds a v2 uW API from the new/updated v3 catalog.
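Both v2 converters start with the same question: has a new v3 catalog landed in the status table since the last conversion? A minimal sketch, assuming the status row (keyed by api_version) carries hypothetical `state` and `updated_at` attributes and that each converter remembers the `updated_at` value it last converted:

```python
from typing import Dict, Optional


def needs_rebuild(status_row: Optional[Dict[str, str]],
                  last_built: Optional[str]) -> bool:
    """True when the status table holds a finished v3 catalog not yet converted."""
    if not status_row or status_row.get("state") != "complete":
        return False  # no finished v3 catalog to convert yet
    return status_row.get("updated_at") != last_built


row = {"api_version": "3", "state": "complete", "updated_at": "2017-06-01T12:00:00Z"}
print(needs_rebuild(row, None))                    # True: never built
print(needs_rebuild(row, "2017-06-01T12:00:00Z"))  # False: already converted
```

Because ts_v2_catalog and uw_v2_catalog would each track their own `last_built`, one converter failing does not cause the other to skip or repeat work.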

trigger

This function is run via AWS cron every 5 minutes and does the following:

  • Executes the functions that run on a schedule, e.g. catalog, signing, etc.
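The trigger's job can be sketched as "start each scheduled function and keep going if one fails to start". The `invoke` callback is a hypothetical seam: in AWS it might wrap boto3's Lambda client, for example `client.invoke(FunctionName=name, InvocationType="Event")` for a fire-and-forget call, but that wiring and the deployed function names are assumptions.

```python
from typing import Callable, Dict, Iterable

# The functions described above as running on a schedule (order is arbitrary).
SCHEDULED = ("signing", "catalog", "ts_v2_catalog", "uw_v2_catalog", "fork")


def trigger(invoke: Callable[[str], None],
            functions: Iterable[str] = SCHEDULED) -> Dict[str, str]:
    """Start each scheduled function; one failure does not block the rest."""
    results: Dict[str, str] = {}
    for name in functions:
        try:
            invoke(name)
            results[name] = "started"
        except Exception as err:  # record and keep going
            results[name] = f"failed: {err}"
    return results


def fake_invoke(name: str) -> None:
    if name == "fork":
        raise ConnectionError("timed out")


print(trigger(fake_invoke)["fork"])  # failed: timed out
```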

AWS Configuration

Here's a high-level overview of the AWS configuration. For the Swagger definitions, look in the aws_configuration folder. You can create an API in API Gateway by importing these Swagger definitions.

The following functions are configured as API endpoints within API Gateway:

  • webhook: /webhook
  • catalog: /lambda/catalog
  • fork: /lambda/fork
  • signing: /lambda/signing
  • ts_v2_catalog: /lambda/ts-v2-catalog
  • uw_v2_catalog: /lambda/uw-v2-catalog

For example, you can trigger the fork lambda at https://api.door43.org/v3/lambda/fork.

The functions do not always return useful information in the browser and the request may time out; however, the functions are still running properly.

The name of the stage in API Gateway determines the operating environment. If the stage name begins with prod, the functions operate on the production databases. If the stage name begins with anything other than prod, the functions prefix the database table names with the stage name.

For example:

  • a stage named prod would use the d43-catalog-errors db for reporting errors.
  • a stage named dev would use the dev-d43-catalog-errors db for reporting errors.
  • a stage named test would use the test-d43-catalog-errors db for reporting errors.
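The naming rule is small enough to state as a function. The helper name `table_name` is hypothetical; the prefixing behaviour and the expected names come straight from the examples above.

```python
def table_name(stage: str, base: str) -> str:
    """Return the DynamoDB table a given API Gateway stage should use."""
    if stage.startswith("prod"):
        return base  # production stages use the unprefixed tables
    return f"{stage}-{base}"


for stage in ("prod", "dev", "test"):
    print(table_name(stage, "d43-catalog-errors"))
# d43-catalog-errors
# dev-d43-catalog-errors
# test-d43-catalog-errors
```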

Stage Variables

Stage variables are configured within the stage defined in API Gateway. These variables are accessible within lambdas from the event parameter, e.g. event['stage-variables'].

  • cdn_bucket
  • cdn_url
  • to_email
  • from_email
  • api_bucket
  • api_url
  • gogs_url
  • gogs_org
  • gogs_token
  • log_level: how noisy the logger should be (debug|info|warning|error)
  • version: the API version
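Reading the stage variables from `event['stage-variables']` and applying log_level could look like this sketch. The logger name `d43-catalog`, the fallback to `info` for missing or unknown values, and the sample values in `event` are assumptions for illustration.

```python
import logging
from typing import Any, Dict

LOG_LEVELS = {
    "debug": logging.DEBUG,
    "info": logging.INFO,
    "warning": logging.WARNING,
    "error": logging.ERROR,
}


def configure_logger(event: Dict[str, Any]) -> logging.Logger:
    """Set a logger's level from the log_level stage variable (default: info)."""
    variables = event.get("stage-variables") or {}
    level = LOG_LEVELS.get(str(variables.get("log_level", "info")).lower(), logging.INFO)
    logger = logging.getLogger("d43-catalog")  # hypothetical logger name
    logger.setLevel(level)
    return logger


event = {"stage-variables": {"log_level": "debug", "version": "3"}}
print(configure_logger(event).level == logging.DEBUG)  # True
```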

acceptance function configuration

The acceptance function is run by a CloudWatch rule that fires when the catalog file is added to the API S3 bucket.

trigger function configuration

The trigger function is run by a CloudWatch rule configured to fire every 5 minutes on a cron schedule.

DynamoDB Configuration

The following database tables are used by the API pipeline described above. Note that additional tables may be necessary when supporting multiple stages (described above).

  • d43-catalog-errors tracks errors encountered in functions. Keyed with lambda.
  • d43-catalog-in-progress tracks items in the queue. Keyed with repo_name.
  • d43-catalog-running tracks functions that are running. This prevents certain functions from having multiple instances running at the same time. Keyed with lambda.
  • d43-catalog-status tracks the status of the catalog generation. Keyed with api_version.
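The d43-catalog-running table acts as a per-function lock. Below is an in-memory sketch of that idea; `RunningLock` and its methods are hypothetical, and treating a row older than 900 seconds (AWS Lambda's 15-minute maximum run time) as stale is an assumption. Against DynamoDB itself, the check and the write would need to be one atomic conditional put (for example with an `attribute_not_exists` condition), otherwise two instances could both acquire the lock.

```python
import time
from typing import Dict, Optional


class RunningLock:
    """In-memory stand-in for the d43-catalog-running table (keyed by lambda)."""

    def __init__(self, stale_after: float = 900.0) -> None:
        # 900 s matches AWS Lambda's 15-minute maximum run time; an older row
        # is assumed to belong to an instance that died without cleaning up.
        self.stale_after = stale_after
        self.rows: Dict[str, float] = {}

    def acquire(self, lambda_name: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        started = self.rows.get(lambda_name)
        if started is not None and now - started < self.stale_after:
            return False  # another instance is still running
        self.rows[lambda_name] = now
        return True

    def release(self, lambda_name: str) -> None:
        self.rows.pop(lambda_name, None)


lock = RunningLock()
print(lock.acquire("catalog", now=0))    # True
print(lock.acquire("catalog", now=60))   # False: first run still in progress
lock.release("catalog")
print(lock.acquire("catalog", now=120))  # True
```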

Tools

CSV to USFM3

This tool converts a CSV file containing Greek words to USFM 3 format. You may execute the following command to learn how to use the tool.

python execute.py csvtousfm3 -h
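To give a feel for the conversion, here is a sketch that turns rows into USFM 3 word markup (`\w surface|lemma="..." strong="..."\w*`). The column names `chapter,verse,text,lemma,strong`, the `csv_to_usfm3` name and the sample row are hypothetical; run the command above to see the CSV layout the real tool expects.

```python
import csv
import io
from typing import List, Optional


def csv_to_usfm3(csv_text: str, book_id: str) -> str:
    """Convert hypothetical chapter,verse,text,lemma,strong rows into USFM 3."""
    lines: List[str] = [f"\\id {book_id}"]
    chapter: Optional[str] = None
    verse: Optional[str] = None
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["chapter"] != chapter:  # new chapter: emit \c and a paragraph
            chapter, verse = row["chapter"], None
            lines += [f"\\c {chapter}", "\\p"]
        if row["verse"] != verse:  # new verse marker
            verse = row["verse"]
            lines.append(f"\\v {verse}")
        # USFM 3 word-level attributes: \w surface|key="value" ...\w*
        lines.append(f'\\w {row["text"]}|lemma="{row["lemma"]}" strong="{row["strong"]}"\\w*')
    return "\n".join(lines)


sample = "chapter,verse,text,lemma,strong\n1,1,Ἐν,ἐν,G1722\n1,1,ἀρχῇ,ἀρχή,G746\n"
print(csv_to_usfm3(sample, "JHN"))
```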

Map tW to USFM3

This tool injects tW links into the USFM generated by csvtousfm3. It is designed to replace the functionality of the config.yaml found within a tW RC with the newly generated USFM3 content. As such, this is mostly a one-time-use tool.

If you are not sure what to use this tool for, you probably shouldn't use it.

You may execute the following command to learn how to use the tool.

python execute.py maptwtousfm3 -h
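Injecting a tW link amounts to adding an attribute to matching `\w` words. This sketch assumes an `x-tw` attribute holding an `rc://` link, modelled on unfoldingWord's published USFM3, and keys the mapping by Strong's number; both are assumptions, and the real tool derives its mapping from the tW config.yaml instead. `add_tw_links` is a hypothetical name.

```python
import re
from typing import Dict

# A USFM 3 word: \w surface|attributes\w*
WORD = re.compile(r'\\w ([^|\\]+)\|([^\\]*)\\w\*')
STRONG = re.compile(r'strong="([^"]+)"')


def add_tw_links(usfm: str, links: Dict[str, str]) -> str:
    """Add an x-tw attribute to each word whose Strong's number has a tW link."""

    def inject(match: re.Match) -> str:
        surface, attrs = match.group(1), match.group(2)
        strong = STRONG.search(attrs)
        # Skip words that already carry a link so the tool can be re-run safely.
        if strong and strong.group(1) in links and "x-tw=" not in attrs:
            attrs += f' x-tw="{links[strong.group(1)]}"'
        return f"\\w {surface}|{attrs}\\w*"

    return WORD.sub(inject, usfm)


usfm = '\\w Θεοῦ|lemma="θεός" strong="G2316"\\w* \\w καὶ|lemma="καί" strong="G2532"\\w*'
links = {"G2316": "rc://*/tw/dict/bible/kt/god"}
print(add_tw_links(usfm, links))
```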

Convert OSIS to USFM3

This tool will convert a directory of OSIS files (xml) to a new directory of USFM3 files.

You may execute the following command to learn how to use the tool.

python execute.py osistousfm3 -h
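For a single file, the core of an OSIS to USFM3 conversion can be sketched with `xml.etree.ElementTree`. This handles only container-style `<chapter>`/`<verse>` elements (not the milestone `sID`/`eID` form many OSIS files use), flattens word-level markup to plain text, and takes the USFM book code as a parameter; `osis_to_usfm3` is a hypothetical name, and walking a whole directory is left out.

```python
import xml.etree.ElementTree as ET
from typing import List

NS = {"osis": "http://www.bibletechnologies.net/2003/OSIS/namespace"}


def osis_to_usfm3(xml_text: str, book_id: str) -> str:
    """Convert container-style OSIS chapters and verses to plain USFM 3."""
    root = ET.fromstring(xml_text)
    lines: List[str] = [f"\\id {book_id}"]
    for chapter in root.findall(".//osis:chapter", NS):
        # osisID looks like "John.1"; the last segment is the chapter number.
        lines += [f"\\c {chapter.get('osisID').split('.')[-1]}", "\\p"]
        for verse in chapter.findall(".//osis:verse", NS):
            number = verse.get("osisID").split(".")[-1]
            # Flatten nested markup (e.g. <w> elements) and normalise whitespace.
            text = " ".join("".join(verse.itertext()).split())
            lines.append(f"\\v {number} {text}")
    return "\n".join(lines)


sample = """<osis xmlns="http://www.bibletechnologies.net/2003/OSIS/namespace">
<osisText><div type="book" osisID="John">
<chapter osisID="John.1">
<verse osisID="John.1.1"><w lemma="strong:G1722">Ἐν</w> <w lemma="strong:G746">ἀρχῇ</w></verse>
<verse osisID="John.1.2">οὗτος ἦν</verse>
</chapter></div></osisText></osis>"""
print(osis_to_usfm3(sample, "JHN"))
```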

Testing

You can run the tests by executing the following:

python -m unittest discover -s tests

DB Info

A DynamoDB instance must exist with the following tables:

  • d43-catalog-in-progress with key repo_name (string)
  • d43-catalog-errors with key lambda (string)
  • d43-catalog-running with key lambda (string)
  • d43-catalog-status with key api_version (string)

When you are using API Gateway with a stage other than prod, the table names are prefixed with the stage name.
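The table list above can be turned into table-creation requests. Each dict below is shaped as keyword arguments for boto3's DynamoDB `create_table` (e.g. `client.create_table(**request)`), but the sketch stops short of calling AWS so it runs without credentials. On-demand billing (`PAY_PER_REQUEST`) is an assumption, and `create_table_requests` is a hypothetical helper.

```python
from typing import Any, Dict, List

# Base table names and their string hash keys, as listed above.
TABLES = {
    "d43-catalog-in-progress": "repo_name",
    "d43-catalog-errors": "lambda",
    "d43-catalog-running": "lambda",
    "d43-catalog-status": "api_version",
}


def create_table_requests(stage: str) -> List[Dict[str, Any]]:
    """Build create_table keyword arguments for every table a stage needs."""
    prefix = "" if stage.startswith("prod") else f"{stage}-"
    return [
        {
            "TableName": prefix + base,
            "KeySchema": [{"AttributeName": key, "KeyType": "HASH"}],
            "AttributeDefinitions": [{"AttributeName": key, "AttributeType": "S"}],
            "BillingMode": "PAY_PER_REQUEST",  # assumption: on-demand capacity
        }
        for base, key in TABLES.items()
    ]


for request in create_table_requests("dev"):
    print(request["TableName"])
# dev-d43-catalog-in-progress
# dev-d43-catalog-errors
# dev-d43-catalog-running
# dev-d43-catalog-status
```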
