This repository uses `pytest`:
# create a venv
python3.7 -m venv venv/
# install requirements
venv/bin/pip install -r requirements.txt
# run pytest with all linters and 4 workers in parallel
venv/bin/pytest --black --docstyle --flake8 --mypy-ignore-missing-imports -n 4
To provide authentication credentials for the Google Cloud API, the `GOOGLE_APPLICATION_CREDENTIALS`
environment variable must be set to the file path of the JSON file that contains the service account key.
See Mozilla BigQuery API Access instructions to request credentials if you don't already have them.
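Before running the tests it can help to confirm that the variable is set and points at a real file. The following is a minimal sketch of such a check, not part of this repository; the function name `resolve_credentials_path` is hypothetical.

```python
import os
from pathlib import Path


def resolve_credentials_path(environ=os.environ):
    """Return the service account key path, or raise if it is not usable.

    Hypothetical helper: it only checks that the variable is set and that
    the file exists, not that the key itself is valid.
    """
    value = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not value:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    path = Path(value)
    if not path.is_file():
        raise RuntimeError(f"credentials file not found: {path}")
    return path
```

Calling `resolve_credentials_path()` with no arguments reads the current process environment; passing a plain dict makes the check easy to exercise in isolation.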
- Make a directory for test resources named `tests/{query_name}/{test_name}/`, e.g. `tests/clients_last_seen_v1/test_single_day`
  - `query_name` must match a query file named `sql/{query_name}.sql`, e.g. `sql/clients_last_seen_v1.sql`
  - `test_name` should start with `test_`, e.g. `test_single_day`
- Add `.ndjson` files for input tables, e.g. `clients_daily_v6.ndjson`
  - Include the dataset prefix if it's set in the tested query, e.g. `analysis.clients_last_seen_v1.ndjson`
    - This will result in the dataset prefix being removed from the query, e.g. `query.replace("analysis.clients_last_seen_v1", "clients_last_seen_v1")`
- Add `expect.ndjson` to validate the result
  - `DATE` and `DATETIME` type columns in the result are coerced to strings using `.isoformat()`
  - Columns named `generated_time` are removed from the result before comparing to `expect` because they should not be static
- Optionally add `.schema.json` files for input table schemas, e.g. `clients_daily_v6.schema.json`
- Optionally add `query_params.yaml` to define query parameters
  - `query_params` must be a list
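The naming rules, the dataset prefix handling, and the result normalization described in the steps above can be sketched in Python. This is an illustration under stated assumptions, not the repository's actual test harness: the helper names `resource_dir`, `strip_dataset_prefix`, and `normalize_row` are hypothetical, and `normalize_row` only looks at top level columns.

```python
import datetime
from pathlib import Path


def resource_dir(query_name, test_name, sql_dir="sql", tests_dir="tests"):
    """Build tests/{query_name}/{test_name} after checking the naming rules."""
    if not test_name.startswith("test_"):
        raise ValueError("test_name should start with 'test_'")
    if not (Path(sql_dir) / f"{query_name}.sql").is_file():
        raise ValueError(f"no query file named {sql_dir}/{query_name}.sql")
    return Path(tests_dir) / query_name / test_name


def strip_dataset_prefix(query, table_file):
    """Remove the dataset prefix implied by an input file name from the query.

    For analysis.clients_last_seen_v1.ndjson this replaces
    "analysis.clients_last_seen_v1" with "clients_last_seen_v1".
    """
    full_name = Path(table_file).stem  # drops only the final ".ndjson"
    if "." in full_name:
        table = full_name.rsplit(".", 1)[1]
        query = query.replace(full_name, table)
    return query


def normalize_row(row):
    """Prepare one result row for comparison with expect.ndjson."""
    normalized = {}
    for key, value in row.items():
        if key == "generated_time":
            continue  # not static, so it is dropped before comparing
        if isinstance(value, (datetime.date, datetime.datetime)):
            value = value.isoformat()  # DATE and DATETIME become strings
        normalized[key] = value
    return normalized
```

For example, `strip_dataset_prefix("SELECT * FROM analysis.clients_last_seen_v1", "analysis.clients_last_seen_v1.ndjson")` yields a query that reads from `clients_last_seen_v1`, while a file name without a dataset prefix such as `clients_daily_v6.ndjson` leaves the query unchanged.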
- If the destination table is also an input table then `generated_time` should be a required `DATETIME` field to ensure minimal validation
- Input table files
  - All of the formats supported by `bq load` are supported
  - Formats other than `.ndjson` and `.csv` should not be used because they are not human readable
- `expect.ndjson`
  - File extensions `yaml`, `json` and `ndjson` are supported
  - Formats other than `ndjson` should not be used because they are not supported by `bq load`
- Schema files
  - Setting the description of a top level field to `time_partitioning_field` will cause the table to use it for time partitioning
  - File extensions `yaml`, `json` and `ndjson` are supported
  - Formats other than `.json` should not be used because they are not supported by `bq load`
- Query parameters
  - Scalar query params should be defined as a dict with keys `name`, `type` or `type_`, and `value`
  - `query_parameters.yaml` may be used instead of `query_params.yaml`, but they are mutually exclusive
  - File extensions `yaml`, `json` and `ndjson` are supported
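The schema and query parameter guidelines above can be illustrated with a short Python sketch. It assumes a schema file holds a list of BigQuery field definitions with `name`, `type`, `mode`, and optional `description` keys; the helper names and the sample values are hypothetical and are not taken from this repository's harness.

```python
QUERY_PARAM_FILES = ("query_params.yaml", "query_parameters.yaml")


def pick_query_params_file(file_names):
    """Return the query parameter file present in a test directory, if any."""
    present = [name for name in QUERY_PARAM_FILES if name in file_names]
    if len(present) > 1:
        raise ValueError("query_params.yaml and query_parameters.yaml are mutually exclusive")
    return present[0] if present else None


def is_scalar_param(param):
    """Check that a scalar param has name, value, and type or type_."""
    if not isinstance(param, dict):
        return False
    has_type = "type" in param or "type_" in param
    return "name" in param and "value" in param and has_type


def time_partitioning_field(schema):
    """Return the top level field whose description marks it for partitioning."""
    for field in schema:
        if field.get("description") == "time_partitioning_field":
            return field["name"]
    return None


# Hypothetical schema for a destination table that is also an input table:
# generated_time is a required DATETIME, and submission_date is marked as
# the time partitioning field through its description.
example_schema = [
    {"name": "submission_date", "type": "DATE", "mode": "REQUIRED",
     "description": "time_partitioning_field"},
    {"name": "generated_time", "type": "DATETIME", "mode": "REQUIRED"},
]

# Hypothetical scalar query parameter, as one entry of the query_params list.
example_param = {"name": "submission_date", "type": "DATE", "value": "2019-01-01"}
```

Here `time_partitioning_field(example_schema)` picks out `submission_date`, and `is_scalar_param(example_param)` accepts the entry because it carries all three required keys.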