ingest: Always provide default config values
Part of work to update this repo to match the pathogen-repo-template.

Always provide default config values that can then easily be overridden
with --configfiles/--config options. This makes it simpler to change a
subset of config values or extend the default configs.

Renames the config/ directory to defaults/ to reflect this change.
Matches the changes made to pathogen-repo-template in nextstrain/pathogen-repo-guide#27.
joverlee521 committed Jan 30, 2024
1 parent 87263e7 commit cf4bdd6
Showing 9 changed files with 11 additions and 11 deletions.
8 changes: 4 additions & 4 deletions ingest/README.md
@@ -31,7 +31,7 @@ This will produce two files (within the `ingest` directory):
Run the complete ingest pipeline and upload results to AWS S3 with

```sh
-nextstrain build . --configfiles config/config.yaml config/optional.yaml
+nextstrain build . --configfiles defaults/optional.yaml
```

### Adding new sequences not from GenBank
@@ -57,12 +57,12 @@ Do the following to include sequences from static FASTA files.
!ingest/data/{file-name}.ndjson
```

-3. Add the `file-name` (without the `.ndjson` extension) as a source to `ingest/config/config.yaml`. This will tell the ingest pipeline to concatenate the records to the GenBank sequences and run them through the same transform pipeline.
+3. Add the `file-name` (without the `.ndjson` extension) as a source to `defaults/config.yaml`. This will tell the ingest pipeline to concatenate the records to the GenBank sequences and run them through the same transform pipeline.

## Configuration

-Configuration takes place in `config/config.yaml` by default.
-Optional configs for uploading files and Slack notifications are in `config/optional.yaml`.
+Configuration takes place in `defaults/config.yaml` by default.
+Optional configs for uploading files and Slack notifications are in `defaults/optional.yaml`.

### Environment Variables

4 changes: 2 additions & 2 deletions ingest/Snakefile
@@ -4,9 +4,9 @@ min_version(
"7.7.0"
) # Snakemake 7.7.0 introduced `retries` directive used in fetch-sequences

-if not config:

-    configfile: "config/config.yaml"
+# Use default configuration values. Override with Snakemake's --configfile/--config options.
+configfile: "defaults/config.yaml"


send_slack_notifications = config.get("send_slack_notifications", False)
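
The layering this hunk relies on can be sketched in plain Python. This is a minimal illustration, not Snakemake's actual implementation: it assumes that values supplied through `--configfile`/`--config` are merged recursively on top of the defaults, so overriding one nested key leaves its siblings intact. The helper `merge_config` and the sample values are hypothetical.

```python
from copy import deepcopy


def merge_config(defaults: dict, overrides: dict) -> dict:
    """Return a new dict with `overrides` layered on top of `defaults`.

    Nested dicts are merged key by key; any other value is replaced.
    (Hypothetical helper, used only to illustrate the layering.)
    """
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical subset of the values in defaults/config.yaml
defaults = {
    "sources": ["genbank"],
    "ncbi_taxon_id": "10244",
    "transform": {
        "annotations": "defaults/annotations.tsv",
        "annotations_id": "accession",
    },
}

# Hypothetical override, e.g. supplied through --configfile or --config
overrides = {"transform": {"annotations": "my-annotations.tsv"}}

config = merge_config(defaults, overrides)

# Keys absent from both layers fall back to the default passed to .get()
send_slack_notifications = config.get("send_slack_notifications", False)
```

Under these assumptions, only `transform.annotations` changes; `transform.annotations_id`, `sources`, and `ncbi_taxon_id` keep their default values, which is the behaviour the commit message describes as changing "a subset of config values".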
File renamed without changes.
8 changes: 4 additions & 4 deletions ingest/config/config.yaml → ingest/defaults/config.yaml
@@ -3,7 +3,7 @@ sources: ['genbank']
# Pathogen NCBI Taxonomy ID
ncbi_taxon_id: '10244'
# Renames the NCBI dataset headers
-ncbi_field_map: 'config/ncbi-dataset-field-map.tsv'
+ncbi_field_map: 'defaults/ncbi-dataset-field-map.tsv'

# Params for the transform rule
transform:
@@ -43,9 +43,9 @@ transform:
geolocation_rules_url: 'https://raw.githubusercontent.com/nextstrain/ncov-ingest/master/source-data/gisaid_geoLocationRules.tsv'
# Local geolocation rules that are only applicable to mpox data
# Local rules can overwrite the general geolocation rules provided above
-local_geolocation_rules: 'config/geolocation-rules.tsv'
+local_geolocation_rules: 'defaults/geolocation-rules.tsv'
# User annotations file
-annotations: 'config/annotations.tsv'
+annotations: 'defaults/annotations.tsv'
# ID field used to merge annotations
annotations_id: 'accession'
# Field to use as the sequence ID in the FASTA file
@@ -76,4 +76,4 @@ nextclade:
# Field to use as the sequence ID in the Nextclade file
id_field: 'seqName'
# Fields from a Nextclade file to be renamed (if desired) and appended to a metadata file
-field_map: 'config/nextclade-field-map.tsv'
+field_map: 'defaults/nextclade-field-map.tsv'
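
Because the rename touches several path values inside the config itself, a quick check for leftover `config/` prefixes may be useful. The following is a rough sketch only; `find_stale_paths` is a hypothetical helper that is not part of this repository, and the sample dict is an assumed subset of the config with one deliberately stale entry.

```python
def find_stale_paths(config, old_prefix="config/", trail=""):
    """Yield (dotted_key, value) for string values that still start with old_prefix.

    Walks nested dicts and lists. (Hypothetical helper for illustration.)
    """
    if isinstance(config, dict):
        for key, value in config.items():
            child = f"{trail}.{key}" if trail else str(key)
            yield from find_stale_paths(value, old_prefix, child)
    elif isinstance(config, list):
        for index, value in enumerate(config):
            yield from find_stale_paths(value, old_prefix, f"{trail}[{index}]")
    elif isinstance(config, str) and config.startswith(old_prefix):
        yield trail, config


# Assumed subset of the config; the last entry is deliberately left stale
sample = {
    "ncbi_field_map": "defaults/ncbi-dataset-field-map.tsv",
    "transform": {"annotations": "defaults/annotations.tsv"},
    "nextclade": {"field_map": "config/nextclade-field-map.tsv"},
}

stale = list(find_stale_paths(sample))
for key, value in stale:
    print(f"{key} still points at {value}")
```

In practice the dict would come from parsing `defaults/config.yaml` with a YAML loader; that step is omitted here to keep the sketch dependency-free.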
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
2 changes: 1 addition & 1 deletion phylogenetic/config/description.md
@@ -18,7 +18,7 @@ Our bioinformatic processing workflow can be found at [github.com/nextstrain/mpo

#### Underlying data
We curate sequence data and metadata from the [NCBI Datasets command line tools](https://www.ncbi.nlm.nih.gov/datasets/docs/v2/download-and-install/),
-using an NCBI Taxonomy ID defined in [ingest/config/config.yaml](https://github.com/nextstrain/mpox/blob/master/ingest/config/config.yaml), as starting point for these analyses.
+using an NCBI Taxonomy ID defined in [ingest/defaults/config.yaml](https://github.com/nextstrain/mpox/blob/master/ingest/defaults/config.yaml), as starting point for these analyses.
Curated sequences and metadata are available as flat files at:
- [data.nextstrain.org/files/workflows/mpox/sequences.fasta.xz](https://data.nextstrain.org/files/workflows/mpox/sequences.fasta.xz)
- [data.nextstrain.org/files/workflows/mpox/metadata.tsv.gz](https://data.nextstrain.org/files/workflows/mpox/metadata.tsv.gz)