Add ingest pipeline and restructure phylogeny pipeline #28
As part of modernizing the pathogen workflow repos, this commit makes changes that fall into two broad groups:

* Add ingest pipeline (copied from mpox repo)
  * Use ncbi-datasets to fetch GenBank records
  * Add the zika data processing steps from fauna
* Move the phylogenetic workflow to its own folder
  * Add rules for merging USVI data
  * Move rules to sub snakefiles to match the pathogen-repo-template
j23414 authored Jan 19, 2024
2 parents ace74cd + 0560686 commit 5f11069
Showing 69 changed files with 3,243 additions and 477 deletions.
23 changes: 21 additions & 2 deletions .github/workflows/ci.yaml
@@ -5,5 +5,24 @@ on:
   - pull_request

 jobs:
-  ci:
-    uses: nextstrain/.github/.github/workflows/pathogen-repo-ci.yaml@master
+  pathogen-ci:
+    strategy:
+      matrix:
+        runtime: [docker, conda]
+    permissions:
+      id-token: write
+    uses: nextstrain/.github/.github/workflows/pathogen-repo-build.yaml@master
+    secrets: inherit
+    with:
+      runtime: ${{ matrix.runtime }}
+      run: |
+        nextstrain build \
+          phylogenetic \
+            --configfile profiles/ci/profiles_config.yaml
+      artifact-name: output-${{ matrix.runtime }}
+      artifact-paths: |
+        phylogenetic/auspice/
+        phylogenetic/results/
+        phylogenetic/benchmarks/
+        phylogenetic/logs/
+        phylogenetic/.snakemake/log/
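
The new job runs the same `nextstrain build` command once per entry in the `runtime` matrix. A minimal sketch of reproducing that locally is shown here. It assumes the Nextstrain CLI is installed and that the `--docker` and `--conda` runtime flags of `nextstrain build` correspond to the matrix values; how the reusable workflow actually selects the runtime is not shown in this diff. The helper name `ci_build_cmd` is hypothetical, and the script only prints the commands rather than running them.

```shell
#!/usr/bin/env bash
# Sketch: print the CI build command for each runtime in the matrix.
# Assumption: the matrix values map onto the Nextstrain CLI flags
# --docker and --conda. ci_build_cmd is a hypothetical helper.

ci_build_cmd() {
  local runtime="$1"
  # Directory and --configfile value are copied from the workflow's run: block.
  printf 'nextstrain build --%s phylogenetic --configfile profiles/ci/profiles_config.yaml\n' "$runtime"
}

# Same two runtimes as the workflow's strategy.matrix.runtime list.
for runtime in docker conda; do
  ci_build_cmd "$runtime"
done
```

Copying one of the printed lines into a shell at the repository root should start the corresponding build, provided that runtime is set up on the machine.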
4 changes: 3 additions & 1 deletion .gitignore
@@ -9,7 +9,9 @@ build/
 environment*

 # Snakemake state dir
-/.snakemake
+.snakemake/
+benchmarks/
+logs/

 # Local config overrides
 /config_local.yaml
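
The pattern change from `/.snakemake` to `.snakemake/` matters once workflows live in subfolders: a leading slash anchors a pattern to the directory containing the `.gitignore`, while a pattern with only a trailing slash matches a directory of that name at any depth. The sketch below checks this in a throwaway repository. It assumes `git` is available; `is_ignored` and the `run.log` file name are hypothetical and used only for illustration.

```shell
#!/usr/bin/env bash
# Sketch: compare the old and new ignore patterns in a temporary repository.
# is_ignored is a hypothetical helper that prints "yes" or "no".

is_ignored() {
  local pattern="$1" path="$2" tmp result
  tmp="$(mktemp -d)"
  git init -q "$tmp"
  # Create the file so git sees real directories along the path.
  mkdir -p "$tmp/$(dirname "$path")"
  touch "$tmp/$path"
  printf '%s\n' "$pattern" > "$tmp/.gitignore"
  # check-ignore exits 0 when the path is ignored, 1 when it is not.
  if git -C "$tmp" check-ignore -q "$path"; then result=yes; else result=no; fi
  rm -rf "$tmp"
  echo "$result"
}

is_ignored '/.snakemake' '.snakemake/log/run.log'               # yes
is_ignored '/.snakemake' 'phylogenetic/.snakemake/log/run.log'  # no
is_ignored '.snakemake/' 'phylogenetic/.snakemake/log/run.log'  # yes
```

The same reasoning applies to the new `benchmarks/` and `logs/` entries, which match those directories under both `ingest/` and `phylogenetic/`.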
7 changes: 7 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,7 @@
+# Developer guide
+
+## CI
+
+Tests are run through GitHub Actions when triggered by events as defined by [.github/workflows/ci.yaml][]
+
+[.github/workflows/ci.yaml]: ./.github/workflows/ci.yaml
90 changes: 7 additions & 83 deletions README.md
@@ -1,88 +1,12 @@
-# nextstrain.org/zika
+# Nextstrain repository for Zika virus

-This is the [Nextstrain](https://nextstrain.org) build for Zika, visible at
-[nextstrain.org/zika](https://nextstrain.org/zika).
+This repository contains two workflows for the analysis of Zika virus data:

-The build encompasses fetching data, preparing it for analysis, doing quality
-control, performing analyses, and saving the results in a format suitable for
-visualization (with [auspice][]). This involves running components of
-Nextstrain such as [fauna][] and [augur][].
+- [`ingest/`](./ingest) - Download data from GenBank, clean and curate it and upload it to S3
+- [`phylogenetic/`](./phylogenetic) - Make phylogenetic trees for nextstrain.org

-All Zika-specific steps and functionality for the Nextstrain pipeline should be
-housed in this repository.
+Each folder contains a README.md with more information.

-_This build requires Augur v6._
+## Documentation

-[![Build Status](https://github.com/nextstrain/zika/actions/workflows/ci.yaml/badge.svg?branch=main)](https://github.com/nextstrain/zika/actions/workflows/ci.yaml)
-
-## Usage
-
-If you're unfamiliar with Nextstrain builds, you may want to follow our
-[quickstart guide][] first and then come back here.
-
-There are two main ways to run & visualise the output from this build:
-
-The first, and easiest, way to run this pathogen build is using the [Nextstrain
-command-line tool][nextstrain-cli]:
-```
-nextstrain build .
-nextstrain view auspice/
-```
-
-See the [nextstrain-cli README][] for how to install the `nextstrain` command.
-
-The second is to install augur & auspice using conda, following [these instructions](https://nextstrain.org/docs/getting-started/local-installation#install-augur--auspice-with-conda-recommended).
-The build may then be run via:
-```
-snakemake
-auspice --datasetDir auspice/
-```
-
-Build output goes into the directories `data/`, `results/` and `auspice/`.
-
-## Configuration
-
-Configuration takes place entirely with the `Snakefile`. This can be read top-to-bottom, each rule
-specifies its file inputs and output and also its parameters. There is little redirection and each
-rule should be able to be reasoned with on its own.
-
-
-## Input data
-
-This build starts by downloading sequences from
-https://data.nextstrain.org/files/zika/sequences.fasta.xz
-and metadata from
-https://data.nextstrain.org/files/zika/metadata.tsv.gz.
-These are publicly provisioned data by the Nextstrain team by pulling sequences
-from NCBI GenBank via ViPR and performing
-[additional bespoke curation](https://github.com/nextstrain/fauna/blob/master/builds/ZIKA.md).
-
-Data from GenBank follows Open Data principles, such that we can make input data
-and intermediate files available for further analysis. Open Data is data that
-can be freely used, re-used and redistributed by anyone - subject only, at most,
-to the requirement to attribute and sharealike.
-
-We gratefully acknowledge the authors, originating and submitting laboratories
-of the genetic sequences and metadata for sharing their work in open databases.
-Please note that although data generators have generously shared data in an open
-fashion, that does not mean there should be free license to publish on this
-data. Data generators should be cited where possible and collaborations should
-be sought in some circumstances. Please try to avoid scooping someone else's
-work. Reach out if uncertain. Authors, paper references (where available) and
-links to GenBank entries are provided in the metadata file.
-
-A faster build process can be run working from example data by copying over
-sequences and metadata from `example_data/` to `data/` via:
-```
-mkdir -p data/
-cp -v example_data/* data/
-```
-
-[Nextstrain]: https://nextstrain.org
-[fauna]: https://github.com/nextstrain/fauna
-[augur]: https://github.com/nextstrain/augur
-[auspice]: https://github.com/nextstrain/auspice
-[snakemake cli]: https://snakemake.readthedocs.io/en/stable/executable.html#all-options
-[nextstrain-cli]: https://github.com/nextstrain/cli
-[nextstrain-cli README]: https://github.com/nextstrain/cli/blob/master/README.md
-[quickstart guide]: https://nextstrain.org/docs/getting-started/quickstart
+- [Contributor documentation](./CONTRIBUTING.md)
236 changes: 0 additions & 236 deletions Snakefile

This file was deleted.