Add tutorials for ingest pipelines #179

Closed
joverlee521 opened this issue Jan 18, 2024 · 6 comments · Fixed by #195
Labels: documentation (Improvements or additions to documentation)

Comments

@joverlee521
Contributor

joverlee521 commented Jan 18, 2024

Initially proposed by @trvrb in our meeting with WA DOH on 2024-01-17.
The tentative due date for this objective is 2024-02-29.

We should have tutorials for running and creating ingest pipelines from NCBI data in parallel with the "Running a pathogen workflow" and "Creating a pathogen workflow" tutorials.

The ultimate goal would be for an external user to be able to follow the tutorials to set up their own ingest pipeline for any viral pathogen using NCBI data.

joverlee521 added the "documentation" label Jan 18, 2024
@joverlee521
Contributor Author

As @kimandrews works through setting up ingest for measles, we can work on putting together a step-by-step tutorial for creating the ingest workflow.

@joverlee521
Contributor Author

joverlee521 commented Jan 25, 2024

Jotting down some general thoughts to keep track of from conversations with @kimandrews:

  • who is our intended audience? What is the expected level of experience with Nextstrain?
  • will need guide for installing git subrepo for managing the ingest/vendored scripts
  • document commands to get the first fetch from NCBI to give users a view of the raw data
  • document commands to run full ingest pipeline to see curated output
  • provide guidance on how to assess data to make specific config changes
  • pointing to an existing ingest workflow is really helpful for understanding
  • need pathogen-specific knowledge (e.g. measles strain name schema)

@tsibley
Member

tsibley commented Jan 25, 2024

  • will need guide for installing git subrepo for managing the ingest/vendored scripts

How about we vendor git subrepo so that our vendored stuff is bootstrapped?

I know the installation docs for git subrepo have you clone its repo and source some shell init, but looking deeper, there are only three files we strictly need (git-subrepo, help-functions.bash, and bash+.bash), and it'd be pretty easy to vendor just those or even pack them into a single file (I just did it).

(Hell, we could git subrepo the git-subrepo repo into ingest/vendored/, but I worry I might not be welcome anymore if I seriously suggest that…)

(…the other bad suggestion is that git subtree doesn't have this issue either…)

Bootstrapping doesn't solve the git subrepo usage hurdles, and I know the longer term solution is ditching vendoring entirely in favor of augur curate et al., but it would solve the installation hurdle.
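
As a rough illustration of the bootstrapping idea, here is a minimal sketch of a check that the three files named above are present in a vendored directory. The three file names come from the comment; the function name, the `ingest/vendored` location, and the idea of searching by file name alone are assumptions made for illustration, not how the repository actually handles this.

```python
from pathlib import Path

# File names taken from the comment above. Where they would live inside a
# vendored directory is an assumption, so the search below ignores layout.
REQUIRED_FILES = ("git-subrepo", "help-functions.bash", "bash+.bash")


def missing_vendored_files(vendor_dir):
    """Return the required file names not found anywhere under vendor_dir.

    Hypothetical helper: it only checks for presence by name and does not
    verify that git subrepo would actually run from that location.
    """
    root = Path(vendor_dir)
    if not root.is_dir():
        # Nothing vendored yet, so every required file is missing.
        return list(REQUIRED_FILES)
    found = {p.name for p in root.rglob("*") if p.is_file()}
    return [name for name in REQUIRED_FILES if name not in found]
```

Calling `missing_vendored_files("ingest/vendored")` from the top of a pathogen repo would return an empty list when all three files have been vendored, which a tutorial could use as a quick sanity check before any git subrepo step.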

@victorlin
Member

…the other bad suggestion is that git subtree doesn't have this issue either…

I don't think it's a bad suggestion. Wrote more in nextstrain/ingest#3 (comment).

@tsibley
Member

tsibley commented Jan 31, 2024

"bad" because I'm not convinced it's worth it to re-open the decision there and switch stuff over. Maybe it is!

@joverlee521
Contributor Author

Another option is to bypass setting up git subrepo in the tutorial and just have people use the vendored scripts in the pathogen-repo-guide.

The ingest scripts will not be updated that frequently, so we only need to mention git subrepo for users who want to update the vendored scripts.

4 participants