-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Vendored scripts #1
Conversation
git-subtree-dir: ingest/vendored git-subtree-split: 92a8868
This moves the script to the vendored directory, which is a git subtree of https://github.com/nextstrain/ingest (currently at branch apply-geolocation-rules). The file suffix is removed to match how it appears in the other repos which use it. As per [Overview of duplicated scripts](nextstrain/ingest#1) this script also appears in: * [monkeypox](https://github.com/nextstrain/monkeypox/blob/a1f0d7b757d323d87edcbe61c6c5ccfbdf47722c/ingest/bin/apply-geolocation-rules) * [rsv](https://github.com/nextstrain/rsv/blob/ba171f4a43110382c38b6154be3febd50408d7bf/ingest/bin/apply-geolocation-rules) * [dengue, branch new_ingest](https://github.com/nextstrain/dengue/blob/247b2fd897361f2548627de1d97d45fae4115c5c/ingest/bin/apply-geolocation-rules) All three of those scripts are identical to each other. The script vendored here contains two code changes (whitespace removed from diffs): **Ignore comment lines in the location-rules TSV** ```diff < if line.lstrip()[0] == '#': --- > if line.strip()=="" or line.lstrip()[0] == '#': ``` **Allow fields to be missing from the input NDJSON** The script previously mandated that the input NDJSON had all four fields (region/country/division/location). This is relaxed here, with an empty string used if the field is not present. ```diff < annotated_values = transform_geolocations(geolocation_rules, [record[field] for field in location_fields]) --- > annotated_values = transform_geolocations(geolocation_rules, [record.get(field, '') for field in location_fields]) ```
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for showing a clear example!
You said you wouldn't merge this PR, but I assume you can force-push to the same PR branch with a new subtree, or take these suggestions for a similar future PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestions:
- Add a section in the pathogen repo README on how to update these scripts with
git subtree pull
- Add that command in commit messages such as 7d507f7 where those changes are the result of pulling new contents from the subtree remote.
Copied from https://github.com/nextstrain/monkeypox/blob/62fca491c6775573ad036eedf34b271b4952f2c2/ingest/bin/apply-geolocation-rules | ||
with two changes: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The script vendored here contains two code changes
A comment is used to describe the pathogen-specific changes, when those might be better described by git history.
Suggestion: Put these changes in a separate commit. I'd expect something like one commit to add the original contents from git subtree pull
, then another commit with the pathogen-specific modifications.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I just realized that 7d507f7 is a direct copy of nextstrain/ingest@5afbcc7 and these aren't pathogen-specific changes. You can disregard this suggestion then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just realized that 7d507f7 is a direct copy of nextstrain/ingest@5afbcc7
Yeah, or perhaps the other way around! The ingest commit was created by pushing the subtree from this (hbv) repo. In hindsight, probably not an approach I'd reach for again, but it was a good learning exercise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wish the auto-generated messages (f9d6407, ee78e39) had more information on the remote. If I'm reading the git history alone, it's not clear where these come from.
Suggestion: Modify one of the auto-generated messages to include the remote URL, https://github.com/nextstrain/ingest. I'm not sure which is best since the remote commit hash is in f9d6407, but only the merge commit ee78e39 will show up as the one making changes.
Superseded by #5 |
A draft PR to show how vendored files could be consumed by pathogen repos
See nextstrain/ingest#2 for the PR which add the script to the ingest repo, vendored here as
ingest/vendored
See nextstrain/ingest#3 for the
git subtree
commands I usedNote that I probably won't merge this PR, instead I'll recreate the subtree from its
main
branch once the above PR has been merged.