Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

augur export prefers num_date over numdate exported by augur refine #215

Open
huddlej opened this issue Sep 5, 2018 · 5 comments
Open
Labels
bug Something isn't working easy problem Requires less work than most issues priority: low To be resolved after high and moderate priority issues

Comments

@huddlej
Copy link
Contributor

huddlej commented Sep 5, 2018

The augur refine command exports branch length information including a numdate field that provides a likely estimate for each node's date.
The augur export command checks for both numdate and num_date fields in any "node data" JSONs or the metadata TSV and preferentially selects the num_date information.
In the case where the metadata TSV has a num_date field that was annotated by a previous step in the pipeline, the maximum likelihood numdate value from augur refine is omitted in favor of the metadata's value.

There are a couple issues to figure out:

  1. Can we pick one name for the date attribute and use it consistently across all datasets and code repositories?
  2. Can we determine a way to resolve conflicts between fields in data passed to augur export?

Even if the solution to the second part is to print an error and exit loudly, that would allow users to track down the issue with their data.

@huddlej huddlej added the bug Something isn't working label Sep 5, 2018
@tsibley
Copy link
Member

tsibley commented Sep 11, 2018

My 2¢:

  1. 👍 for doing this. I'd tend towards num_date if they're otherwise equal names.
  2. I don't think there's a good general resolution method where the behaviour would be obvious to an outside observer. 👍 for requiring that annotation files being joined in augur export have unique field names, and exiting with a descriptive, useful error message if they don't. (This is also the most parsimonious and easiest to implement option.)

@huddlej
Copy link
Contributor Author

huddlej commented Jan 14, 2019

This is related to #180 and duplicates that issues request to pick a standard name for numerical dates. This ticket adds the request of making augur export exit on name collisions.

@rneher
Copy link
Member

rneher commented Jul 7, 2019

treetime uses numdate internally. The parsing of dates from user generated tables is a constant head-ache. I think the main reason we prefer floating point dates over other date spec is that they are unambiguous to parse.

@huddlej
Copy link
Contributor Author

huddlej commented Mar 14, 2020

Revisiting this issue after the augur v6 updates to how auspice JSONs are exported and the original issue with conflicting key names in metadata and node JSONs remains.

The current implementation for merging metadata and node JSONs for augur v2 export explicitly overwrites the keys in the metadata. This can result in silently overwriting data that the user would expect to pass through to the final auspice JSONs.

Instead of overwriting overlapping keys, parse_node_data_and_metadata should raise an exception and the augur export v2 subcommand should catch that exception and print the exception message to standard error. This will alert the user to the conflict and allow them to resolve it explicitly.

@huddlej huddlej added easy problem Requires less work than most issues priority: low To be resolved after high and moderate priority issues labels Mar 14, 2020
@jstoja
Copy link

jstoja commented Apr 29, 2020

@huddlej Do I understand well that node_data shouldn't overwrite metadata keys?
From the tests (for example tb), it seems that node_data is more precise than metadata for the date field.

Raising an exception here means that the the export will fail since the function won't return. Maybe just printing a warning in the parse function will be enough?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working easy problem Requires less work than most issues priority: low To be resolved after high and moderate priority issues
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants