Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

key names in branch_lengths JSON #180

Closed
jameshadfield opened this issue Jul 18, 2018 · 5 comments
Closed

key names in branch_lengths JSON #180

jameshadfield opened this issue Jul 18, 2018 · 5 comments
Assignees

Comments

@jameshadfield
Copy link
Member

The current branch lengths JSON (from augur refine) contains keys branch_length, clock_length and mutation_length. It's not clear that branch_length is measured in units of time, and appears to be always equal to clock_length. Can we remove the former and use the latter downstream? (Or rename the former if it's somehow different)

It also refers to numdate and num_date_confidence. Can the former be made num_date for consistency? (I believe this JSON is only consumed by augur export)

Separately, raw_date differs-by-one from date (e.g. 2016-04-04 vs 2016-04-05) - is this a bug?

@rneher
Copy link
Member

rneher commented Jul 18, 2018

Emma and I just looked at this. the different branch length can all go. branch_length is pointing either at clock_length or at mutation_length inside treetime. We don't really need to export them since we are exporting num_date and div and those are the only quantities auspice uses. the who might differ though.

treetime internally uses numdate, but augur at some point migrated to num_date which is sort of confusing.

the raw_date divergernce from date is due to rounding/gap years etc and happens when treetime is converting the numdate back to calendar date. looks like in most cases it wants to be one day earlier... we can change that in treetime. But if we have a fully specified date as input, that should take precedence.

@trvrb
Copy link
Member

trvrb commented Jul 18, 2018

I think good to have tree refine produce a node data JSON that contains derived, useful quantities. TreeTime may only use div, but I can totally imagine writing a script that wants to use mutation_length of a node as its input. I would vote to get rid of export of branch_length and keep mutation_length and clock_length in the resulting node data JSON.

If this JSON is going to have raw_date, then it really should be num_date in the JSON.

@jameshadfield
Copy link
Member Author

jameshadfield commented Sep 25, 2018

Notes from working on BEAST parsing (which uses augur export but not augur refine). I'll update this message as I learn more.

  • The properties branch_length, numdate and clock_length are needed in the node JSON for export to work properly.
    • branch_length is needed, but not exported into the auspice tree JSON.
    • mutation_length is not needed
  • div is set by export at lines https://github.com/nextstrain/augur/blob/master/augur/export.py#L52 using either mutation_length or branch_length, so i'm relatively confident we only need one of them (mutation_length implies divergence to me, so that'd get my vote)

@jameshadfield
Copy link
Member Author

@emmahodcroft this commit (branch beast) 71f39a8 now allows trees to be exported without the div property, which is useful for BEAST trees.

@emmahodcroft
Copy link
Member

Is this now superseded by #316, and if so, can we close?

@rneher rneher closed this as completed Sep 4, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants