Commit

updated README and docstring
zyxue committed Feb 5, 2021
1 parent fcf61d6 commit 72fc8e2
Showing 2 changed files with 12 additions and 8 deletions.
6 changes: 5 additions & 1 deletion README.md
@@ -30,7 +30,7 @@ mkdir -p taxdump && tar zxf taxdump.tar.gz -C ./taxdump
Then, run ncbitax2lin

```bash
-ncbitax2lin taxdump/nodes.dmp taxdump/names.dmp
+ncbitax2lin --nodes-file taxdump/nodes.dmp --names-file taxdump/names.dmp
```

By default, the generated lineages will be saved to
@@ -62,3 +62,7 @@ of a different timestamp.
## Used in

* Mahmoudabadi, G., & Phillips, R. (2018). A comprehensive and quantitative exploration of thousands of viral genomes. ELife, 7. https://doi.org/10.7554/eLife.31955
+* Dombrowski, N. et al. (2020) Undinarchaeota illuminate DPANN phylogeny and the impact of gene transfer on archaeal evolution, Nature Communications. Springer US, 11(1). doi: 10.1038/s41467-020-17408-w. https://www.nature.com/articles/s41467-020-17408-w
+* Schenberger Santos, A. R. et al. (2020) NAD+ biosynthesis in bacteria is controlled by global carbon/ nitrogen levels via PII signaling, Journal of Biological Chemistry, 295(18), pp. 6165–6176. doi: 10.1074/jbc.RA120.012793. https://www.sciencedirect.com/science/article/pii/S0021925817482433
+* Villada, J. C., Duran, M. F. and Lee, P. K. H. (2020) Interplay between Position-Dependent Codon Usage Bias and Hydrogen Bonding at the 5' End of ORFeomes, mSystems, 5(4), pp. 1–18. doi: 10.1128/msystems.00613-20. https://msystems.asm.org/content/5/4/e00613-20
+* Byadgi, O. et al. (2020) Transcriptome analysis of amyloodinium ocellatum tomonts revealed basic information on the major potential virulence factors, Genes, 11(11), pp. 1–12. doi: 10.3390/genes11111252. https://www.mdpi.com/2073-4425/11/11/1252
14 changes: 7 additions & 7 deletions ncbitax2lin/ncbitax2lin.py
@@ -29,7 +29,7 @@ class TaxUnit(TypedDict):
rank_name: str


-# the strings are tax_id, rank, rank_name
+# A lineage is a list of (tax_id, rank, rank_name) tuples.
Lineage = NewType("Lineage", List[Tuple[int, str, str]])

# set TAXONOMY_DICT as global variable so it can work with multiprocess.Pool
@@ -105,17 +105,17 @@ def convert_lineage_to_dict(lineage: Lineage) -> Dict[str, Union[int, str]]:
"""Converts the lineage in a list-of-tuples representation to a dictionary representation
[
-(tax_id1, rank1, name_txt1),
-(tax_id2, rank2, name_txt2),
+("tax_id1", "rank1", "name_txt1"),
+("tax_id2", "rank2", "name_txt2"),
...
]
becomes
{
-rank1: name_txt1,
-rank2: name_txt2,
-tax_id, tax_id2, # using the last rank as the tax_id of this lineage
+"rank1": "name_txt1",
+"rank2": "name_txt2",
+"tax_id": "tax_id2", # using the last rank as the tax_id of this lineage
}
A concrete example:
@@ -129,8 +129,8 @@ def convert_lineage_to_dict(lineage: Lineage) -> Dict[str, Union[int, str]]:
{
'no rank': 'cellular organisms',
+'superkingdom': 'Bacteria',
'tax_id': 2,
-'superkingdom': 'Bacteria'
}
"""

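The conversion that the `convert_lineage_to_dict` docstring describes can be sketched as below. This is an illustrative reimplementation, not the function body from this commit: `lineage_to_dict_sketch` is a hypothetical name, the input tuples are chosen only to reproduce the dictionary shown in the docstring, and overwriting on repeated ranks is an assumption that the real function may not share.

```python
from typing import Dict, List, NewType, Tuple, Union

# Same alias as in the diff: a lineage is a list of (tax_id, rank, rank_name) tuples.
Lineage = NewType("Lineage", List[Tuple[int, str, str]])


def lineage_to_dict_sketch(lineage: Lineage) -> Dict[str, Union[int, str]]:
    """Hypothetical stand-in for convert_lineage_to_dict (not the project's code)."""
    output: Dict[str, Union[int, str]] = {}
    for _tax_id, rank, rank_name in lineage:
        # Assumption: a repeated rank simply overwrites the earlier entry.
        output[rank] = rank_name
    if lineage:
        # The tax_id of the last (most specific) entry identifies the lineage.
        output["tax_id"] = lineage[-1][0]
    return output


# Illustrative input; the diff elides the concrete input used in the docstring.
example = Lineage([
    (131567, "no rank", "cellular organisms"),
    (2, "superkingdom", "Bacteria"),
])
result = lineage_to_dict_sketch(example)
print(result)
```

Because Python dictionaries preserve insertion order, `tax_id` ends up as the last key here, which matches the key order in the updated docstring example.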