The Next Million Names for Archaea and Bacteria
Mark J. Pallen et al. The Next Million Names for Archaea and Bacteria, Trends in Microbiology (2020). DOI: 10.1016/j.tim.2020.10.009
To generate a large number of new names, we apply a combinatorial approach: starting from two or three sets of curated roots, the scripts produce every possible combination while keeping track of each root's grammatical metadata, so that a valid etymology can be drafted for every name.
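The combinatorial step can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the root dictionaries, the `stem` and `gloss` fields, the `combine` helper and the default connector vowel are all assumptions made for the example, and the metadata is far simpler than what a real etymology needs.

```python
from itertools import product

# Hypothetical, heavily simplified root sets (assumed structure).
first_roots = [
    {"stem": "therm", "gloss": "heat"},
    {"stem": "hal", "gloss": "salt"},
]
second_roots = [
    {"stem": "bacter", "gloss": "rod"},
    {"stem": "coccus", "gloss": "berry"},
]


def combine(root_sets, connector="o"):
    """Join one root from each set, carrying the glosses along."""
    names = []
    # product() yields every combination of one root per set.
    for combo in product(*root_sets):
        name = connector.join(r["stem"] for r in combo).capitalize()
        etymology = " + ".join(f"{r['stem']} ({r['gloss']})" for r in combo)
        names.append({"name": name, "etymology": etymology})
    return names


names = combine([first_roots, second_roots])
for entry in names:
    print(entry["name"], "<-", entry["etymology"])
```

Because `product` accepts any number of iterables, the same helper would handle a third root set without changes; the number of names is simply the product of the set sizes.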
The scripts in this repository require Python (at least 3.6) and these modules:
- itertools (ships with Python)
- pandas (>1.0)
- xlrd (1.2.0)
To run the scripts in this repository, we suggest creating a conda environment as follows:
conda create -c conda-forge -n gan python=3.8 pandas pip ipython
conda activate gan
pip install xlrd==1.2.0
A set of two (or three) Excel tables formatted as shown below is used to generate the list of combinations in JSON, HTML and LaTeX format.
Synopsis:
usage: gan-genus.py [-h] -1 FIRST -2 SECOND [-3 THIRD] -o OUTDIR [-p PREFIX] [-c CONNECTOR] [-v]
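To make the synopsis easier to read, here is a sketch of how an interface with the same flags could be declared with argparse. It only mirrors the usage line above: the `dest` names, the help strings and the reading of `-v` as a verbosity switch are assumptions, and the actual script may be written differently.

```python
import argparse

# Hypothetical parser reproducing the flags from the usage line.
parser = argparse.ArgumentParser(prog="gan-genus.py")
parser.add_argument("-1", dest="first", metavar="FIRST", required=True,
                    help="first table of roots (assumed meaning)")
parser.add_argument("-2", dest="second", metavar="SECOND", required=True,
                    help="second table of roots (assumed meaning)")
parser.add_argument("-3", dest="third", metavar="THIRD",
                    help="optional third table of roots (assumed meaning)")
parser.add_argument("-o", dest="outdir", metavar="OUTDIR", required=True,
                    help="output directory (assumed meaning)")
parser.add_argument("-p", dest="prefix", metavar="PREFIX",
                    help="prefix for output files (assumed meaning)")
parser.add_argument("-c", dest="connector", metavar="CONNECTOR",
                    help="connector between roots (assumed meaning)")
parser.add_argument("-v", dest="verbose", action="store_true",
                    help="verbose output (assumed meaning)")

# Example call with placeholder file names.
args = parser.parse_args(["-1", "first.xlsx", "-2", "second.xlsx", "-o", "out"])
print(args.first, args.second, args.third, args.outdir)
```

The square brackets in the usage line mark the optional flags; here only `-1`, `-2` and `-o` are declared as required.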
For full usage and installation instructions, please check the documentation.
Using three small files in the input_test directory (8, 11 and 8 words, respectively), GAN produced 968 (8 x 11 x 8) combinations:
- in PDF format
- in HTML format
"The great automatic nomenclaturer" is a reference to "The Great Automatic Grammatizator", a short story by the British author Roald Dahl.