Taxadb2 is an application to locally query the NCBI taxonomy. Taxadb2 is written in Python and accesses its database using the peewee library.
Taxadb2 is a fork of https://github.com/HadrienG/taxadb that handles the merged.dmp NCBI taxonomy file to deal with updated taxIDs.
- the built-in support for MySQL and PostgreSQL was left untouched
- merged.dmp support was added
In brief Taxadb2:
- is a small tool to query the NCBI taxonomy.
- is written in Python >= 3.10.
- has built-in support for SQLite, MySQL and PostgreSQL.
- has pre-built SQLite databases available.
- has comprehensive API documentation.
Taxadb2 requires Python >= 3.10 to work. To install taxadb2 with SQLite support, simply type the following in your terminal:
pip3 install taxadb2
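If you are unsure whether your interpreter meets the version requirement, a quick check before installing can save a failed install. The snippet below is only a sketch: the helper name `python_is_supported` is made up for this example and is not part of taxadb2.

```python
import sys

# Minimum interpreter version stated in this README.
MIN_PYTHON = (3, 10)


def python_is_supported(version_info=None):
    """Return True if the given (or running) interpreter is new enough."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor) so tuples of any length are accepted.
    return tuple(version_info[:2]) >= MIN_PYTHON
```

Calling `python_is_supported()` without arguments checks the interpreter that runs the snippet.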
If you wish to use MySQL or PostgreSQL, please refer to the full documentation.
First, make sure you have built the database.
Below you can find basic examples. For more complete examples, please refer to the complete API documentation.
>>> from taxadb2.taxid import TaxID
>>> from taxadb2.names import SciName
>>> from taxadb2.accessionid import AccessionID
>>> dbname = "taxadb2/test/test_db.sqlite"
>>> ncbi = {
...     'taxid': TaxID(dbtype='sqlite', dbname=dbname),
...     'names': SciName(dbtype='sqlite', dbname=dbname),
...     'accessionid': AccessionID(dbtype='sqlite', dbname=dbname)
... }
>>> taxid2name = ncbi['taxid'].sci_name(2)
>>> print(taxid2name)
Bacteria
>>> lineage = ncbi['taxid'].lineage_name(17)
>>> print(lineage[:5])
['Methylophilus methylotrophus', 'Methylophilus', 'Methylophilaceae', 'Nitrosomonadales', 'Betaproteobacteria']
>>> lineage = ncbi['taxid'].lineage_name(17, reverse=True)
>>> print(lineage[:5])
['cellular organisms', 'Bacteria', 'Pseudomonadati', 'Pseudomonadota', 'Betaproteobacteria']
>>> ncbi['taxid'].has_parent(17, 'Bacteria')
True
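To make the lineage queries above more concrete, here is a toy model of what a lineage walk and an ancestor test amount to. It is a sketch only and does not reflect how taxadb2 is implemented: the parent table is pieced together from the two lineage slices printed above, names stand in for taxIDs, and `toy_lineage` and `toy_has_parent` are hypothetical helpers.

```python
# Toy child -> parent table, pieced together from the two lineage slices
# printed above for taxid 17. Names stand in for taxIDs for readability.
PARENT = {
    "Methylophilus methylotrophus": "Methylophilus",
    "Methylophilus": "Methylophilaceae",
    "Methylophilaceae": "Nitrosomonadales",
    "Nitrosomonadales": "Betaproteobacteria",
    "Betaproteobacteria": "Pseudomonadota",
    "Pseudomonadota": "Pseudomonadati",
    "Pseudomonadati": "Bacteria",
    "Bacteria": "cellular organisms",
}


def toy_lineage(taxon, reverse=False):
    """Walk parent pointers from `taxon` up to the root of the toy table."""
    lineage = [taxon]
    while lineage[-1] in PARENT:
        lineage.append(PARENT[lineage[-1]])
    # reverse=True lists the lineage root first, as in the example above.
    return lineage[::-1] if reverse else lineage


def toy_has_parent(taxon, parent):
    """Return True if `parent` is a strict ancestor of `taxon`."""
    return parent in toy_lineage(taxon)[1:]


print(toy_lineage("Methylophilus methylotrophus")[:5])
print(toy_has_parent("Methylophilus methylotrophus", "Bacteria"))
```

Whether the real `has_parent` treats a taxon as its own parent is not stated here; the toy version assumes it does not.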
Get the taxid from a scientific name.
>>> from taxadb2.taxid import TaxID
>>> from taxadb2.names import SciName
>>> from taxadb2.accessionid import AccessionID
>>> dbname = "taxadb2/test/test_db.sqlite"
>>> ncbi = {
...     'taxid': TaxID(dbtype='sqlite', dbname=dbname),
...     'names': SciName(dbtype='sqlite', dbname=dbname),
...     'accessionid': AccessionID(dbtype='sqlite', dbname=dbname)
... }
>>> name2taxid = ncbi['names'].taxid('Pseudomonadota')
>>> print(name2taxid)
1224
Automatic detection of old taxIDs imported from merged.dmp.
>>> from taxadb2.taxid import TaxID
>>> from taxadb2.names import SciName
>>> from taxadb2.accessionid import AccessionID
>>> dbname = "taxadb2/test/test_db.sqlite"
>>> ncbi = {
...     'taxid': TaxID(dbtype='sqlite', dbname=dbname),
...     'names': SciName(dbtype='sqlite', dbname=dbname),
...     'accessionid': AccessionID(dbtype='sqlite', dbname=dbname)
... }
>>> taxid2name = ncbi['taxid'].sci_name(30)
TaxID 30 is deprecated, using 29 instead.
>>> print(taxid2name)
Myxococcales
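For readers curious about what the merged.dmp lookup involves, the sketch below parses merged.dmp-style rows (old taxID and new taxID, separated by `|` and padded with tabs) and resolves a deprecated taxID. It is an illustration under stated assumptions, not taxadb2's own code: `parse_merged` and `resolve_taxid` are hypothetical helpers, and the single sample row is written by hand to match the 30 -> 29 message above.

```python
import io


def parse_merged(handle):
    """Build an {old_taxid: new_taxid} map from merged.dmp-style lines."""
    merged = {}
    for line in handle:
        # Fields are separated by "|" and padded with tabs.
        fields = [field.strip() for field in line.split("|")]
        if len(fields) < 2 or not fields[0]:
            continue  # skip blank or malformed lines
        merged[int(fields[0])] = int(fields[1])
    return merged


def resolve_taxid(taxid, merged):
    """Return the current taxID, following the merged map if needed."""
    return merged.get(taxid, taxid)


# One hand-written sample row matching the deprecated taxID shown above.
sample = io.StringIO("30\t|\t29\t|\n")
merged = parse_merged(sample)
print(resolve_taxid(30, merged))  # 29
print(resolve_taxid(2, merged))   # 2 (not in the map, returned unchanged)
```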
Get the taxonomic information for accession number(s).
>>> from taxadb2.taxid import TaxID
>>> from taxadb2.names import SciName
>>> from taxadb2.accessionid import AccessionID
>>> dbname = "taxadb2/test/test_db.sqlite"
>>> ncbi = {
...     'taxid': TaxID(dbtype='sqlite', dbname=dbname),
...     'names': SciName(dbtype='sqlite', dbname=dbname),
...     'accessionid': AccessionID(dbtype='sqlite', dbname=dbname)
... }
>>> my_accessions = ['A01460']
>>> taxids = ncbi['accessionid'].taxid(my_accessions)
>>> taxids
<generator object AccessionID.taxid at 0x103e21bd0>
>>> for ti in taxids:
...     print(ti)
...
('A01460', 17)
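Because the result is a generator, it can only be consumed once. If you need to look up accessions repeatedly, one option is to collect the pairs into a dictionary first. The sketch below uses a stand-in generator instead of a real database connection; `fake_taxid_generator` and `accession_to_taxid` are hypothetical names used for illustration.

```python
def accession_to_taxid(pairs):
    """Collect (accession, taxid) tuples into a dictionary."""
    return dict(pairs)


def fake_taxid_generator():
    # Stand-in for ncbi['accessionid'].taxid(...), yielding the pair shown above.
    yield ("A01460", 17)


pairs = fake_taxid_generator()
mapping = accession_to_taxid(pairs)
print(mapping)      # {'A01460': 17}
print(list(pairs))  # [] because the generator is now exhausted
```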
You can also use a configuration file to set the database connection parameters automatically when the objects are built. Either pass the config parameter to the object constructor:
>>> from taxadb2.taxid import TaxID
>>> from taxadb2.names import SciName
>>> from taxadb2.accessionid import AccessionID
>>> config_path = "taxadb2/test/taxadb2.cfg"
>>> ncbi = {
...     'taxid': TaxID(config=config_path),
...     'names': SciName(config=config_path),
...     'accessionid': AccessionID(config=config_path)
... }
>>> ncbi['taxid'].sci_name(2)
Bacteria
>>> ...
or set the environment variable TAXADB2_CONFIG so that it points to the configuration file:
$ export TAXADB2_CONFIG='taxadb2/test/taxadb2.cfg'
>>> from taxadb2.taxid import TaxID
>>> from taxadb2.names import SciName
>>> from taxadb2.accessionid import AccessionID
>>> ncbi = {
...     'taxid': TaxID(),
...     'names': SciName(),
...     'accessionid': AccessionID()
... }
>>> ncbi['taxid'].sci_name(2)
Bacteria
>>> ...
Check the documentation for more information.
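The two ways of supplying a configuration file can be pictured as a simple fallback. The sketch below is an assumption-laden illustration, not taxadb2's actual logic: it assumes an explicit `config` argument wins over the environment, takes the variable name from the `export` line above, and uses a hypothetical helper named `resolve_config_path`.

```python
import os

# Variable name taken from the `export` line above; treat it as an assumption.
ENV_VAR = "TAXADB2_CONFIG"


def resolve_config_path(config=None, environ=None):
    """Pick a configuration path: explicit argument first, then environment."""
    if environ is None:
        environ = os.environ
    if config:
        # An explicitly passed path is assumed to take precedence.
        return config
    # Fall back to the environment; returns None when the variable is unset.
    return environ.get(ENV_VAR)


print(resolve_config_path("taxadb2/test/taxadb2.cfg", {}))
```

Passing `environ` explicitly keeps the helper easy to test without touching the real process environment.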
The following commands will download the necessary files from the NCBI FTP server into the directory taxadb and create an SQLite database from them.
$ taxadb2 download --outdir taxadb --type taxa
$ taxadb2 create --division taxa --input taxadb --dbname taxadb.sqlite
You can then safely remove the downloaded files:
$ rm -r taxadb
You can easily rerun the same command; taxadb2 is able to skip already inserted taxids as well as accessions.
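If you prefer to drive the build from a script, the two command lines above can be assembled in Python. This is a minimal sketch: `build_commands` is a hypothetical helper, it only uses the flags shown above, and it prints the commands rather than running them.

```python
import shlex


def build_commands(workdir="taxadb", dbname="taxadb.sqlite"):
    """Return the two taxadb2 command lines shown above as argument lists."""
    download = ["taxadb2", "download", "--outdir", workdir, "--type", "taxa"]
    create = ["taxadb2", "create", "--division", "taxa",
              "--input", workdir, "--dbname", dbname]
    return [download, create]


for cmd in build_commands():
    # shlex.join quotes arguments safely for display; pass `cmd` to
    # subprocess.run(cmd, check=True) to actually execute it.
    print(shlex.join(cmd))
```

Keeping the commands as argument lists avoids shell quoting problems if the directory or database name contains spaces.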
Note: the tests rely on the pytest module, which can be installed with pip install pytest.
You can easily run some tests. Go to the root directory of this project (cd /path/to/taxadb2) and run pytest -v.
This simple command will run tests against an SQLite test database called test_db.sqlite located in the taxadb2/test directory.
It is also possible to run only the tests related to accessionid or taxid, as follows:
$ pytest -m 'taxid'
$ pytest -m 'accessionid'
You can also use the configuration file taxadb2.ini, located at the root of the distribution, as follows. This file should contain the database connection settings:
$ pytest taxadb2/test --config='taxadb2.ini'
Code is under the MIT license.
Found a bug or have a question? Please open an issue.
Thought about a new feature that you'd like us to implement? Open an issue or fork the repository and submit a pull request.
This repository adheres to the Contributor Covenant code of conduct for any interactions you have within this project (see Code of Conduct).
See also the policy against sexualized discrimination, harassment and violence for the Max Planck Society Code-of-Conduct.
By contributing to this project, you agree to abide by its terms.