@prototaxites (Collaborator) commented Oct 16, 2025

The primary purpose of this PR is to make MarkerScan compatible with v2 of the NCBI Datasets API, replacing the dependency on the deprecated ncbi-datasets-pylib.

To do this, I have added a small class in scripts/NcbiApi.py that replicates the required functionality using the requests library. While integrating it into the various scripts, I also made a few efficiency improvements, such as determining the number of assemblies directly from the API rather than counting them in a loop.
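
For readers unfamiliar with the v2 API, here is a minimal sketch of what a requests-based wrapper of this kind might look like. It is hypothetical and is not the class in scripts/NcbiApi.py: the base URL, the `genome/taxon/{taxon}/dataset_report` endpoint, the `api-key` header, and the `total_count`, `reports` and `next_page_token` response fields are assumptions about the Datasets v2 REST API, and `NcbiApiSketch` is an illustrative name. It shows the efficiency point above: the assembly count is read from a single one-record request rather than by paging through every report.

```python
import os
from typing import Any, Dict, Iterator, Optional

# Assumption: base URL of v2 of the NCBI Datasets REST API.
BASE_URL = "https://api.ncbi.nlm.nih.gov/datasets/v2"


class NcbiApiSketch:
    """Hypothetical minimal client, not the NcbiApi class from this PR."""

    def __init__(self, api_key: Optional[str] = None, session: Any = None) -> None:
        # Fall back to the NCBI_API_KEY environment variable when no key is passed.
        self.api_key = api_key or os.environ.get("NCBI_API_KEY")
        if session is None:
            import requests  # imported lazily so a stub session can be injected in tests

            session = requests.Session()
        self.session = session

    def _get(self, path: str, params: Dict[str, Any]) -> Dict[str, Any]:
        # Assumption: the key is sent in an "api-key" request header.
        headers = {"api-key": self.api_key} if self.api_key else {}
        response = self.session.get(
            f"{BASE_URL}/{path}", params=params, headers=headers, timeout=60
        )
        response.raise_for_status()
        return response.json()

    def count_assemblies(self, taxon: str) -> int:
        # Request a single record and read the reported total instead of
        # counting reports in a loop ("total_count" is an assumed field name).
        data = self._get(f"genome/taxon/{taxon}/dataset_report", {"page_size": 1})
        return int(data.get("total_count", 0))

    def iter_assemblies(self, taxon: str, page_size: int = 1000) -> Iterator[Dict[str, Any]]:
        # Follow "next_page_token" (assumed field name) until the last page.
        params: Dict[str, Any] = {"page_size": page_size}
        while True:
            data = self._get(f"genome/taxon/{taxon}/dataset_report", params)
            yield from data.get("reports", [])
            token = data.get("next_page_token")
            if not token:
                break
            params = {**params, "page_token": token}
```

With a real key, `NcbiApiSketch().count_assemblies("9606")` would make one live request. Accepting an injectable session keeps the class testable without network access.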

Currently, the environment variable NCBI_API_KEY must be set for the scripts to access the API. I don't know enough Snakemake to modify the Snakefile, but I've also added a command-line argument (-k) to the affected Python scripts so the key can be specified manually.
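
One way a script could accept the key from either source is sketched below with argparse. This is an assumption about the behaviour, not the PR's code: only the `-k` flag comes from the PR, while the fallback to NCBI_API_KEY when `-k` is omitted, the `api_key` destination name and the error message are illustrative.

```python
import argparse
import os
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Query the NCBI Datasets v2 API")
    # -k overrides the environment; otherwise fall back to NCBI_API_KEY (assumed behaviour).
    parser.add_argument(
        "-k",
        dest="api_key",
        default=os.environ.get("NCBI_API_KEY"),
        help="NCBI API key (defaults to $NCBI_API_KEY)",
    )
    args = parser.parse_args(argv)
    if not args.api_key:
        # parser.error prints the message and exits with status 2.
        parser.error("no NCBI API key: set NCBI_API_KEY or pass -k")
    return args
```

Because the default is read when the parser is built, an explicit `-k` always wins over the environment variable.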

While getting the modified pipeline to run, I also made a number of small changes:

  • Update minimap2 from 2.17 to 2.30: 2.17 does not support the map-hifi preset used in the pipeline (is there an alternate set of env files somewhere?)
  • Update Seqtk from 1.3 to 1.5
  • Update BUSCO from 5.2.2 to 6.0.0 (necessary as the BUSCO database files have been updated)
  • Change the BUSCO config file so it doesn't try to set up the paths to the required dependencies, as this was causing problems (the dependencies are available in the conda env anyway)
  • Pin tabulate=0.8.0 in the Dockerfile for compatibility with the pinned version of Snakemake
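
The minimap2 and BUSCO bumps could be sanity-checked before a run with a small helper like the one below. It is hypothetical and not part of the PR: it assumes `minimap2 --version` and `busco --version` print a dotted version number somewhere in their output, and it treats the versions from the list as minimums. Seqtk and tabulate are left out of this sketch.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

Version = Tuple[int, ...]

# Minimum versions taken from the list above (treated as lower bounds here).
MINIMUM = {"minimap2": (2, 30), "busco": (6, 0, 0)}


def parse_version(text: str) -> Version:
    """Extract the first dotted version number, e.g. '2.30-r1234' -> (2, 30)."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def installed_version(tool: str) -> Optional[Version]:
    """Run '<tool> --version' and parse its output; None if the tool is not on PATH."""
    if shutil.which(tool) is None:
        return None
    result = subprocess.run([tool, "--version"], capture_output=True, text=True, check=False)
    return parse_version(result.stdout + result.stderr)


def meets_minimum(found: Optional[Version], required: Version) -> bool:
    """Compare versions after zero-padding, so (6, 0) counts as equal to (6, 0, 0)."""
    if found is None:
        return False
    width = max(len(found), len(required))
    padded_found = found + (0,) * (width - len(found))
    padded_required = required + (0,) * (width - len(required))
    return padded_found >= padded_required
```

Usage: `for tool, required in MINIMUM.items(): print(tool, meets_minimum(installed_version(tool), required))`. A minimap2 2.17 install would report `False`, flagging the missing map-hifi preset before the pipeline fails.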

@prototaxites changed the base branch from main to ncbi_fix on November 6, 2025 at 11:57
@prototaxites merged commit fcd6f34 into CobiontID:ncbi_fix on Nov 6, 2025
0 of 3 checks passed