Update MarkerScan for compatibility with v2 of the NCBI API #15

prototaxites · 2025-11-06T11:58:36Z

The primary function of this PR is to update MarkerScan to be compatible with v2 of the NCBI API, replacing the dependency on ncbi-datasets-pylib which is deprecated.

To do this, I have added a small class in scripts/NcbiApi.py which replicates the required functionality using the requests library. In implementing this into the various scripts, I've also made a few changes to improve efficiency, including determining the number of assemblies directly from the API rather than counting them in a loop.

In order to run, currently you must set the environment variable NCBI_API_KEY in order to access the API. I don't know enough Snakemake to modify the Snakefile, but I've also added command-line arguments (-k) to the affected Python scripts to allow this to be specified manually.

In trying to get the modified version of the pipeline to run, I have also made a number of small changes:

Update minimap2 from 2.17 to 2.30 - 2.17 does not contain the map-hifi option used in the pipeline (are there an alternate set of env files somewhere?)
Update Seqtk from 1.3 to 1.5
Update BUSCO from 5.2.2 to 6.0.0 (necessary as the BUSCO database files have been updated)
Change the BUSCO config file so it doesn't try and set up the paths to the required dependencies, as this was causing problems (these are available as part of the conda env anyway)
Fix tabulate=0.8.0 in the Dockerfile for compatibity with the pinned version of Snakemake

- use lightweight custom class reimplementing required functionality with the NCBI API - tidy up some logic with counting loops by just returning counts from the API - use a bit of pathlib to do some path stuff - re-format modified scripts with Ruff

…ersion of snakemake

Update MarkerScan for compatibility with v2 of the NCBI API

muffato · 2025-11-06T14:33:09Z

Hi Jim.

An issue I had with the pipeline is that it downloads everything in its own space, essentially duplicating downloads we already have in /data/tol/resources. Is there a way we could pass existing download paths as a parameters and force it to use the data there ? (without attempting to update / replace anything).
Here are the three databases we already have:

busco/ (would need to add the --offline mode when running it).
taxonomy/
silva/

prototaxites · 2025-11-06T14:41:01Z

I am sure that would be doable - but I don't know any Snakemake or really understand very well the codebase here, so it would take a bit of effort to resolve

muffato

Use subprocess.check_call rather than os.system to check the return code
Commands that don't need wildcards, redirections, etc, should be defined as an array rather than a string

muffato · 2025-11-25T09:23:52Z

scripts/NCBIApiTools.py

+    def get_request(self, command: str) -> requests.Response:
+        """
+        Returns a requests response object for a given NCBI api query (command).
+
+        args:
+            command -> str: request to append to NCBI API URI
+        """
+        if self.ncbi_api_key != "":
+            headers = {"Accept": "application/json", "api-key": f"{self.ncbi_api_key}"}
+        else:
+            headers = {"Accept": "application/json"}
+
+        response = requests.get(self.ncbi_api_uri + command, headers=headers)
+
+        if response.status_code != 200:
+            raise Exception(
+                f"Cannot connect to NCBI (status code '{str(response.status_code)}')'"
+            )
+
+        return response


I'd suggest you copy https://github.com/sanger-tol/blobtoolkit/blob/0.9.0/bin/generate_config.py#L34-L42 into here, so that you can recover from transient network errors. (maybe adapt it for the POST request in download_genomes)

prototaxites and others added 14 commits October 8, 2025 10:10

fix: properly exit when download fails

fbc1ab6

fix: minor change to file to trigger ci

cb7b5ba

fix(dockerfile): force tabulate 0.8.10 for compatibility with fixed v…

73d55cc

…ersion of snakemake

fix(ncbi): run get on json() object, not requests response directly

e2db290

fix(ncbi): don't fail on nested gets

1b841a9

fix(ncbi): don't fail on nested gets

a7429a7

fix(ncbi): get date correctly

6ea527d

fix(ncbi): correct path commands

29d95c6

fix(envs): bump minimap version to 2.30

9c5b1c7

fix(envs): bump seqtk version to 1.5

a9dbb45

fix(envs): bump busco to 6.0.0

cd8dada

fix(busco_config): don't try and guess the paths

e83f860

Merge pull request #14 from prototaxites/main

fcd6f34

Update MarkerScan for compatibility with v2 of the NCBI API

muffato reviewed Nov 25, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Update MarkerScan for compatibility with v2 of the NCBI API #15

Update MarkerScan for compatibility with v2 of the NCBI API #15

Uh oh!

prototaxites commented Nov 6, 2025

Uh oh!

muffato commented Nov 6, 2025

Uh oh!

prototaxites commented Nov 6, 2025

Uh oh!

muffato left a comment

Uh oh!

muffato Nov 25, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Update MarkerScan for compatibility with v2 of the NCBI API #15

Are you sure you want to change the base?

Update MarkerScan for compatibility with v2 of the NCBI API #15

Uh oh!

Conversation

prototaxites commented Nov 6, 2025

Uh oh!

muffato commented Nov 6, 2025

Uh oh!

prototaxites commented Nov 6, 2025

Uh oh!

muffato left a comment

Choose a reason for hiding this comment

Uh oh!

muffato Nov 25, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants