-
Notifications
You must be signed in to change notification settings - Fork 2
Update MarkerScan for compatibility with v2 of the NCBI API #15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
- use lightweight custom class reimplementing required functionality with the NCBI API - tidy up some logic with counting loops by just returning counts from the API - use a bit of pathlib to do some path stuff - re-format modified scripts with Ruff
…ersion of snakemake
Update MarkerScan for compatibility with v2 of the NCBI API
|
Hi Jim. An issue I had with the pipeline is that it downloads everything in its own space, essentially duplicating downloads we already have in
|
|
I am sure that would be doable - but I don't know any Snakemake or really understand very well the codebase here, so it would take a bit of effort to resolve |
muffato
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Use
subprocess.check_callrather thanos.systemto check the return code - Commands that don't need wildcards, redirections, etc, should be defined as an array rather than a string
| def get_request(self, command: str) -> requests.Response: | ||
| """ | ||
| Returns a requests response object for a given NCBI api query (command). | ||
|
|
||
| args: | ||
| command -> str: request to append to NCBI API URI | ||
| """ | ||
| if self.ncbi_api_key != "": | ||
| headers = {"Accept": "application/json", "api-key": f"{self.ncbi_api_key}"} | ||
| else: | ||
| headers = {"Accept": "application/json"} | ||
|
|
||
| response = requests.get(self.ncbi_api_uri + command, headers=headers) | ||
|
|
||
| if response.status_code != 200: | ||
| raise Exception( | ||
| f"Cannot connect to NCBI (status code '{str(response.status_code)}')'" | ||
| ) | ||
|
|
||
| return response |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest you copy https://github.com/sanger-tol/blobtoolkit/blob/0.9.0/bin/generate_config.py#L34-L42 into here, so that you can recover from transient network errors. (maybe adapt it for the POST request in download_genomes)
The primary function of this PR is to update MarkerScan to be compatible with v2 of the NCBI API, replacing the dependency on
ncbi-datasets-pylibwhich is deprecated.To do this, I have added a small class in
scripts/NcbiApi.pywhich replicates the required functionality using therequestslibrary. In implementing this into the various scripts, I've also made a few changes to improve efficiency, including determining the number of assemblies directly from the API rather than counting them in a loop.In order to run, currently you must set the environment variable NCBI_API_KEY in order to access the API. I don't know enough Snakemake to modify the Snakefile, but I've also added command-line arguments (
-k) to the affected Python scripts to allow this to be specified manually.In trying to get the modified version of the pipeline to run, I have also made a number of small changes:
map-hifioption used in the pipeline (are there an alternate set of env files somewhere?)