Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should database overwrites be requested by the user? or disabled by user? #16

Closed
rpetit3 opened this issue Sep 13, 2019 · 6 comments
Closed

Comments

@rpetit3
Copy link

rpetit3 commented Sep 13, 2019

Really great tool @evolarjun! I'm curious to get your thoughts on handling existing databases.

Currently if a database already exists it is overwritten then downloaded again.

amrfinder -u
Running amrfinder -u
Running /home/rpetit/miniconda3/envs/amr/bin/amrfinder_update -d /home/rpetit/miniconda3/envs/amr/share/amrfinderplus/data
/home/rpetit/miniconda3/envs/amr/share/amrfinderplus/data/2019-09-06.1/ already exists, overwriting what was there
Dowloading AMRFinder database version 2019-09-06.1 into /home/rpetit/miniconda3/envs/amr/share/amrfinderplus/data/2019-09-06.1/
Indexing

I wonder if it would be better to raise an error and require the user to explicitly tell amrfinder (ex --force) to overwrite the existing database.

An alternative, would be allow user to disable the overwrite (ex. --skip_existing).

@evolarjun
Copy link
Contributor

Hi @rpetit3,

Thanks for the compliment! Keep 'em coming :-)

We have it automatically overwriting to simplify things for users (and our support) if/when the database gets moved or corrupted, but we're open to reconsidering and/or adding an option. What is your concern about the way it currently overwrites the database?

@rpetit3
Copy link
Author

rpetit3 commented Sep 16, 2019

I think this is a fringe case, but I'm using amrfinder within a conda environment create by Nextflow (https://www.nextflow.io/docs/latest/conda.html).

When the workflow is run, if a conda environment doesn't exist, Nextflow will create it. After creation, amrfinder requires the database to be updated. So, in Nextflow I check to make sure the database exists, and if it doesn't go ahead and update. Occasionally I get instances where multiple jobs try to update at the same time each overwriting one another. I fixed this by updating earlier (a single time) in the workflow.

Probably wasn't the best idea to have multiple jobs trying to update at once, but it got me to wondering: Why update (and overwrite) to the same version of a database? This led to the --force or --skip_existing suggestion.

@evolarjun
Copy link
Contributor

Thanks @rpetit3, That's not something we had thought about since we update so infrequently. I'll talk it over with the rest of the team and see what we can do. I'm glad you've been able to fix the issue short-term and we'll discuss how we can fix it with AMRFinderPlus in the future.

@vbrover
Copy link
Contributor

vbrover commented Jan 15, 2020

I suggest having 2 workflows:

  • one for the installation of AMRFinder and the database - to be run once;
  • and the other one to run AMRFinder on your sequences - can be run in parallel.

evolarjun added a commit that referenced this issue Oct 6, 2020
Release 3.8.28

 - amrfinder -u option no longer overwrites databases if they already exist.
    New --force_update option will still overwrite the database as the old -u
    option did. ([#16])
 - New --protein_output and --nucleotide_output options will produce FASTA files
    containing the sequence of AMRFinderPlus hits
 - --mutation_all output has been streamlined and some bugs have been fixed
 - New --name option will prepend a "Name" field to every row of the
    AMRFinderPlus report ([#25])
 - Fusion genes are now reported with both element symbols separated by a '/' on
    both lines of the report. E.g., aac(6')-Ie/aph(2'')-If2
 - For HMM-only hits (have HMM in the method column which means they do not have
    BLAST results that meet cutoff) BLAST statistics will now be reported
    if any alignment at that location was made.
 - When in COMBINED (nucleotide + protein) mode and when an "INTERNAL_STOP" is detected
    at a locus that also has a protein, the protein result will be reported with
    Method of INTERNAL_STOP (Previously the nucleotide result was reported. This
    would affect the identity, length, and coverage statistics AMRFinderPlus reports.
 - Improved handling of partial alignments for point-mutation detection, including
    prioritization of translated matches (POINTX) when the protein match identifies fewer
    known SNPs.
 - Incorrect amino acids inferred by protein annotations extended in the 5' (N-terminal)
    direction because correct annotations start with alternative start codons will
    no longer count as mismatches.
 - Small improvements in performance.
@evolarjun
Copy link
Contributor

It only took us a year :-), but we finally implemented your suggestion @rpetit3 in release 3.8.28. The -u/--update option now will not overwrite the database by default and we have a new --force_update parameter to force the overwrite. I know you've already solved your immediate problem, but hopefully this makes it easier for people to add an amrfinder -u so they'll be using the latest database, and race conditions when doing that should be much less likely.

Thanks for your suggestion!

@rpetit3
Copy link
Author

rpetit3 commented Oct 7, 2020

Thank you very much! This will work great!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants