Releases: gbouras13/hybracter

v0.10.0

18 Oct 08:16
9bf944a
  • Updates Medaka to v2.0.1, implementing the --bacteria option by default.
  • This follows the recommendations of Ryan Wick, who found that it improved assemblies, likely through enhanced methylation error correction.
  • If you still want to specify a Medaka model, the flag --medaka_override has been added. You need to include this along with your model via --medakaModel. This is most likely useful for older R9 data.
  • Adds the --extra_params_flye parameter if you want to pass extra options to the Flye assembly step (thanks @pdobbler, #101).

v0.9.1

07 Oct 23:02
abf66df
  • Small change to the plassembler.yaml config and the plassembler rules to prevent installation bugs: Unicycler v0.5.1 is now installed in a much simpler fashion via Bioconda (thanks @npbhavya). Installation should be a lot less fragile now.
  • The crappy workaround was needed because the Unicycler conda package for MacOS was not built/broken for v0.5.0. Thanks @mencian and @tcezard for fixing v0.5.1 in bioconda/bioconda-recipes#49602.

v0.9.0

18 Sep 01:28
3bbae0b

--auto for automatic estimation of chromosome size

  • Thanks to an issue and code from @richardstoeckl, Hybracter can now estimate the chromosome size for each sample if you pass --auto.

  • The implementation uses kmc. Specifically, Hybracter uses kmc to count the number of unique 21-mers that appear at least 10 times in your long-read FASTQ file. This works because a linear assembly of length L contains at most (L - k) + 1 distinct k-mers of size k, and if L >> k this count is approximately L, so it suffices as an estimate of total assembly size.

  • The estimated chromosome size used by Hybracter will actually be 80% of the number of 21-mers found at least 10 times, as it needs to account for plasmids

  • If you aren't sure whether you have enough data for assembly (i.e. coverage lower than 20x), be careful using --auto, because the actual assembly size will tend to be larger than the number of unique 21mers found at least 10 times. Therefore, the estimated chromosome size will almost certainly be an underestimate and may lead to Hybracter considering your assembly "complete" when in fact it isn't.

  • If you use --auto, you do not need to specify the chromosome length in the input. This means you don't need to pass -c with long-single or hybrid-single, and the input CSV sample sheet does not need a column with chromosome length.

e.g. for hybracter long you only need 2 columns with sample name and long-read FASTQ file path:

s_aureus_sample1,sample1_long_read.fastq.gz
p_aeruginosa_sample2,sample2_long_read.fastq.gz

and for hybracter hybrid you only need 4 columns with sample name, long-read FASTQ, and R1 and R2 short-read FASTQ file paths:

s_aureus_sample1,sample1_long_read.fastq.gz,sample1_SR_R1.fastq.gz,sample1_SR_R2.fastq.gz
p_aeruginosa_sample2,sample2_long_read.fastq.gz,sample2_SR_R1.fastq.gz,sample2_SR_R2.fastq.gz
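
The k-mer arithmetic behind --auto can be illustrated with a small pure-Python sketch. This is not Hybracter's code: Hybracter uses kmc, whereas the function below counts k-mers in memory and is only practical for toy inputs. The names estimate_chromosome_size and canonical are hypothetical, and merging each k-mer with its reverse complement is an assumption about how the counting is configured.

```python
from collections import Counter

# Complement table used to build reverse complements.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def estimate_chromosome_size(reads, k=21, min_count=10, chromosome_fraction=0.8):
    """Hypothetical helper: count distinct k-mers seen at least min_count
    times, then scale by chromosome_fraction to leave room for plasmids."""
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Skip k-mers containing N or other ambiguity symbols.
            if set(kmer) <= set("ACGT"):
                counts[canonical(kmer)] += 1
    # Number of distinct k-mers that pass the abundance threshold.
    solid = sum(1 for c in counts.values() if c >= min_count)
    return round(solid * chromosome_fraction)
```

With low coverage, fewer k-mers reach the min_count threshold, which is why the estimate shrinks and the warning above about coverage below 20x applies.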

Other changes

  • Hybracter v0.9.0 automatically supports reorienting archaeal chromosomes to begin with the COG1474 Orc1/cdc6 gene (thanks @richardstoeckl).
  • --datadir can now also accept 2 paths separated by a comma, if you have long reads and short reads in separate directories e.g. --datadir "long_read_dir,short_read_dir" (#76).
  • --min_depth parameter added. Hybracter will error out if your QC'd long reads have a coverage lower than min_depth for a sample (#89).

v0.8.0

03 Sep 06:55
14d35b3
  • Adds --datadir that removes the need to add full paths in sample sheet (thanks @oschwengers)
  • Update medaka to v1.12.1 to support the newest models (#84 )
    • New default medaka model is r1041_e82_400bps_sup_v5.0.0
  • Adds --mac flag if you are running Hybracter on MacOS. It is now recommended to run Hybracter on Linux if you want the latest Medaka models.
    • This is because ONT no longer support Bioconda installation, and the latest version (v1.12.1) from pip doesn't work on Mac
    • --mac will install and run Medaka v1.8.0 as in previous versions and use r1041_e82_400bps_sup_v4.2.0 as default

v0.7.3

04 Apr 21:54
cbfbbe1
  • Enforce spades>=v3.15.2 in the plassembler.yaml environment
  • For some reason, the environment on Linux was being solved with SPAdes v3.14.1, which caused an error with Unicycler within Plassembler for some samples (described in rrwick/Unicycler#318)

v0.7.2

02 Apr 05:55
3706a74
  • Adds 'circular=True' to chromosome contig headers where Flye has marked these contigs as circular. This fixes a bug introduced in v0.7.0.
  • Thanks Nicole Lerminiaux for spotting this

v0.7.1

13 Mar 05:46
fd423f0
  • Fixes bug #60 where hybracter install -d db_dir would not work as the -f parameter was not being passed to Plassembler. Thanks @npbhavya

v0.7.0

04 Mar 05:47
1b0cdc9

Bug fixes

  • Fixes bug where --configfile wasn't being passed to Hybracter.
  • Fixes bug where hybracter would crash if the input long reads were not gzipped #51 thanks @wanyuac.

Changes to short read polishing.

  • Logic added to run polypolish v0.6.0 with --careful and skip pypolca if the SR coverage estimate is below 5x (note: FASTA files for pypolca will be generated in the processing directory to play nice with Snakemake, but these will be identical to the polypolish output).
  • For 5-25x coverage, polypolish --careful and pypolca with --careful will be run.
  • For >25x coverage, polypolish default and pypolca with --careful will be run - this should be the most common case.
  • A preprint justifying these changes will be available soon.
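
The coverage-based rules above can be summarised in a short Python sketch. It is an illustration of the decision logic only, not Hybracter's actual Snakemake rules. The function name choose_polishing and the returned dictionary are hypothetical, and treating exactly 5x and exactly 25x as part of the middle band is an assumption, since the notes do not say which side the boundaries fall on.

```python
def choose_polishing(sr_coverage):
    """Hypothetical helper: map a short-read coverage estimate to the
    polishing modes listed in the release notes.

    Returns the mode for each tool, or None if the tool is skipped.
    """
    if sr_coverage < 5:
        # Below 5x: Polypolish in careful mode, pypolca skipped.
        return {"polypolish": "careful", "pypolca": None}
    if sr_coverage <= 25:
        # 5-25x (boundaries assumed inclusive): both tools in careful mode.
        return {"polypolish": "careful", "pypolca": "careful"}
    # Above 25x, expected to be the most common case.
    return {"polypolish": "default", "pypolca": "careful"}
```

For example, choose_polishing(40) selects default Polypolish followed by pypolca with --careful.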

--logic changes

  • --logic now defaults to last for hybracter hybrid, as we have found that the polishing strategy implemented above never makes the assembly worse. We suggest never using --logic best with hybracter hybrid.

Changes for chromosome contigs and circularity.

  • If hybracter assembles a contig that is greater than the minimum chromosome length but not marked as circular by Flye, this will now be denoted as a chromosome, but not circular. The genome will also be marked as complete.
    • These will usually be assemblies with some issue (e.g. prophages, circularisation issues, heterogeneity) and probably require some more attention.
    • For example, with the Vibrio cholerae larger chromosome described here, the genome will be marked as 'complete' but the contig will not be marked as 'circular' in the hybracter output.
    • Such contigs will be polished and be in the final _chromosome.fasta output, but they will not be rotated by dnaapler.
    • These contigs were previously excluded, which missed assemblies with structural heterogeneity (causing the chromosome not to completely circularise) or even bacteria with linear chromosomes like Borrelia.
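
As a rough illustration of the classification described above, here is a minimal Python sketch. It is not taken from the Hybracter source: classify_contig and its field names are hypothetical, and counting a contig whose length exactly equals the minimum as a chromosome is an assumption.

```python
def classify_contig(length, min_chromosome_length, flye_circular):
    """Hypothetical helper: decide how a contig is labelled from its
    length and Flye's circularity flag (v0.7.0 behaviour as described)."""
    # Inclusive comparison is an assumption; the notes say "greater than".
    is_chromosome = length >= min_chromosome_length
    circular = is_chromosome and flye_circular
    return {
        "chromosome": is_chromosome,
        "circular": circular,
        # Only circular chromosomes are rotated by dnaapler.
        "rotate": circular,
        # A chromosome-sized contig marks the genome complete either way.
        "complete": is_chromosome,
    }
```

A 3 Mbp contig that Flye did not circularise, checked against a 2.5 Mbp minimum, comes out as a chromosome in a complete genome, but it is neither circular nor rotated.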

Adds --depth_filter

  • This is passed to Plassembler and will filter out all putative plasmid contigs that are lower than this depth fraction compared to the chromosome.
  • Defaults to 0.25 like Unicycler's implementation.
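
The effect of the depth fraction can be sketched in Python as follows. This is an illustration, not Plassembler's implementation: filter_plasmid_contigs and its arguments are hypothetical names, and keeping contigs whose depth equals the threshold exactly is an assumption.

```python
def filter_plasmid_contigs(contig_depths, chromosome_depth, depth_filter=0.25):
    """Hypothetical helper: keep putative plasmid contigs whose depth is at
    least depth_filter times the chromosome depth.

    contig_depths maps contig name to mean read depth.
    """
    threshold = depth_filter * chromosome_depth
    return {
        name: depth
        for name, depth in contig_depths.items()
        if depth >= threshold
    }
```

With a chromosome depth of 40x and the default of 0.25, the threshold is 10x, so a 30x contig is kept and a 5x contig is filtered out.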

v0.6.0

18 Jan 09:19
f6b8e7f
  • Fixes bug with Polypolish v0.6.0 breaking the CLI #49 thanks @wanyuac
  • Adds -m option to download all Medaka models with hybracter install - useful for offline use #48 @lxsteiner
  • Adds quick SR coverage estimates (in processing/qc/coverage) and other QC stats (using seqkit) in processing/qc/seqkit. The coverage estimate is calculated as (total bases / estimated chromosome size) for each sample
  • Logic added to run Polypolish and pypolca with --careful if the SR coverage estimate is below 25x.
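
The coverage estimate is simple arithmetic, shown here as a small Python sketch. The function names are hypothetical and the code only mirrors the formula and the 25x rule stated above; it does not reproduce how Hybracter reads the seqkit output.

```python
def estimate_sr_coverage(total_bases, chromosome_size):
    """Hypothetical helper: total short-read bases divided by the
    estimated chromosome size."""
    if chromosome_size <= 0:
        raise ValueError("chromosome_size must be positive")
    return total_bases / chromosome_size


def needs_careful(total_bases, chromosome_size, threshold=25):
    """v0.6.0 rule as described: use --careful below 25x coverage."""
    return estimate_sr_coverage(total_bases, chromosome_size) < threshold
```

For example, 100,000,000 short-read bases against a 5,000,000 bp chromosome gives an estimate of 20x, which is below the 25x threshold.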

v0.5.0

09 Jan 03:25
fdfba14

Ryan Wick recently ran hybracter long on the latest Dorado v0.5.0 Nanopore reads from his blog post.

You can read a write-up of the results here.

Added Features in v0.5.0

  • Adds subsampling with Filtlong via --subsample_depth, based on some benchmarking of Dorado v0.5.0 reads. Defaults to 100, i.e. 100x the estimated chromosome size given by -c.
  • Adds stricter criteria for complete assemblies (i.e. identified chromosomes must be circularised according to Flye). Thanks to Matthew Croxen for pointing this out.
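
The relationship between --subsample_depth and the chromosome size can be written as a short Python sketch. The function name is hypothetical, and how Hybracter hands the resulting number to Filtlong is an assumption (Filtlong does accept a target number of bases through its --target_bases option).

```python
def subsample_target_bases(chromosome_size, subsample_depth=100):
    """Hypothetical helper: number of long-read bases to keep so that the
    retained reads cover the chromosome subsample_depth times."""
    if chromosome_size <= 0 or subsample_depth <= 0:
        raise ValueError("chromosome_size and subsample_depth must be positive")
    return chromosome_size * subsample_depth
```

For a 2,800,000 bp chromosome and the default depth of 100, the target is 280,000,000 bases.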