Releases: gbouras13/hybracter
v0.10.0
- Updates Medaka to v2.0.1, implementing the `--bacteria` option by default.
  - This is based on the recommendations of Ryan Wick, who found it improved assemblies due to (likely) enhanced methylation error correction.
  - If you still want to specify a Medaka model, the flag `--medaka_override` has been added. You need to include this along with your model via `--medakaModel`. This is most likely useful for older R9 data.
- Adds `--extra_params_flye` parameter if you want to specify extra parameters for the Flye assembly step, thanks @pdobbler #101
v0.9.1
- Small change to the `plassembler.yaml` config and plassembler rules, preventing installation bugs.
- Unicycler v0.5.1 is now installed in a much simpler fashion via Bioconda, thanks @npbhavya. Installation should be a lot less fragile now.
  - The crappy workaround was needed because the Unicycler conda package for MacOS was not built/broken for v0.5.0. Thanks @mencian @tcezard for fixing v0.5.1 bioconda/bioconda-recipes#49602
v0.9.0
`--auto` for automatic estimation of chromosome size
- Thanks to an issue and code from @richardstoeckl, Hybracter can now estimate the chromosome size for each sample by passing `--auto`.
- The implementation uses kmc. Specifically, Hybracter uses kmc to count the number of unique 21-mers that appear at least 10 times in your long-read FASTQ file. This is because, for a given assembly of length L and a k-mer size of k, the total number of possible unique k-mers is (L - k) + 1, and if L >> k, this count suffices as an estimate of total assembly size.
- The estimated chromosome size used by Hybracter will actually be 80% of the number of 21-mers found at least 10 times, as it needs to account for plasmids.
- If you aren't sure whether you have enough data for assembly (i.e. coverage lower than 20x), be careful using `--auto`, because the actual assembly size will tend to be larger than the number of unique 21-mers found at least 10 times. Therefore, the estimated chromosome size will almost certainly be an underestimate and may lead to Hybracter considering your assembly "complete" when in fact it isn't.
- If you use `--auto`, you do not need to specify the chromosome length in the input. This means you don't need `-c` with `long-single` or `hybrid-single`, and in the input csv sample sheet, you do not need a column with chromosome length.
e.g. for `hybracter long` you only need 2 columns with sample name and long-read FASTQ file path:
s_aureus_sample1,sample1_long_read.fastq.gz
p_aeruginosa_sample2,sample2_long_read.fastq.gz
and for `hybracter hybrid` you only need 4 columns with sample name, long-read FASTQ, and R1 and R2 short-read FASTQ file paths:
s_aureus_sample1,sample1_long_read.fastq.gz,sample1_SR_R1.fastq.gz,sample1_SR_R2.fastq.gz
p_aeruginosa_sample2,sample2_long_read.fastq.gz,sample2_SR_R1.fastq.gz,sample2_SR_R2.fastq.gz
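The k-mer reasoning behind `--auto` described above can be illustrated with a small pure-Python sketch. This is not how Hybracter does it (Hybracter uses kmc); the function name `estimate_chromosome_size` and its parameters are hypothetical, and the sketch ignores reverse complements, which a real k-mer counter would typically collapse into canonical k-mers.

```python
from collections import Counter


def estimate_chromosome_size(reads, k=21, min_count=10, chromosome_fraction=0.8):
    """Toy estimate: distinct k-mers seen >= min_count times, scaled by 0.8.

    Illustrative stand-in for a kmc-based count; names and defaults mirror
    the release notes but the function itself is an assumption.
    """
    counts = Counter()
    for read in reads:
        seq = read.upper()
        # A read of length n contributes (n - k) + 1 k-mers.
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    # Keep only k-mers that are well supported (likely real, not errors).
    solid_kmers = sum(1 for c in counts.values() if c >= min_count)
    # Scale down to leave room for plasmids.
    return int(solid_kmers * chromosome_fraction)
```

With real data the reads would be streamed from the long-read FASTQ, and a dedicated counter such as kmc is far more memory-efficient than a Python `Counter`.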
Other changes
- Hybracter v0.9.0 will automatically support the reorientation of archaeal chromosomes (thanks @richardstoeckl) to begin with the cog1474 Orc1/cdc6 gene.
- `--datadir` can now also accept 2 paths separated by a comma, if you have long reads and short reads in separate directories, e.g. `--datadir "long_read_dir,short_read_dir"` (#76).
- `--min_depth` parameter added. Hybracter will error out if your QC'd long reads have a coverage lower than `min_depth` for a sample (#89).
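As a rough illustration of the comma-separated `--datadir` behaviour described above, the sketch below shows one way such a value could be split and joined to the file names from a sample sheet. The function `resolve_read_paths` is hypothetical and is not taken from the Hybracter code base.

```python
from pathlib import Path


def resolve_read_paths(datadir, long_fastq, short_fastqs=()):
    """Join sample-sheet file names to one or two data directories.

    Hypothetical helper: a single path is used for all reads; two
    comma-separated paths are read as (long_read_dir, short_read_dir).
    """
    parts = [p.strip() for p in datadir.split(",")]
    if len(parts) == 1:
        long_dir = short_dir = Path(parts[0])
    elif len(parts) == 2:
        long_dir, short_dir = Path(parts[0]), Path(parts[1])
    else:
        raise ValueError("expected one path or two comma-separated paths")
    return long_dir / long_fastq, [short_dir / name for name in short_fastqs]
```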
v0.8.0
- Adds `--datadir` that removes the need to add full paths in the sample sheet (thanks @oschwengers)
- Updates Medaka to v1.12.1 to support the newest models (#84)
  - New default Medaka model is `r1041_e82_400bps_sup_v5.0.0`
- Adds `--mac` flag if you are running Hybracter on MacOS. It is now recommended to run Hybracter on Linux if you want the latest Medaka models.
  - This is because ONT do not support bioconda install anymore and the latest version (v1.12.1) from pip doesn't work on Mac
  - `--mac` will install and run Medaka v1.8.0 as in previous versions and use `r1041_e82_400bps_sup_v4.2.0` as default
0.7.3
- Enforces spades>=v3.15.2 in the `plassembler.yaml` environment
  - For some reason, the environment on Linux was being solved for v3.14.1, which was causing an error with Unicycler within Plassembler for some samples, as described in rrwick/Unicycler#318
v0.7.2
v0.7.1
v0.7.0
Bug fixes
- Fixes bug where `--configfile` wasn't being passed to Hybracter.
- Fixes bug where `hybracter` would crash if the input long reads were not gzipped #51 thanks @wanyuac.
Changes to short read polishing.
- Logic added to run `polypolish` v0.6.0 with `--careful` and skip `pypolca` if the SR coverage estimate is below 5x (note: FASTA files for `pypolca` will be generated in the processing directory to play nice with Snakemake, but these will be identical to the `polypolish` output).
- For 5-25x coverage, `polypolish --careful` and `pypolca` with `--careful` will be run.
- For >25x coverage, `polypolish` default and `pypolca` with `--careful` will be run - this should be the most common case.
- A preprint justifying these changes will be available soon.
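The coverage thresholds above amount to a small decision table. The following sketch encodes it in Python; the function name `short_read_polishing_plan` and the returned dictionary layout are hypothetical and only meant to make the three cases explicit.

```python
def short_read_polishing_plan(sr_coverage):
    """Map an estimated short-read coverage to polisher modes.

    Illustrative only: returns the mode for each polisher, with None
    meaning the polisher is skipped.
    """
    if sr_coverage < 5:
        # Very low coverage: careful Polypolish only, pypolca skipped.
        return {"polypolish": "careful", "pypolca": None}
    if sr_coverage <= 25:
        # Low coverage: both polishers in careful mode.
        return {"polypolish": "careful", "pypolca": "careful"}
    # Most common case: default Polypolish, careful pypolca.
    return {"polypolish": "default", "pypolca": "careful"}
```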
`--logic` changes
- `--logic` now defaults to `last` for `hybracter hybrid`, as we have found that the polishing strategy implemented above never makes the assembly worse. We suggest never using `--logic best` with `hybracter hybrid`.
Changes for chromosome contigs and circularity.
- If hybracter assembles a contig that is greater than the minimum chromosome length but not marked as circular by Flye, this will now be denoted as a chromosome, but not circular. The genome will be marked as complete also.
- These will usually be assemblies with some issue (e.g. prophages, circularisation issues, heterogeneity) and probably require some more attention.
- For example, with the Vibrio cholerae larger chromosome described here, the genome will be marked as 'complete' but the contig will not be marked as 'circular' in the `hybracter` output.
- Such contigs will be polished and be in the final `_chromosome.fasta` output, but they will not be rotated by `dnaapler`.
- These were previously being excluded, which meant Hybracter was missing assemblies with structural heterogeneity (causing the chromosome not to completely circularise) or even bacteria with linear chromosomes like Borrelia.
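The rules in this list can be summarised in a short sketch. The `Contig` class and the functions `classify_contig` and `genome_is_complete` are hypothetical names for illustration and do not reflect Hybracter's internal code.

```python
from dataclasses import dataclass


@dataclass
class Contig:
    name: str
    length: int
    circular: bool  # circularity flag as reported by the assembler (Flye)


def classify_contig(contig, min_chrom_length):
    """Label a contig following the v0.7.0 rules (illustrative only)."""
    is_chromosome = contig.length > min_chrom_length
    return {
        "chromosome": is_chromosome,
        "circular": contig.circular,
        # Only circular chromosomes are reoriented.
        "rotate": is_chromosome and contig.circular,
    }


def genome_is_complete(contigs, min_chrom_length):
    """A genome counts as complete if any contig passes the length cutoff."""
    return any(c.length > min_chrom_length for c in contigs)
```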
Adds --depth_filter
- This is passed to Plassembler and will filter out all putative plasmid contigs that are lower than this depth fraction compared to the chromosome.
- Defaults to 0.25 like Unicycler's implementation.
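To make the depth fraction concrete, here is a minimal sketch of such a filter, assuming per-contig depths are already known. The function `filter_plasmid_contigs` is a hypothetical illustration, not Plassembler's implementation.

```python
def filter_plasmid_contigs(plasmid_depths, chromosome_depth, depth_filter=0.25):
    """Keep putative plasmid contigs at or above a fraction of chromosome depth.

    plasmid_depths maps contig name -> mean read depth (assumed input format).
    """
    threshold = depth_filter * chromosome_depth
    return {name: depth for name, depth in plasmid_depths.items() if depth >= threshold}
```

For example, with a chromosome depth of 100x and the default of 0.25, contigs below 25x would be dropped.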
v0.6.0
- Fixes bug with Polypolish v0.6.0 breaking the CLI #49 thanks @wanyuac
- Adds -m option to download all Medaka models with hybracter install - useful for offline use #48 @lxsteiner
- Adds quick SR coverage estimates (in `processing/qc/coverage`) and other QC stats (using seqkit) in `processing/qc/seqkit`. The coverage is calculated as (Total bases / estimated chromosome size) for each sample.
- Logic added to run Polypolish and pypolca with `--careful` if the SR coverage estimate is below 25x.
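The coverage formula above (total bases / estimated chromosome size) can be sketched as follows. The helpers `count_bases` and `estimate_coverage` are hypothetical, and the sketch assumes plain four-line FASTQ records; Hybracter itself relies on seqkit for its QC stats.

```python
def count_bases(fastq_lines):
    """Sum sequence lengths from an iterable of FASTQ lines.

    Assumes strict four-line records (header, sequence, '+', qualities),
    so the sequence is every line at index 1 modulo 4.
    """
    total = 0
    for line_number, line in enumerate(fastq_lines):
        if line_number % 4 == 1:
            total += len(line.strip())
    return total


def estimate_coverage(total_bases, chromosome_size):
    """Coverage = total bases / estimated chromosome size."""
    return total_bases / chromosome_size
```

An open file handle (for example from `gzip.open(path, "rt")`) can be passed directly as `fastq_lines`.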
v0.5.0
Ryan Wick recently ran `hybracter long` on the latest Dorado v0.5.0 Nanopore reads from his blog post.
You can read a write-up of the results here.
Added Features in v0.5.0
- Adds subsampling with Filtlong via `--subsample_depth`, based on some benchmarking of Dorado v0.5.0 reads. Defaults to 100, i.e. 100x of the estimated chromosome size `-c`.
- Adds stricter criteria for complete assemblies (i.e. ensures that identified chromosomes must be circularised according to Flye). Thanks to Matthew Croxen for pointing this out.
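As a worked example of the subsampling arithmetic, a depth of 100x on a 2.5 Mbp chromosome corresponds to 250 Mbp of reads. The sketch below computes that target and builds a Filtlong command using its `--target_bases` option; the function names are hypothetical, and how Hybracter actually invokes Filtlong is an assumption here.

```python
def subsample_target_bases(chromosome_size, subsample_depth=100):
    """Number of bases to keep: depth multiplied by estimated chromosome size."""
    return chromosome_size * subsample_depth


def filtlong_command(reads_path, chromosome_size, subsample_depth=100):
    """Build an illustrative Filtlong argument list (not Hybracter's exact call)."""
    target = subsample_target_bases(chromosome_size, subsample_depth)
    return ["filtlong", "--target_bases", str(target), reads_path]
```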