18 Oct 08:16

gbouras13

v0.10.0

9bf944a

v0.10.0

Updates Medaka to v2.0.1 and enables the --bacteria option by default.
This follows the recommendations of Ryan Wick, who found that it improved assemblies, likely due to enhanced methylation error correction.
If you still want to specify a Medaka model, the flag --medaka_override has been added. You need to include it along with your model via --medakaModel. This is most likely useful for older R9 data.
Adds the --extra_params_flye parameter if you want to pass extra options to the Flye assembly step (thanks @pdobbler, #101).

Contributors

pdobbler


07 Oct 23:02

gbouras13

v0.9.1

abf66df

v0.9.1

Small change to the plassembler.yaml config and the plassembler rules to prevent installation bugs: Unicycler v0.5.1 is now installed in a much simpler fashion via Bioconda (thanks @npbhavya). Installation should be a lot less fragile now.
The crappy workaround existed because the Unicycler v0.5.0 conda package for MacOS was not built or was broken. Thanks @mencian and @tcezard for fixing this in v0.5.1 (bioconda/bioconda-recipes#49602).

Contributors

tcezard, npbhavya, and mencian


18 Sep 01:28

gbouras13

v0.9.0

3bbae0b

v0.9.0

--auto for automatic estimation of chromosome size

Thanks to an issue and code from @richardstoeckl, Hybracter can now estimate the chromosome size for each sample if you pass --auto.
The implementation uses kmc. Specifically, Hybracter uses kmc to count the number of unique 21-mers that appear at least 10 times in your long-read FASTQ file. This works because, for an assembly of length L and a k-mer size of k, the number of unique k-mers is at most (L - k) + 1; when L >> k, this count serves as an estimate of total assembly size.
The estimated chromosome size used by Hybracter will actually be 80% of the number of 21-mers found at least 10 times, as it needs to account for plasmids.
If you aren't sure whether you have enough data for assembly (i.e. coverage lower than 20x), be careful using --auto, because the actual assembly size will tend to be larger than the number of unique 21-mers found at least 10 times. The estimated chromosome size will therefore almost certainly be an underestimate, which may lead Hybracter to consider your assembly "complete" when in fact it isn't.
If you use --auto, you do not need to specify the chromosome length in the input. This means you don't need to pass -c with long-single or hybrid-single, and the input csv sample sheet does not need a column with chromosome length.

e.g. for hybracter long you only need 2 columns with sample name and long-read FASTQ file path:

s_aureus_sample1,sample1_long_read.fastq.gz
p_aeruginosa_sample2,sample2_long_read.fastq.gz

and for hybracter hybrid you only need 4 columns with sample name, long-read FASTQ, and R1 and R2 short-read FASTQ file paths:

s_aureus_sample1,sample1_long_read.fastq.gz,sample1_SR_R1.fastq.gz,sample1_SR_R2.fastq.gz
p_aeruginosa_sample2,sample2_long_read.fastq.gz,sample2_SR_R1.fastq.gz,sample2_SR_R2.fastq.gz
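The k-mer counting idea behind --auto can be illustrated with a small pure-Python sketch. This is not how Hybracter runs the calculation (it uses kmc on the FASTQ file): the function name, the toy reads and the lowered k and thresholds in the example are assumptions made for illustration, and strand-collapsed (canonical) k-mers are ignored for simplicity.

```python
from collections import Counter


def estimate_chromosome_size(reads, k=21, min_count=10, chromosome_fraction=0.8):
    """Count distinct k-mers seen at least min_count times, then scale.

    Hypothetical helper that mirrors the description in these notes;
    it is not Hybracter's kmc-based implementation.
    """
    counts = Counter()
    for read in reads:
        # A read of length L contributes (L - k) + 1 k-mers.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    # Only k-mers seen often enough are treated as real genome content.
    solid_kmers = sum(1 for c in counts.values() if c >= min_count)
    # Keep 80% of the count by default to leave room for plasmids.
    return int(solid_kmers * chromosome_fraction)


# Toy example: 12 identical "reads" of a 30 bp sequence, with a small k.
toy_genome = "ACGTTGCAAGCTTAGGCATCGATCCGATAG"
print(estimate_chromosome_size([toy_genome] * 12, k=5, min_count=10))
```

With only a few copies of each read, no k-mer reaches the count threshold and the estimate collapses towards zero, which is the low-coverage underestimate the notes warn about.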

Other changes

Hybracter v0.9.0 will automatically support the reorientation of archaeal chromosomes (thanks @richardstoeckl) to begin with the cog1474 Orc1/cdc6 gene.
--datadir can now also accept 2 paths separated by a comma, if you have long reads and short reads in separate directories e.g. --datadir "long_read_dir,short_read_dir" (#76).
--min_depth parameter added. Hybracter will error out if your QC'd long reads have a coverage lower than min_depth for a sample (#89).


03 Sep 06:55

gbouras13

v0.8.0

14d35b3

v0.8.0

Adds --datadir, which removes the need to add full paths in the sample sheet (thanks @oschwengers).
Updates Medaka to v1.12.1 to support the newest models (#84).
- New default Medaka model is r1041_e82_400bps_sup_v5.0.0
Adds the --mac flag if you are running Hybracter on MacOS. It is now recommended to run Hybracter on Linux if you want the latest Medaka models.
- This is because ONT no longer support the bioconda install, and the latest version (v1.12.1) from pip doesn't work on Mac
- --mac will install and run Medaka v1.8.0 as in previous versions and use r1041_e82_400bps_sup_v4.2.0 as default

Contributors

oschwengers


04 Apr 21:54

gbouras13

v0.7.3

cbfbbe1

0.7.3

Enforces spades>=v3.15.2 in the plassembler.yaml environment.
For some reason, the environment on Linux was being solved to SPAdes v3.14.1, which caused an error with Unicycler within Plassembler for some samples (described in rrwick/Unicycler#318).


02 Apr 05:55

gbouras13

v0.7.2

3706a74

v0.7.2

Adds 'circular=True' to chromosome contig headers where Flye has marked these as circular. This fixes a bug introduced in v0.7.0.
Thanks to Nicole Lerminiaux for spotting this.


13 Mar 05:46

gbouras13

v0.7.1

fd423f0

v0.7.1

Fixes bug #60 where hybracter install -d db_dir would not work as the -f parameter was not being passed to Plassembler. Thanks @npbhavya

Contributors

npbhavya


04 Mar 05:47

gbouras13

v0.7.0

1b0cdc9

v0.7.0

Bug fixes

Fixes bug where --configfile wasn't being passed to Hybracter.
Fixes bug where hybracter would crash if the input long reads were not gzipped #51 thanks @wanyuac.

Changes to short read polishing.

Logic added to run polypolish v0.6.0 with --careful and skip pypolca if the SR coverage estimate is below 5x (note: FASTA files for pypolca will be generated in the processing directory to play nice with Snakemake, but these will be identical to the polypolish output).
For 5-25x coverage, polypolish --careful and pypolca with --careful will be run.
For >25x coverage, polypolish default and pypolca with --careful will be run - this should be the most common case.
A preprint justifying these changes will be available soon.
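The three coverage tiers above can be written as a small decision function. This is only a sketch of the described logic, not Hybracter's Snakemake rules: the names PolishingPlan and choose_polishing_plan are hypothetical, and treating exactly 25x as part of the middle tier is an assumption, since the notes do not say which side the boundary falls on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolishingPlan:
    polypolish_careful: bool  # run polypolish with --careful
    run_pypolca: bool         # whether pypolca is run at all
    pypolca_careful: bool     # run pypolca with --careful


def choose_polishing_plan(sr_coverage: float) -> PolishingPlan:
    """Map a short-read coverage estimate to a polishing strategy."""
    if sr_coverage < 5:
        # Very low coverage: careful polypolish only, pypolca skipped.
        return PolishingPlan(True, False, False)
    if sr_coverage <= 25:
        # Assumption: exactly 25x is handled like the 5-25x tier.
        return PolishingPlan(True, True, True)
    # Above 25x (expected to be the most common case).
    return PolishingPlan(False, True, True)


for cov in (3, 15, 60):
    print(cov, choose_polishing_plan(cov))
```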

--logic changes

--logic now defaults to last for hybracter hybrid, as we have found that the polishing strategy implemented above never makes the assembly worse. We suggest never using --logic best with hybracter hybrid.

Changes for chromosome contigs and circularity.

If hybracter assembles a contig that is greater than the minimum chromosome length but not marked as circular by Flye, this will now be denoted as a chromosome, but not circular. The genome will be marked as complete also.
- These will usually be assemblies with some issue (e.g. prophages, circularisation issues, heterogeneity) and probably require some more attention.
- For example, with the Vibrio cholerae larger chromosome described here, the genome will be marked as 'complete' but the contig will not be marked as 'circular' in the hybracter output.
- Such contigs will be polished and be in the final _chromosome.fasta output, but they will not be rotated by dnaapler.
- These were previously being excluded, which was missing assemblies with structural heterogeneity (causing the chromosome not to completely circularise) or even bacteria with linear chromosomes like Borrelia.
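As a rough illustration of the labelling rules above, the following sketch classifies a single contig. The function name and dictionary keys are hypothetical and do not correspond to Hybracter's internal code or output columns; the genome-level "complete" flag is simplified here to "at least this contig qualifies as a chromosome".

```python
def classify_contig(length: int, flye_circular: bool, min_chrom_length: int) -> dict:
    """Label one contig following the v0.7.0 rules described above."""
    # A contig counts as a chromosome based on length alone.
    is_chromosome = length > min_chrom_length
    return {
        "is_chromosome": is_chromosome,
        # Circularity is taken from Flye and reported as-is.
        "marked_circular": flye_circular,
        # Only circular chromosomes are rotated by dnaapler.
        "rotated_by_dnaapler": is_chromosome and flye_circular,
        # Simplification: a chromosome-sized contig makes the genome complete.
        "genome_complete": is_chromosome,
    }


# A long contig that Flye did not circularise (e.g. a linear chromosome).
print(classify_contig(length=3_000_000, flye_circular=False, min_chrom_length=2_500_000))
```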

Adds --depth_filter

This is passed to Plassembler and will filter out all putative plasmid contigs that are lower than this depth fraction compared to the chromosome.
Defaults to 0.25 like Unicycler's implementation.
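A minimal sketch of what a depth-fraction filter of this kind does is shown below. The function name and the input dictionary are hypothetical; the real filtering happens inside Plassembler, and keeping contigs that sit exactly on the threshold is an assumption.

```python
def filter_plasmid_contigs(plasmid_depths, chromosome_depth, depth_filter=0.25):
    """Keep putative plasmid contigs whose depth is at least
    depth_filter times the chromosome depth."""
    threshold = depth_filter * chromosome_depth
    return {name: depth for name, depth in plasmid_depths.items() if depth >= threshold}


# With a 100x chromosome and the default 0.25, contigs below 25x are dropped.
depths = {"plasmid_1": 180.0, "plasmid_2": 25.0, "plasmid_3": 8.0}
print(filter_plasmid_contigs(depths, chromosome_depth=100.0))
```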

Contributors

wanyuac


18 Jan 09:19

gbouras13

v0.6.0

f6b8e7f

v0.6.0

Fixes bug with Polypolish v0.6.0 breaking the CLI #49 thanks @wanyuac
Adds -m option to download all Medaka models with hybracter install - useful for offline use #48 @lxsteiner
Adds quick SR coverage estimates (in processing/qc/coverage) and other QC stats (using seqkit) in processing/qc/seqkit. The coverage estimate is calculated as (total bases / estimated chromosome size) for each sample.
Logic added to run Polypolish and pypolca with --careful if the SR coverage estimate is below 25x.
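The coverage estimate and the 25x rule above amount to the following arithmetic. The function names are hypothetical, and the read lengths are passed in directly for illustration, whereas Hybracter derives the total base count from seqkit statistics.

```python
def estimate_sr_coverage(read_lengths, chromosome_size):
    """Coverage = total sequenced bases / estimated chromosome size."""
    if chromosome_size <= 0:
        raise ValueError("chromosome_size must be positive")
    return sum(read_lengths) / chromosome_size


def use_careful_mode(coverage, threshold=25.0):
    """v0.6.0 rule: use --careful when the estimate is below 25x."""
    return coverage < threshold


# 1,000 reads of 150 bp against a 10 kb toy "chromosome".
coverage = estimate_sr_coverage([150] * 1000, chromosome_size=10_000)
print(coverage, use_careful_mode(coverage))
```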

Contributors

wanyuac and lxsteiner


09 Jan 03:25

gbouras13

v0.5.0

fdfba14

v0.5.0

Ryan Wick recently ran hybracter long on the latest Dorado v0.5.0 Nanopore reads from his blog post.

You can read a write-up of the results here.

Added Features in v0.5.0

Adds subsampling with Filtlong via --subsample_depth, based on some benchmarking of Dorado v0.5.0 reads. Defaults to 100, i.e. 100x of the estimated chromosome size given by -c.
Adds stricter criteria for complete assemblies (i.e. identified chromosomes must be circularised according to Flye). Thanks to Matthew Croxen for pointing this out.
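To make the subsampling depth concrete, the sketch below turns a depth and a chromosome size into a base budget. It assumes the depth is translated into Filtlong's --target_bases option; the function name and the file name are hypothetical, and the exact command Hybracter builds may differ.

```python
def filtlong_target_bases(chromosome_size: int, subsample_depth: int = 100) -> int:
    """Number of bases to keep so that depth is about subsample_depth x."""
    return chromosome_size * subsample_depth


# Example: a 2.8 Mb chromosome at the default 100x.
target = filtlong_target_bases(2_800_000)
# Illustrative command only; the input file name is a placeholder.
command = ["filtlong", "--target_bases", str(target), "sample_long_read.fastq.gz"]
print(" ".join(command))
```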

