From b38dcfa1d667b433f1effadc6da59b2813874c37 Mon Sep 17 00:00:00 2001
From: Joyjit Daw
Date: Sun, 19 May 2024 19:21:13 -0700
Subject: [PATCH 1/6] add changelog changes from v0.6

---
 CHANGELOG.md | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index e78c1a43..a81f8f51 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -2,6 +2,26 @@
 
 All notable changes to Dorado will be documented in this file.
 
+# [0.6.2] (9 May 2024)
+
+This release of Dorado disables trimming of the rapid adapter during basecalling which was causing some RBK datasets to have a high unclassified rate during demux.
+
+* a64492b69eb59c1d60d602fee1670085338450c4 - Fix bug with loading reverse aligned records in dorado demux and trim
+* 6cc278f4d7759a7aaaa9a9b336d843b127b0d7ed - Disable rapid adapter trimming to prevent signal overtrimming in some RBK datasets
+
+# [0.6.1] (23 April 2024)
+
+This release of Dorado fixes bugs in `dorado aligner` related to using presets incorrectly and in `dorado demux` which were causing demultiplexed outputs to be malformed.
+
+* 3e060db5a35ab09fecbeef9754cc545ba400edf1 - Skip stripping of SQ header lines in dorado demux --no-classify
+* a2abf83852e895b1016c690769b59c06587684fe - Fix incorrect overriding of minimap2 options when minimap2 preset is specified
+* 1cc207a166b1cafcbd012f5c70b5c817c788c7f3 - Fix bug causing unclassified records from `dorado demux` to be unreadable by samtools
+* 298277150ad2522ca6c1928c4981782ce2893a5a - Fix issue with allocating memory on unused GPU during basecalling
+* fa79f4a77fca737704d8a9e08d0495b9988f88ef - Fix reverse strand alignments when re-mapping a SAM/BAM file with `dorado aligner`
+* 3b2c8252d1a40bb0f941ca2ceca0849be15d15fa - Propagate `sv` tag to split reads
+* 11675a565da9af52de89a3f6614d15e57d10765d - Fix bug where errors were being swallowed in HtsFile class
+* 73046e19fd443dfb48f3fbb82c0b37c5c7cfb8d5 - Fx typo in Warnings.cmake
+
 # [0.6.0] (1 April 2024)
 
 This release of Dorado improves performance for short read basecalling and RBK barcode classification rates, introduces sorted and indexed BAM generation in Dorado aligner and demux, and updates the minimap2 version and default mapping preset. It also adds GPU information to the output BAM or FASTQ and includes several other improvements and bug fixes.

From a07dc388415be436ea3c03455fe58391606c6764 Mon Sep 17 00:00:00 2001
From: Joyjit Daw
Date: Sun, 19 May 2024 19:56:07 -0700
Subject: [PATCH 2/6] Update documentation for 0.7.0 release

1. Update README
2. Update CHANGELOG
3. Update version number

Closes DOR-714
---
 CHANGELOG.md              | 45 +++++++++++++++++++++++++++++++++++++++
 README.md                 |  8 +++----
 cmake/DoradoVersion.cmake |  2 +-
 3 files changed, 50 insertions(+), 5 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index a81f8f51..d986ab6e 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -2,6 +2,51 @@
 
 All notable changes to Dorado will be documented in this file.
 
+# [0.7.0] (21 May 2024)
+
+## New feature highlights
+
+1. DNA, RNA and duplex basecalling models with improved single read accuracy.
+2. Support for 5mC_4mC methylation calling.
+3. PolyA tail estimation for transcripts with interrupted tails and plasmids.
+4. `dorado correct` subcommand for single-read error correction of diploid genomes (for assembly pipelines).
+5. Support for --junc-bed mm2 splice option.
+6. Faster BAM indexing and sorting code.
+
+## Changes to default behavior
+
+1. Data tyoe of mean Q-score tag (`qs` updated to `float`.
+
+## Backwards incompatible changes
+
+1. `TWIST` barcode names updated to better reflect kit composition.
+
+## All key changes
+
+* 159b73c7fea64d374b562af32abeaa382af54354 - Add new methylation calling models (5mC_4mC, m6A, pseU)
+* cf46f49c620633bf724904b834e8d394073d0bc4 - Raise error if PolyA config file is not found
+* dc50b97605762423d09d91ec74dd95ad2b5c97c9 - Add MacOS support for v5 basecalling model
+* d6b0f68b3617f34a321db676b780e1a1183b6060 - Change data type of mean Q-score (`qs` tag) to float
+* 7a09ca3d1d1e469570a7df1e5819c39e9dd2325e - Add v5 models for DNA, RNA and duplex
+* f938c415ddc9f458fe718af72c82001448d9c3c7 - List supported models in structured format
+* 70ff95d84b316adb4701f7f43a19151e73b58b5b - Enable `dorado summary` to run on trimmed BAM files
+* 6373792b686538758a16aacb063434c2b3260077 - Detect presence of midstrand barcodes to reduce false positive classifications
+* 68d40da45da886384508173219a9fb677fc50cef - Add support for --junc-bed mm2 splice option
+* 93632025d7df195be625654d968f62321c4a4136 - Update `TWIST` barcode kit names
+* 381f6c3038fb69523ea591b1942d3293d7e9b9aa - Enable adapter trimming when polyA estimation is requested
+* be8ac08652d5fe0b73c1126048b7fd96f29f3419 - Add `dorado correct` support for read error correction
+* a30c489c41bafb3e307060806c5b57caa2c610ef - Use new transformer Koi fused residual rmsnorm kernel
+* c443f75314708b7aed0aafa38fffdb8b2e76e9f2 - Output BAM from dorado trim command
+* eaf4ab28d958c4426a6f57eb9c2a7032d5e1fa80 - Update documentation to reflect new `dorado aligner` defaults
+* a3dce7ebe298ce3e17f3d61ad180b099700afb6a - Demux header merge improvements
+* 67dc5bab58d74ee636e492619a6802db38059534 - Plasmid polyA estimation
+* 6ccf0ed46d275c1e0209de3cb99d0bd56bf7f083 - Add support for v5 basecalling model
+* 08e2c7bb2538c2ba89203a68bbf153e6a6054535 - Index BAM while merging temp files
+* b8de2d900d9aeb1c349931a216db7e05aa2ff2c4 - Set max memory sizes in minimap2
+* b8de2d900d9aeb1c349931a216db7e05aa2ff2c4 - Calculate scaling for rna on non-adapter signal only
+* 949d13ffb41152aaba4df9004d01e8584c8038e3 - Write multiple temp files for sorted bam output
+* c88e9f753219f3c462c3678ddfad6b4561830f33 - Update CMake Minimum Version to 3.23
+
 # [0.6.2] (9 May 2024)
 
 This release of Dorado disables trimming of the rapid adapter during basecalling which was causing some RBK datasets to have a high unclassified rate during demux.
diff --git a/README.md b/README.md
index f725179e..1991d8e8 100644
--- a/README.md
+++ b/README.md
@@ -19,10 +19,10 @@ If you encounter any problems building or running Dorado, please [report an issu
 
 ## Installation
 
- - [dorado-0.6.0-linux-x64](https://cdn.oxfordnanoportal.com/software/analysis/dorado-0.6.0-linux-x64.tar.gz)
- - [dorado-0.6.0-linux-arm64](https://cdn.oxfordnanoportal.com/software/analysis/dorado-0.6.0-linux-arm64.tar.gz)
- - [dorado-0.6.0-osx-arm64](https://cdn.oxfordnanoportal.com/software/analysis/dorado-0.6.0-osx-arm64.zip)
- - [dorado-0.6.0-win64](https://cdn.oxfordnanoportal.com/software/analysis/dorado-0.6.0-win64.zip)
+ - [dorado-0.7.0-linux-x64](https://cdn.oxfordnanoportal.com/software/analysis/dorado-0.7.0-linux-x64.tar.gz)
+ - [dorado-0.7.0-linux-arm64](https://cdn.oxfordnanoportal.com/software/analysis/dorado-0.7.0-linux-arm64.tar.gz)
+ - [dorado-0.7.0-osx-arm64](https://cdn.oxfordnanoportal.com/software/analysis/dorado-0.7.0-osx-arm64.zip)
+ - [dorado-0.7.0-win64](https://cdn.oxfordnanoportal.com/software/analysis/dorado-0.7.0-win64.zip)
 
 ## Platforms
 
diff --git a/cmake/DoradoVersion.cmake b/cmake/DoradoVersion.cmake
index 92a8c968..8afd63d0 100644
--- a/cmake/DoradoVersion.cmake
+++ b/cmake/DoradoVersion.cmake
@@ -1,5 +1,5 @@
 set(DORADO_VERSION_MAJOR 0)
-set(DORADO_VERSION_MINOR 6)
+set(DORADO_VERSION_MINOR 7)
 set(DORADO_VERSION_REV 0)
 
 find_package(Git QUIET)

From 9e91db33216d3fd3db29c1b3f90ca03743762a66 Mon Sep 17 00:00:00 2001
From: Joyjit Daw
Date: Sun, 19 May 2024 20:19:06 -0700
Subject: [PATCH 3/6] add release summary

---
 CHANGELOG.md | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index d986ab6e..5d38e59c 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -4,6 +4,8 @@ All notable changes to Dorado will be documented in this file.
 
 # [0.7.0] (21 May 2024)
 
+This release of Dorado introduces new and more accurate v5 models for improved basecalling. It also adds a new subcommand, `dorado correct`, for single-read error correction to help Nanopore based `de novo` assemblies of diploid genomes. In addition, this release contains a slew of bug fixes, stability enhancements and updates to barcode classification.
+
 ## New feature highlights
 
 1. DNA, RNA and duplex basecalling models with improved single read accuracy.
@@ -16,6 +18,7 @@ All notable changes to Dorado will be documented in this file.
 ## Changes to default behavior
 
 1. Data tyoe of mean Q-score tag (`qs` updated to `float`.
+2. Adapter trimming enabled when PolyA estimation is requested.
 
 ## Backwards incompatible changes
 

From ccb07f59f7c12ae543d6db4a6cc246ab70b02bd7 Mon Sep 17 00:00:00 2001
From: Joyjit Daw
Date: Mon, 20 May 2024 06:34:41 -0700
Subject: [PATCH 4/6] fix typos

---
 CHANGELOG.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 5d38e59c..1499385e 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -17,8 +17,8 @@ This release of Dorado introduces new and more accurate v5 models for improved b
 
 ## Changes to default behavior
 
-1. Data tyoe of mean Q-score tag (`qs` updated to `float`.
-2. Adapter trimming enabled when PolyA estimation is requested.
+1. Data type of mean Q-score tag (`qs`) updated to `float`.
+2. Adapter trimming is enabled when PolyA estimation is requested.
 
 ## Backwards incompatible changes
 
@@ -68,7 +68,7 @@ This release of Dorado fixes bugs in `dorado aligner` related to using presets i
 * fa79f4a77fca737704d8a9e08d0495b9988f88ef - Fix reverse strand alignments when re-mapping a SAM/BAM file with `dorado aligner`
 * 3b2c8252d1a40bb0f941ca2ceca0849be15d15fa - Propagate `sv` tag to split reads
 * 11675a565da9af52de89a3f6614d15e57d10765d - Fix bug where errors were being swallowed in HtsFile class
-* 73046e19fd443dfb48f3fbb82c0b37c5c7cfb8d5 - Fx typo in Warnings.cmake
+* 73046e19fd443dfb48f3fbb82c0b37c5c7cfb8d5 - Fix typo in Warnings.cmake
 
 # [0.6.0] (1 April 2024)
 

From 8e36d89b9684d4b02c8fa00f412d100185253bad Mon Sep 17 00:00:00 2001
From: Joyjit Daw
Date: Tue, 21 May 2024 05:26:58 -0700
Subject: [PATCH 5/6] address review comments

---
 CHANGELOG.md | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 1499385e..1ed48c5e 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -4,13 +4,13 @@ All notable changes to Dorado will be documented in this file.
 
 # [0.7.0] (21 May 2024)
 
-This release of Dorado introduces new and more accurate v5 models for improved basecalling. It also adds a new subcommand, `dorado correct`, for single-read error correction to help Nanopore based `de novo` assemblies of diploid genomes. In addition, this release contains a slew of bug fixes, stability enhancements and updates to barcode classification.
+This release of Dorado introduces new and more accurate v5 models for improved basecalling. It also adds a new subcommand, `dorado correct`, for single-read error correction to help Nanopore based `de novo` assemblies of haploid or diploid genomes. In addition, this release contains a slew of bug fixes, stability enhancements and updates to barcode classification.
 
 ## New feature highlights
 
 1. DNA, RNA and duplex basecalling models with improved single read accuracy.
-2. Support for 5mC_4mC methylation calling.
-3. PolyA tail estimation for transcripts with interrupted tails and plasmids.
+2. Support for 4mC_5mC methylation calling.
+3. PolyA tail estimation for plasmids and transcripts with interrupted tails.
 4. `dorado correct` subcommand for single-read error correction of diploid genomes (for assembly pipelines).
 5. Support for --junc-bed mm2 splice option.
 6. Faster BAM indexing and sorting code.
@@ -20,13 +20,9 @@ This release of Dorado introduces new and more accurate v5 models for improved b
 1. Data type of mean Q-score tag (`qs`) updated to `float`.
 2. Adapter trimming is enabled when PolyA estimation is requested.
 
-## Backwards incompatible changes
-
-1. `TWIST` barcode names updated to better reflect kit composition.
-
 ## All key changes
 
-* 159b73c7fea64d374b562af32abeaa382af54354 - Add new methylation calling models (5mC_4mC, m6A, pseU)
+* 159b73c7fea64d374b562af32abeaa382af54354 - Add new models for calling DNA and RNA base modifications (4mC_5mC, m6A, pseU)
 * cf46f49c620633bf724904b834e8d394073d0bc4 - Raise error if PolyA config file is not found
 * dc50b97605762423d09d91ec74dd95ad2b5c97c9 - Add MacOS support for v5 basecalling model
 * d6b0f68b3617f34a321db676b780e1a1183b6060 - Change data type of mean Q-score (`qs` tag) to float
@@ -35,7 +31,7 @@ This release of Dorado introduces new and more accurate v5 models for improved b
 * 70ff95d84b316adb4701f7f43a19151e73b58b5b - Enable `dorado summary` to run on trimmed BAM files
 * 6373792b686538758a16aacb063434c2b3260077 - Detect presence of midstrand barcodes to reduce false positive classifications
 * 68d40da45da886384508173219a9fb677fc50cef - Add support for --junc-bed mm2 splice option
-* 93632025d7df195be625654d968f62321c4a4136 - Update `TWIST` barcode kit names
+* 93632025d7df195be625654d968f62321c4a4136 - Update barcode kit names
 * 381f6c3038fb69523ea591b1942d3293d7e9b9aa - Enable adapter trimming when polyA estimation is requested
 * be8ac08652d5fe0b73c1126048b7fd96f29f3419 - Add `dorado correct` support for read error correction
 * a30c489c41bafb3e307060806c5b57caa2c610ef - Use new transformer Koi fused residual rmsnorm kernel

From 4d73774ddcc2102085d6e9732ab32a95ca4d8d3f Mon Sep 17 00:00:00 2001
From: Susie Lee
Date: Tue, 21 May 2024 13:40:10 +0100
Subject: [PATCH 6/6] Changelog updates

---
 CHANGELOG.md | 39 ++++++++++++++++-----------------------
 1 file changed, 16 insertions(+), 23 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 1ed48c5e..240a582c 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -4,46 +4,39 @@ All notable changes to Dorado will be documented in this file.
 
 # [0.7.0] (21 May 2024)
 
-This release of Dorado introduces new and more accurate v5 models for improved basecalling. It also adds a new subcommand, `dorado correct`, for single-read error correction to help Nanopore based `de novo` assemblies of haploid or diploid genomes. In addition, this release contains a slew of bug fixes, stability enhancements and updates to barcode classification.
+This release of Dorado introduces new and more accurate v5 models for improved basecalling. It also adds a new subcommand, `dorado correct`, for single-read error correction to help Nanopore based *de novo* assemblies of haploid or diploid genomes. In addition, this release contains a slew of bug fixes, stability enhancements and updates to barcode classification.
 
 ## New feature highlights
 
 1. DNA, RNA and duplex basecalling models with improved single read accuracy.
-2. Support for 4mC_5mC methylation calling.
-3. PolyA tail estimation for plasmids and transcripts with interrupted tails.
-4. `dorado correct` subcommand for single-read error correction of diploid genomes (for assembly pipelines).
-5. Support for --junc-bed mm2 splice option.
+2. Support for `4mC_5mC` methylation calling in DNA and all-context `m6A` and `pseU` in RNA.
+3. `dorado correct` subcommand for single-read error correction of haploid and diploid genomes (for assembly pipelines).
+4. Poly(A) tail estimation for plasmids and transcripts with interrupted tails.
+5. Support for `--junc-bed` minimap2 splice option.
 6. Faster BAM indexing and sorting code.
 
 ## Changes to default behavior
 
 1. Data type of mean Q-score tag (`qs`) updated to `float`.
-2. Adapter trimming is enabled when PolyA estimation is requested.
+2. Adapter trimming is enabled when poly(A) estimation is requested.
 
 ## All key changes
 
+* 7a09ca3d1d1e469570a7df1e5819c39e9dd2325e - Add v5 basecalling models for DNA, RNA and duplex
 * 159b73c7fea64d374b562af32abeaa382af54354 - Add new models for calling DNA and RNA base modifications (4mC_5mC, m6A, pseU)
-* cf46f49c620633bf724904b834e8d394073d0bc4 - Raise error if PolyA config file is not found
-* dc50b97605762423d09d91ec74dd95ad2b5c97c9 - Add MacOS support for v5 basecalling model
+* be8ac08652d5fe0b73c1126048b7fd96f29f3419 - Add `dorado correct` support for read error correction
+* 67dc5bab58d74ee636e492619a6802db38059534 - Poly(A) estimation for plasmids and interrupted tails
+* 381f6c3038fb69523ea591b1942d3293d7e9b9aa - Enable adapter trimming when poly(A) estimation is requested
 * d6b0f68b3617f34a321db676b780e1a1183b6060 - Change data type of mean Q-score (`qs` tag) to float
-* 7a09ca3d1d1e469570a7df1e5819c39e9dd2325e - Add v5 models for DNA, RNA and duplex
 * f938c415ddc9f458fe718af72c82001448d9c3c7 - List supported models in structured format
 * 70ff95d84b316adb4701f7f43a19151e73b58b5b - Enable `dorado summary` to run on trimmed BAM files
 * 6373792b686538758a16aacb063434c2b3260077 - Detect presence of midstrand barcodes to reduce false positive classifications
-* 68d40da45da886384508173219a9fb677fc50cef - Add support for --junc-bed mm2 splice option
-* 93632025d7df195be625654d968f62321c4a4136 - Update barcode kit names
-* 381f6c3038fb69523ea591b1942d3293d7e9b9aa - Enable adapter trimming when polyA estimation is requested
-* be8ac08652d5fe0b73c1126048b7fd96f29f3419 - Add `dorado correct` support for read error correction
-* a30c489c41bafb3e307060806c5b57caa2c610ef - Use new transformer Koi fused residual rmsnorm kernel
-* c443f75314708b7aed0aafa38fffdb8b2e76e9f2 - Output BAM from dorado trim command
-* eaf4ab28d958c4426a6f57eb9c2a7032d5e1fa80 - Update documentation to reflect new `dorado aligner` defaults
-* a3dce7ebe298ce3e17f3d61ad180b099700afb6a - Demux header merge improvements
-* 67dc5bab58d74ee636e492619a6802db38059534 - Plasmid polyA estimation
-* 6ccf0ed46d275c1e0209de3cb99d0bd56bf7f083 - Add support for v5 basecalling model
-* 08e2c7bb2538c2ba89203a68bbf153e6a6054535 - Index BAM while merging temp files
-* b8de2d900d9aeb1c349931a216db7e05aa2ff2c4 - Set max memory sizes in minimap2
-* b8de2d900d9aeb1c349931a216db7e05aa2ff2c4 - Calculate scaling for rna on non-adapter signal only
-* 949d13ffb41152aaba4df9004d01e8584c8038e3 - Write multiple temp files for sorted bam output
+* 68d40da45da886384508173219a9fb677fc50cef - Add support for `--junc-bed` minimap2 splice option
+* c443f75314708b7aed0aafa38fffdb8b2e76e9f2 - Output BAM instead of SAM from `dorado trim` command
+* a3dce7ebe298ce3e17f3d61ad180b099700afb6a - Support `dorado demux` from input folders with mix of PG and SQ headers
+* 08e2c7bb2538c2ba89203a68bbf153e6a6054535 - Speed up sorting and merging of BAM files
+* b8de2d900d9aeb1c349931a216db7e05aa2ff2c4 - Set maximum memory sizes in minimap2
+* b8de2d900d9aeb1c349931a216db7e05aa2ff2c4 - Calculate scaling for RNA on non-adapter signal only
 * c88e9f753219f3c462c3678ddfad6b4561830f33 - Update CMake Minimum Version to 3.23
 
 # [0.6.2] (9 May 2024)