This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Fix bug where original read name could affect demultiplexing
Bug description
- The bug was unlikely to have affected real-world usage, but it did affect the output produced from the toy dataset included in the repository.
- During barcode identification, the pipeline adds identified tags to read names, so that the read names acquire the following structure:

    @<original read name>::[tag1][tag2]...[tagN]

  Most subsequent scripts use the double-colon delimiter to separate the original read name from the tags. However, two scripts, `barcode_identification_efficiency.py` and `fastq_to_bam.py`, did not use the `::` delimiter and instead relied only on matching the bracketed tag structure `[tag]`. Consequently, if the original read name itself had a bracketed structure (as the reads in the toy dataset do, e.g., `@[BEAD_AB1-A1][OddBot_5-A5][EvenBot_10-A10][OddBot_46-D10][EvenBot_45-D9][OddBot_67-F7][NYStgBot_83-G11]_CAATGATG`), then the "tags" in the original read name, rather than those identified during the pipeline's barcode identification step, were used.
- Fix: The two scripts have been updated to identify tags only after the `::` delimiter in read names.

Other changes
- Improve documentation about pipeline assumptions.
- Add target wildcard constraints to the Snakefile to prevent rule conflicts, i.e., to ensure that each desired output file can be generated by only one rule. This should also dramatically speed up DAG generation by Snakemake.
- Improve the pipeline verification script:
  - Enforce the locale (`export LC_ALL=C`) to fix the sorting order.
  - Use natural chromosome, start position, end position sorting for the test BED files.
  - Update MD5 checksums accordingly.

TODO
- Incorporate assumptions into the validation rule.
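A minimal Python sketch of the two tag-parsing strategies described in the bug report: matching bracketed tags anywhere in the read name (the buggy behavior) versus only after the `::` delimiter (the fix). The function names `tags_naive` and `tags_after_delimiter`, and the appended tags `[TAG_A][TAG_B]`, are hypothetical and are not taken from `barcode_identification_efficiency.py` or `fastq_to_bam.py`. The sketch also assumes that pipeline tags never contain `::` themselves, which is why it splits on the last occurrence of the delimiter.

```python
import re
from typing import List

# One bracketed tag, e.g. "[OddBot_5-A5]"; the capture group excludes the brackets.
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")
DELIMITER = "::"


def tags_naive(read_name: str) -> List[str]:
    """Buggy approach: match bracketed tags anywhere in the read name,
    including inside the original read name."""
    return TAG_PATTERN.findall(read_name)


def tags_after_delimiter(read_name: str) -> List[str]:
    """Fixed approach: only parse tags that follow the '::' delimiter.

    Assumption (hypothetical): tags never contain '::', so splitting on the
    last occurrence isolates the appended tag string even if the original
    read name happens to contain '::'.
    """
    _, sep, tag_string = read_name.rpartition(DELIMITER)
    if not sep:
        return []  # no delimiter, so no tags were appended to this read
    return TAG_PATTERN.findall(tag_string)


if __name__ == "__main__":
    # Toy-dataset-style read name with hypothetical pipeline tags appended after "::"
    name = ("@[BEAD_AB1-A1][OddBot_5-A5][EvenBot_10-A10][OddBot_46-D10]"
            "[EvenBot_45-D9][OddBot_67-F7][NYStgBot_83-G11]_CAATGATG::[TAG_A][TAG_B]")
    print(tags_naive(name)[:3])        # ['BEAD_AB1-A1', 'OddBot_5-A5', 'EvenBot_10-A10']
    print(tags_after_delimiter(name))  # ['TAG_A', 'TAG_B']
```

With the naive pattern, the seven bracketed segments of the toy read name are picked up ahead of the real tags, so any downstream logic that reads tags by position sees the wrong barcodes.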
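The verification-script change to natural chromosome, start, end ordering of BED files can be sketched in Python as follows. This is not the repository's verification code; the names `natural_key` and `sort_bed_lines` are hypothetical, and the sketch assumes tab-separated BED lines with at least three columns. Python compares strings by code point regardless of locale, which for ASCII chromosome names matches the byte order that `LC_ALL=C` gives to command-line tools such as `sort`.

```python
import re
from typing import List, Union


def natural_key(text: str) -> List[Union[int, str]]:
    """Split a string into alternating text and integer chunks,
    e.g. "chr10" -> ["chr", 10, ""], so digit runs compare numerically."""
    parts = re.split(r"(\d+)", text)
    # With a capturing group, odd indices are always the digit runs, so two
    # keys compare str-to-str and int-to-int at every position.
    return [int(part) if i % 2 else part for i, part in enumerate(parts)]


def sort_bed_lines(lines: List[str]) -> List[str]:
    """Sort BED lines by natural chromosome name, then start, then end."""
    def key(line: str):
        chrom, start, end = line.split("\t")[:3]
        return (natural_key(chrom), int(start), int(end))
    return sorted(lines, key=key)


if __name__ == "__main__":
    bed = ["chr10\t5\t10", "chr2\t100\t200", "chr1\t50\t60", "chrX\t1\t2"]
    for line in sort_bed_lines(bed):
        print(line)  # chr1, chr2, chr10, chrX (in that order)
```

Splitting digit runs into integers makes `chr2` sort before `chr10`; plain lexicographic ordering would place `chr10` first, which is one way checksums of sorted test files can drift between environments.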
Showing 7 changed files with 239 additions and 222 deletions.