| 1 | +--- |
| 2 | +layout: default |
| 3 | +parent: FAQ |
| 4 | +title: reads.bam |
| 5 | +--- |
| 6 | + |
| 7 | +# What is the `reads.bam`? |
| 8 | +Have you ever run _ccs_ with different cutoffs, e.g. tuning `--min-rq`, |
| 9 | +out of fear of missing out on yield? |
| 10 | +Similar to the CLR instrument mode, in which subreads are accompanied by |
| 11 | +a scraps file, _ccs_ offers a new mode that never loses a single read to |
| 12 | +filtering, without the massive run-time increase that polishing low-pass productive ZMWs would incur. |
| 13 | + |
| 14 | +Starting with SMRT Link v10.0 and Sequel IIe, _ccs_ v5.0 or newer is able to generate |
| 15 | +one representative sequence per productive ZMW, irrespective of quality and passes. |
| 16 | +This ensures no yield loss due to filtering and enables users to have maximum |
| 17 | +control over their data. Never fear again that SMRT Link or the Sequel IIe |
| 18 | +HiFi mode filtered precious data. |
| 19 | + |
| 20 | +**Attention:** If you work with the `reads.bam` file directly, be aware that CCS reads of all |
| 21 | +qualities are present. Make sure you understand its contents before piping it |
| 22 | +into your typical HiFi application. |
| 23 | + |
| 24 | +## How to generate `reads.bam`? |
| 25 | + |
| 26 | +The default command-line behavior has not changed; |
| 27 | +it still generates only HiFi quality reads by default. |
| 28 | +But the new `--all` mode has been set as default when running the |
| 29 | +_Circular Consensus Sequencing_ SMRT Link application or |
| 30 | +selecting the on-instrument Sequel IIe capabilities: |
| 31 | +<p align="left"><img width="500px" src="../img/run-design-oiccs.png"/></p> |
| 32 | + |
| 33 | +## What is in the `reads.bam`? |
| 34 | + |
| 35 | +- HiFi Reads with predicted accuracy ≥Q20 (`rq ≥ 0.99`) |
| 36 | +- Lower-quality but still polished consensus reads with predicted accuracy <Q20 (`rq < 0.99`) |
| 37 | +- Unpolished consensus reads (`rq = -1`) |
| 38 | +- Partial or single full-length subreads unaltered (`rq = -1`) |
| 39 | + |
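As a rough illustration of the categories listed above, the following Python sketch maps an `rq` value to a bucket and shows why Q20 corresponds to `rq = 0.99`. The function names and bucket labels are hypothetical and not part of _ccs_. Note that `rq = -1` alone cannot distinguish unpolished consensus reads from unaltered subreads, so both share one bucket here.

```python
import math

# Threshold taken from the list above: HiFi means predicted accuracy >= Q20.
HIFI_MIN_RQ = 0.99


def phred_to_accuracy(q: float) -> float:
    """Convert a Phred-scaled quality (e.g. 20 for Q20) to an accuracy in [0, 1]."""
    return 1.0 - 10.0 ** (-q / 10.0)


def classify_by_rq(rq: float) -> str:
    """Assign a read to a bucket based only on its `rq` tag value.

    The bucket labels are illustrative assumptions, not ccs terminology.
    """
    if rq == -1:
        # Unpolished consensus or an unaltered subread; rq cannot tell them apart.
        return "unpolished_or_subread"
    if not 0.0 <= rq <= 1.0:
        raise ValueError(f"unexpected rq value: {rq}")
    if rq >= HIFI_MIN_RQ:
        return "hifi"
    return "polished_below_q20"
```

Under these assumptions, `classify_by_rq(0.95)` falls into the polished-but-below-Q20 bucket, while `classify_by_rq(-1)` covers both kinds of reads that were never polished.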
| 40 | +## How to get HiFi reads |
| 41 | + |
| 42 | +### SMRT Link |
| 43 | + |
| 44 | +If you want to only use HiFi reads, SMRT Link automatically generates additional |
| 45 | +files for your convenience that only contain HiFi reads: |
| 46 | + |
| 47 | + - hifi_reads.**fastq**.gz |
| 48 | + - hifi_reads.**fasta**.gz |
| 49 | + - hifi_reads.**bam** |
| 50 | + |
| 51 | +### Command line |
| 52 | + |
| 53 | +The following tools can be installed with |
| 54 | + |
| 55 | + conda install -c bioconda tool_name |
| 56 | + |
| 57 | +#### extracthifi |
| 58 | +We provide a simple tool, called `extracthifi`, to generate a HiFi-only BAM from a `reads.bam` file. Usage is: |
| 59 | + |
| 60 | + extracthifi reads.bam extracthifi.bam |
| 61 | + |
| 62 | +#### bamtools |
| 63 | +Alternatively use `bamtools`: |
| 64 | + |
| 65 | + bamtools filter -in reads.bam -out hifi_reads.bam -tag "rq":">=0.99" |
| 66 | + |
| 67 | +## FAQ: How can I filter by number of passes? |
| 68 | + |
| 69 | +We **strongly** advise against filtering by anything other than predicted accuracy, |
| 70 | +BAM tag `rq`. The `rq` tag is the best predictor for read quality. Number of |
| 71 | +passes is not reliable enough and you might discard too much data. This `np` |
| 72 | +tag is an implementation detail that is not guaranteed to be present in future |
| 73 | +_ccs_ versions. |