Code accompanying "Direct detection of RNA modifications and structure using single molecule nanopore sequencing"
In vitro nanoSHAPE analysis can be performed on a single target using the commands below. First the multi-fast5 files must be converted to single read fast5 files prior to processing with tombo. Tombo is then used to resquiggle the reads, detect modifications, and generate per-read stats file and optional plot. Comprehensive documentation regarding tombo can be found here. Finally, the extract_array.py script is used to generate an output numpy array which can be easily processed using python for per-read modification visualization/plotting.
A general workflow should include sequencing unmodified (native/control) RNA and comparing to the same RNA but modified (with AcIm or other). Both the unmodified and modified reads should be converted to single fast5 files and resquiggled using tombo.
multi_to_single_fast5
-i <directory of multi fast5s>
--recursive
--save_path <output path>
--flowcell <flowcell spec>
--kit <kit spec>
--fast5_out
--cpu_threads_per_Caller <num threads>
tombo resquiggle <path to the directory containing single fast5 files> <path to fasta reference>
--include-event-stdev
--num-most-common-errors 5
--ignore-read-locks
--overwrite
--processes <num processes>
tombo detect_modifications model_sample_compare
--fast5-basedirs <path to single fast5 files MODIFIED>
--control-fast5-basedirs <path to single fast5 files CONTROL/UNMODIFIED>
--statistics-file-basename <basename for stats output>
--minimum-test-reads <num min test reads, typically 100 for in vitro>
--multiprocess-region-size 250
tombo detect_modifications model_sample_compare
--fast5-basedirs <path to single fast5 files MODIFIED>
--control-fast5-basedirs <path to single fast5 files CONTROL/UNMODIFIED>
--statistics-file-basename <basename for stats output>
--per-read-statistics-basename <basename for per-read stats output>
--minimum-test-reads <num min test reads, typically 100 for in vitro>
--multiprocess-region-size 250
tombo plot per_read
--genome-locations <genomic location of target>
--per-read-statistics-filename <per-read statistics filename generated in step4>
--genome-fasta <FASTA reference file>
--fast5-basedirs <path to single fast5 files MODIFIED>
--num-reads <number of reads>
--num-bases <num bases>
--pdf-filename <output pdf>
python extract_array.py
--fa <FASTA reference file>
--f5 <path to single fast5 files to process, either MODIFIED or UNMODIFIED>
--out <numpy array out path with extension ".npy">
--rna-name <RNA name from FASTA reference>
--sort-reads
--current (or --dwell, --stdev)