ReadStepper is a simple grep-trim-sort algorithm that lets the user examine how raw short sequencing reads (e.g. Illumina) inform a given flank of a query sequence. This is useful for examining challenging regions of an assembly, such as repetitive regions or the closure of a circular contig.

The user supplies a short query sequence (a 20-mer is usually adequate) and one or more paths to FASTQ files. By default the downstream flank of the query is examined, but this can be switched to the upstream flank. First, the reads are searched (grepped) for the query sequence and its reverse complement, and every hit read is oriented to the sense of the query. Second, the query sequence and its unexamined flank are trimmed from every hit read. Finally, the remaining query-flanking sequences are sorted alphabetically.

If the flanking sequence is single-copy, the sorted sequences should form a single horn, with some noise from occasional sequencing errors. If the query sequence lies in a repeat with two different downstream (right) flanks, two horns will be produced (as in Fig. S1 of the reference).
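The grep, orient, trim and sort steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not ReadStepper's actual implementation: the function names (`reverse_complement`, `read_fastq_sequences`, `step_reads`, `print_horn`) are hypothetical, matching is exact (no mismatches), only the first occurrence of the query in a read is used, and FASTQ files are assumed to use plain 4-line records. For the upstream flank, the sketch assumes sequences should be compared from the query outwards (by sorting on the reversed string) and printed right-aligned so that a horn forms against the query.

```python
import gzip
from typing import Iterable, Iterator, List

# Complement table for A, C, G, T and N only (an assumption of this sketch).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def read_fastq_sequences(path: str) -> Iterator[str]:
    """Yield the sequence line of each 4-line FASTQ record (plain or .gz)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # the second line of each record is the sequence
                yield line.strip().upper()


def step_reads(query: str, reads: Iterable[str], upstream: bool = False) -> List[str]:
    """Grep, orient, trim and sort reads around `query` (hypothetical helper).

    Returns the downstream flanks by default, or the upstream flanks
    when `upstream=True`.
    """
    query = query.upper()
    query_rc = reverse_complement(query)
    flanks = []
    for read in reads:
        # Step 1: grep for the query or its reverse complement and orient
        # the read to the sense of the query.
        if query in read:
            oriented = read
        elif query_rc in read:
            oriented = reverse_complement(read)
        else:
            continue
        # Step 2: trim the query and the unexamined flank (first hit only).
        pos = oriented.find(query)
        flank = oriented[:pos] if upstream else oriented[pos + len(query):]
        if flank:
            flanks.append(flank)
    # Step 3: sort. Upstream flanks are compared from the query outwards
    # (reversed strings) so reads sharing query-adjacent sequence cluster.
    if upstream:
        return sorted(flanks, key=lambda s: s[::-1])
    return sorted(flanks)


def print_horn(flanks: List[str], upstream: bool = False) -> None:
    """Print flanks so they line up against the (trimmed) query."""
    width = max((len(f) for f in flanks), default=0)
    for f in flanks:
        print(f.rjust(width) if upstream else f)


if __name__ == "__main__":
    toy_reads = ["CCGATTACATTT", "CAATGTAATCGC", "ACGTACGT", "TGATTACAAC"]
    print(step_reads("GATTACA", toy_reads))  # ['AC', 'TTG', 'TTT']
    print_horn(step_reads("GATTACA", toy_reads, upstream=True), upstream=True)
```

In the toy example, the second read carries the query only as its reverse complement (`TGTAATC`), so it is flipped before trimming and contributes the downstream flank `TTG`. To process several FASTQ files, the read iterators can be concatenated, e.g. `step_reads(query, itertools.chain(read_fastq_sequences("a.fastq"), read_fastq_sequences("b.fastq")))`, where the file names are placeholders.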