Open
Description
Motivation: One blind spot of the reference-based mapping approach is that it cannot detect novel sequence insertions, such as a Tn7 insertion into a genome of interest.
Implementation: Build an accessory tool in Python that runs as a post-processing step. It would call a de novo assembler, analyze the resulting contigs, and integrate the predictions into the normal breseq output files to elaborate upon them.
- Filter unmapped reads to remove junk so the assembler receives cleaner input
- Decide on the best, most lightweight assembler to support
- Provide functionality for scanning/testing different assembly parameters and for judging the resulting contigs
- Output a new reference sequence file containing the good contigs, which could be fed back into a new round of breseq processing that might, for example, show precisely where and how these sequences are inserted
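
As a starting point for discussion, the first and last bullets could be sketched roughly as below. This is only an illustration under stated assumptions: the unmapped reads are taken to be available as a four-line-per-record FASTQ stream with Phred+33 qualities, the contigs are taken to be already loaded into a dict, and every function name and threshold (`min_len`, `min_mean_q`, `max_n_frac`, the contig length cutoff) is hypothetical. It does not call any assembler or breseq itself.

```python
from typing import Dict, Iterator, List, TextIO, Tuple

Read = Tuple[str, str, str]  # (name, sequence, quality string)


def parse_fastq(handle: TextIO) -> Iterator[Read]:
    """Yield (name, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # tolerate stray blank lines between records
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().strip()
        yield header[1:].strip(), seq, qual


def mean_quality(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 assumed by default)."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


def keep_read(seq: str, qual: str, min_len: int = 30,
              min_mean_q: float = 20.0, max_n_frac: float = 0.1) -> bool:
    """Return True if a read passes the (illustrative) junk filters."""
    if not seq or len(seq) < min_len:
        return False  # too short to be useful for assembly
    if seq.upper().count("N") / len(seq) > max_n_frac:
        return False  # too many ambiguous bases
    return mean_quality(qual) >= min_mean_q


def filter_reads(handle: TextIO, **thresholds) -> List[Read]:
    """Collect the reads from a FASTQ stream that pass keep_read()."""
    return [r for r in parse_fastq(handle)
            if keep_read(r[1], r[2], **thresholds)]


def select_contigs(contigs: Dict[str, str], min_len: int = 500) -> Dict[str, str]:
    """Keep contigs at least min_len long (a crude stand-in for 'good')."""
    return {name: seq for name, seq in contigs.items() if len(seq) >= min_len}


def write_fasta(contigs: Dict[str, str], handle: TextIO, width: int = 60) -> None:
    """Write contigs as FASTA, wrapping sequence lines at `width` bases."""
    for name, seq in contigs.items():
        handle.write(f">{name}\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")
```

Judging contigs by length alone is certainly too simple. Coverage or similarity to the existing reference would probably matter more, but the length cutoff keeps the sketch short. The FASTA written by `write_fasta` is meant to stand in for the new reference sequence file described in the last bullet.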
mc-williams