Sequence Assembly Using deBruijn Graphs

Description

De Bruijn graph traversal is a popular method for assembling contigs from sequencing reads. A De Bruijn graph is constructed by splitting the reads into k-mers, which become the edges of the graph, while the (k-1)-mers they contain become the nodes. This turns the assembly problem into a graph traversal problem. De Bruijn graphs have advantages, such as offering a compact representation of the genome, but they also have drawbacks. The major drawback is that the technique assumes error-free sequencing, which is difficult to achieve in practice. Other drawbacks include repeated edges, which bloat the graph, and the existence of multiple valid assemblies, which can be difficult to resolve. In this project, we explore how De Bruijn graphs can be used to assemble the SARS-CoV-2 genome, addressing some of these shortcomings and noting research that offers possible solutions to the others. We also describe the steps and methodology used in this project, such as the data cleaning techniques, so that other researchers can replicate our results or build on our work. Finally, we run experiments over a range of parameters and provide interactive visualizations of the results.
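The construction described above can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions (error-free reads, no repeated k-mers, a single Eulerian path), not the code in assembler.py; the function names `build_de_bruijn`, `eulerian_path`, and `spell` are hypothetical.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer node to the list of (k-1)-mer nodes it points to.

    Every k-mer in every read contributes one edge: prefix -> suffix.
    """
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Walk every edge once (Hierholzer's algorithm); assumes such a path exists."""
    adj = {node: list(succs) for node, succs in graph.items()}
    if not adj:
        return []
    in_degree = defaultdict(int)
    for succs in adj.values():
        for succ in succs:
            in_degree[succ] += 1
    # Start at the node with one more outgoing than incoming edge, if any.
    start = next(iter(adj))
    for node, succs in adj.items():
        if len(succs) - in_degree[node] == 1:
            start = node
            break
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if adj.get(node):
            stack.append(adj[node].pop())
        else:
            path.append(stack.pop())
    return path[::-1]


def spell(path):
    """Turn a node path back into a sequence: first node plus last base of each later node."""
    return path[0] + "".join(node[-1] for node in path[1:])


# Two toy reads that overlap by k-1 = 2 bases.
graph = build_de_bruijn(["ACGT", "GTTA"], k=3)
print(spell(eulerian_path(graph)))  # ACGTTA
```

Real reads contain errors and repeats, so a practical assembler usually needs k-mer filtering and contig extraction rather than a single Eulerian walk.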

Visualization Demo

Instructions to run the code

Run the assembler.py module on cleaned, trimmed, FASTQ-formatted reads. Use the following command to generate the results for visualization.

 python assembler.py \
     --k <set the starting value of k> \
     --threshold <set the threshold for discarding infrequent k-mers> \
     --input <set the input directory> \
     --output <set the output directory> \
     --prune <set to True if k-mer filtering needs to be enabled> \
     --ref <reference genome file>
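
To illustrate what the `--threshold` and `--prune` options describe, here is a minimal sketch of frequency-based k-mer filtering. It is one plausible reading of those options, not the actual logic in assembler.py: the helper names `count_kmers` and `prune_kmers` are hypothetical, and treating the threshold as an inclusive minimum count is an assumption.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count how often each k-mer occurs across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def prune_kmers(counts, threshold):
    """Keep only k-mers seen at least `threshold` times (assumed inclusive)."""
    return {kmer: n for kmer, n in counts.items() if n >= threshold}


# The third read ends in A instead of T, mimicking a sequencing error.
reads = ["ACGT", "ACGT", "ACGA"]
kept = prune_kmers(count_kmers(reads, k=3), threshold=2)
print(kept)  # {'ACG': 3, 'CGT': 2}
```

K-mers that come from sequencing errors tend to be rare, so dropping low-count k-mers before building the graph removes many spurious edges, at the cost of possibly discarding genuine low-coverage regions.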

Place the results generated by the assembler in a folder called output in the same directory as flask_app.py, then run the following command to launch the app:
```
 python flask_app.py
```

peacekurella/SequenceAssembly

Folders and files

Latest commit

History

Repository files navigation

Sequence Assembly Using deBruijn Graphs

Description

Visualization Demo

Instructions to run the code

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Languages

Packages