forked from jdegges/DNArm

DNA read mapper will map short reads to the entire genome using all of your cores -- and GPU too

stephenchen13/DNArm

 
 

DNArm: A fast DNA read mapper

Project Description:

Develop and implement an algorithm for efficiently mapping short DNA reads
(~30 bases) to the entire genome (~3,000,000,000 bases). In computer science
terms: locate strings of length 30 within a fixed string of about 3 billion
characters. In addition, every mapping in which at least 28 of the 30
characters agree should be reported as a match.
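
The 28/30 criterion can be sketched as a mismatch count over a 30-base window. This is only an illustration under stated assumptions, not the project's actual code: the names count_mismatches, read_matches_at, READ_LEN, and MAX_MISMATCHES are hypothetical, and mismatches are taken to be substitutions only (no insertions or deletions).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define READ_LEN 30       /* read length from the project description */
#define MAX_MISMATCHES 2  /* 28 of 30 characters must agree */

/* Count the positions at which the read and the genome window differ. */
static size_t count_mismatches(const char *read, const char *window, size_t len)
{
    assert(read != NULL && window != NULL);
    size_t mismatches = 0;
    for (size_t i = 0; i < len; i++) {
        if (read[i] != window[i])
            mismatches++;
    }
    return mismatches;
}

/* True if the read maps at genome[pos] with at most MAX_MISMATCHES
 * substitutions. Positions too close to the end of the genome never match. */
static bool read_matches_at(const char *genome, size_t genome_len,
                            const char *read, size_t pos)
{
    if (genome_len < READ_LEN || pos > genome_len - READ_LEN)
        return false;
    return count_mismatches(read, genome + pos, READ_LEN) <= MAX_MISMATCHES;
}
```

Checking every genome position this way costs about 30 comparisons per position per read, which is why the techniques below try to narrow the set of candidate positions first.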

Parallelization Techniques:

(a) Build a tree of depth 30 with branching factor 5 (four branches for the
bases plus one extra branch to help with fuzzy matching) and have different
execution units search the tree for matches at the same time. The whole tree
won't fit into main memory at once, but trees are easily broken into
components that will.

(b) Generate an index of the genome keyed on strings of length 10. Look up
each third of a short read (three 10-base segments) in the index, then check
the candidate locations in different execution units.

(c) Combine the tree, the index, and possibly other data structures to
further decompose the search.
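
Technique (b) can be sketched in single-threaded C as a seed-and-verify lookup. If a read differs from its true location by at most two substitutions, at least one of its three 10-base segments must match the genome exactly, so the exact index hits for the three segments cover every valid mapping. Everything named here (seed_index, seed_code, index_build, index_map_read, index_free, the counting-sort layout, and the 2-bit base encoding) is an assumption made for illustration and not the repository's implementation. The sketch skips any 10-mer containing a character other than A, C, G, or T, and returns only the first match found.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define READ_LEN 30
#define SEED_LEN 10
#define MAX_MISMATCHES 2
#define NUM_SEEDS ((size_t)1 << (2 * SEED_LEN))  /* 4^10 possible seeds */

typedef struct {
    uint32_t *start;      /* offsets of seed k live in [start[k], start[k+1]) */
    uint32_t *positions;  /* genome offsets, grouped by seed code */
} seed_index;

/* Pack SEED_LEN bases into 2 bits each; returns -1 on a non-ACGT character. */
static int32_t seed_code(const char *s)
{
    int32_t code = 0;
    for (int i = 0; i < SEED_LEN; i++) {
        int32_t b;
        switch (s[i]) {
        case 'A': b = 0; break;
        case 'C': b = 1; break;
        case 'G': b = 2; break;
        case 'T': b = 3; break;
        default:  return -1;
        }
        code = (code << 2) | b;
    }
    return code;
}

/* Build the index with a counting sort over every genome offset. */
static seed_index index_build(const char *genome, size_t len)
{
    assert(len <= UINT32_MAX);
    seed_index idx;
    idx.start = calloc(NUM_SEEDS + 1, sizeof *idx.start);
    idx.positions = malloc((len + 1) * sizeof *idx.positions);
    uint32_t *cursor = malloc(NUM_SEEDS * sizeof *cursor);
    assert(idx.start != NULL && idx.positions != NULL && cursor != NULL);

    /* Pass 1: count how often each seed occurs. */
    for (size_t i = 0; i + SEED_LEN <= len; i++) {
        int32_t code = seed_code(genome + i);
        if (code >= 0)
            idx.start[code + 1]++;
    }
    /* Prefix sum turns the counts into bucket boundaries. */
    for (size_t k = 0; k < NUM_SEEDS; k++)
        idx.start[k + 1] += idx.start[k];

    /* Pass 2: drop each offset into its seed's bucket. */
    memcpy(cursor, idx.start, NUM_SEEDS * sizeof *cursor);
    for (size_t i = 0; i + SEED_LEN <= len; i++) {
        int32_t code = seed_code(genome + i);
        if (code >= 0)
            idx.positions[cursor[code]++] = (uint32_t)i;
    }
    free(cursor);
    return idx;
}

/* Return the first offset where the read maps with at most MAX_MISMATCHES
 * substitutions, or -1 if no candidate passes verification. */
static int64_t index_map_read(const seed_index *idx, const char *genome,
                              size_t len, const char *read)
{
    for (size_t seg = 0; seg < READ_LEN / SEED_LEN; seg++) {
        int32_t code = seed_code(read + seg * SEED_LEN);
        if (code < 0)
            continue;
        for (uint32_t j = idx->start[code]; j < idx->start[code + 1]; j++) {
            size_t hit = idx->positions[j];
            if (hit < seg * SEED_LEN)
                continue;  /* read would start before the genome */
            size_t pos = hit - seg * SEED_LEN;
            if (pos + READ_LEN > len)
                continue;  /* read would run past the genome */
            int mismatches = 0;
            for (size_t i = 0; i < READ_LEN; i++)
                mismatches += (read[i] != genome[pos + i]);
            if (mismatches <= MAX_MISMATCHES)
                return (int64_t)pos;
        }
    }
    return -1;
}

static void index_free(seed_index *idx)
{
    free(idx->start);
    free(idx->positions);
}
```

The verification loop for each candidate is independent of the others, so in a parallel version the candidates (or the reads themselves) could be divided among execution units as the description suggests. How that work would be split on a GPU is left open here.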

Hardware/Technologies: OpenCL

Webpage: http://cs124project-2010.wikidot.com/133-124-joint-proj
Google group: http://groups.google.com/group/ucla-cs133cm124
