-
Notifications
You must be signed in to change notification settings - Fork 0
Week 6 (14 Oct.)
As stated before, I implemented the banded algorithm on word level, and made several runs to measure how fast (or how slow...) the program is running. I made my measures with sets of 10 million sequences, with lengths of 150, 300 and 600 bases. I grouped all the results in a comprehensive calculation sheet with its graphs.
What I can tell is, I only get substantial speedups for 150 query lengths. At 300, the speedups is much reduced, and even more surprisingly, at 600 query length I even get a slowdown that appears for band larger than 24, but that slowdown gradually disappear as I reach the point where no band is actually used. This makes me say that, in the current implementation, there are two different forces that tend to speed up and slow down, and using a small band, you get a speedup, but the longer the sequence length, the bigger the slowdown.
I tried to implement a banded algorithm using only a fixed-length band, but this results in computing the band of a square matrix, and not only do a get a deceivingly low speedup, but most importantly, the results I get are utterly wrong (scores are rarely higher than 20). This is probably not the best way to seek for more performance.
Finally, I re-implemented the small optimization that was brought originally by Nauman, but was buggy (it didn't produce always the same results). Now, it does provide the exact same results for every run ; and for a 300 bases length query sequence batch, I get a nice 9% speedup. Again, this is basically "free lunch" as the results are not stripped down, unlike for banded algorithm. It can be implemented everywhere, and saves some time in a safe manner. The "hack" is to avoid reading an writing directly from and to arrays in the internal loop, but rather use temporary variables as buffers (that are probably directly registers), in order to minimize read-writes from locations a bit remote. It does help, so I got that going for me, which is nice.
I think this closes the chapter of banded implementation. Next time: implementing new algorithms for semi-global alignment.