Releases: Qile0317/KmerGMA.jl
v0.5.3
Made the hit alignment parameters more customizable (see the docstrings of the homology searching functions) and fixed a hit index kmer distance bug in `findGenes_cluster_mode`, where the kmer distance displayed in the fasta description of a hit was not the actual minimum but the kmer distance at the index of the first window above the threshold. Also slightly improved the runtime of the homology searchers.
v0.5.2
Fixed an "accuracy destroying" bug in the kmer distance of the homology searching functions, introduced in v0.5.1, where the kmer distance was erroneously increased if the first and last kmers were equivalent. Also introduced an experimental (not ready for release) and non-user-friendly homology searcher based on strobemers.
v0.5.1
v0.5.0
Introduced a novel alternative homology searching function, `findGenes_cluster_mode`, which sacrifices runtime for more accuracy and likely a higher number of true positive hits. The function uses the same baseline algorithmic concept as `findGenes` but clusters the reference sequence set to make comparisons against.
Notable changes
- Introduced the more accurate yet slower `findGenes_cluster_mode`
- Somewhat improved the runtime of the main `findGenes` function
- Allowed more types to be passed to `exactMatch`
- Slightly adjusted documentation and improved unit testing, though both are still incomplete
v0.4.1
v0.4.0
The performance of the primary homology searching function, `findGenes`, has been improved about 6-fold. More features for accuracy have been added, including the use of a scaled kmer distance metric, and alignment of hits to the consensus of the reference sequence set.
Notable changes
- All kmer counting is bit-based, free of hashing, improving performance 6-fold to about 40 megabases/second
- Optional (but highly recommended) semi-global alignment of hits was implemented for improved accuracy
- Implemented brute-force sequence consensus to align hits to the average reference sequence
- Improved (but still imperfect) kmer distance estimation
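
To illustrate what bit-based, hash-free kmer counting can look like, here is a minimal sketch. It is not the package's actual code: the names `nt_bits` and `kmer_counts` are hypothetical, and the sketch assumes sequences made only of A, C, G and T. Each kmer is packed into an integer with 2 bits per nucleotide, and that integer directly indexes a dense count vector of length 4^k.

```julia
# Hypothetical helper names; this is not KmerGMA.jl's implementation.

# Map a nucleotide to a 2-bit code (A=0, C=1, G=2, T=3).
function nt_bits(c::Char)
    c == 'A' && return UInt(0)
    c == 'C' && return UInt(1)
    c == 'G' && return UInt(2)
    c == 'T' && return UInt(3)
    throw(ArgumentError("unsupported nucleotide: $c"))
end

# Count all kmers of `seq` into a dense vector of length 4^k.
# The packed kmer is used as the vector index, so no hash table is needed.
function kmer_counts(seq::AbstractString, k::Integer)
    counts = zeros(Int, 4^k)
    mask = (UInt(1) << (2k)) - UInt(1)            # keeps only the lowest 2k bits
    code = UInt(0)
    for (i, c) in enumerate(seq)
        code = ((code << 2) | nt_bits(c)) & mask  # slide the window by one base
        i >= k && (counts[Int(code) + 1] += 1)    # Julia vectors are 1-based
    end
    return counts
end

counts = kmer_counts("ACGTAC", 2)  # kmers: AC, CG, GT, TA, AC
```

Because the window is updated with a shift and a mask, each new base costs a constant amount of work regardless of k.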
v0.3.0
Added a new homology searching algorithm that runs much faster under the assumption that there are no N nucleotides, and bound it to the `findGenes` function arguments.
Notable changes
- Introduced a very fast version of the GMA that assumes no "N" nucleotides into the API `findgenes()` in the new file GMA_Nless.jl.
- Slightly updated documentation, though still incomplete
- Handled cases where the buffer exceeded the start or end while getting sequence matches
- Every iteration in the record resets the KFV in place instead of copying
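
The in-place reset mentioned in the last item can be sketched as follows, reading "KFV" as a kmer frequency vector (an assumption on my part). The function `count_kmers!` and the `records` vector are hypothetical stand-ins, not names from the package. The point is only that one vector is allocated up front and cleared with `fill!` for every record, rather than creating a fresh vector each time.

```julia
# Hypothetical illustration; names are not taken from KmerGMA.jl.

# Fill `kfv` with the kmer counts of `seq`, reusing the same memory.
# Assumes `seq` contains only A, C, G and T (no "N"); any other
# character would silently be treated as T in this sketch.
function count_kmers!(kfv::Vector{Int}, seq::AbstractString, k::Integer)
    fill!(kfv, 0)                          # reset in place, no new allocation
    mask = (UInt(1) << (2k)) - UInt(1)     # keeps only the lowest 2k bits
    code = UInt(0)
    for (i, c) in enumerate(seq)
        bits = c == 'A' ? UInt(0) : c == 'C' ? UInt(1) : c == 'G' ? UInt(2) : UInt(3)
        code = ((code << 2) | bits) & mask
        i >= k && (kfv[Int(code) + 1] += 1)
    end
    return kfv
end

records = ["ACGTACGT", "TTTTAAAA", "GGCC"]  # stand-ins for FASTA records
k = 2
kfv = zeros(Int, 4^k)                       # allocated once, outside the loop
totals = Int[]
for seq in records
    count_kmers!(kfv, seq, k)               # overwrites the previous record's counts
    push!(totals, sum(kfv))                 # number of kmers in this record
end
```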
v0.2.0
Reworked kmer distance and threshold finding.
Notable changes
- Made the API able to take in a series of strings which are fasta file locations
- Introduced scaleFactor (1/2k) into GMA
- Introduced eucGMA mode
- Introduced random threshold generator
- Removed testGMA and testfindGenes
- Updated the readme
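
One possible reading of the scale factor is sketched below. This is an assumption, not the package's confirmed formula: it takes "1/2k" to mean 1/(2k) and applies it to the squared Euclidean distance between two kmer count vectors. The names `kmer_counts`, `sq_euclidean` and `scaled_distance` are hypothetical. The motivation under this reading is that a single substitution changes up to k kmers, removing up to k counts and adding up to k others, so dividing by 2k brings the distance onto roughly a "number of mismatches" scale.

```julia
# Hypothetical sketch; function names are illustrative, not KmerGMA.jl's API.

# Dense kmer count vector of length 4^k (assumes only A, C, G, T).
function kmer_counts(seq::AbstractString, k::Integer)
    idx = Dict('A' => 0, 'C' => 1, 'G' => 2, 'T' => 3)
    counts = zeros(Int, 4^k)
    for i in 1:(length(seq) - k + 1)
        code = 0
        for c in seq[i:i+k-1]
            code = 4code + idx[c]          # base-4 encoding of the kmer
        end
        counts[code + 1] += 1
    end
    return counts
end

# Squared Euclidean distance between two kmer count vectors.
sq_euclidean(a::AbstractVector, b::AbstractVector) = sum(abs2, a .- b)

# Assumed form of the scaling: squared distance divided by 2k.
scaled_distance(a, b, k) = sq_euclidean(a, b) / (2k)

k = 3
ref   = kmer_counts("ACGTACGTAC", k)
query = kmer_counts("ACGTATGTAC", k)       # one substitution at position 6
d = scaled_distance(ref, query, k)         # 1.0 for this pair
```

In this example the substitution removes three kmers and introduces three new ones, giving a squared distance of 6 and a scaled distance of 1.0.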
v0.1.1
Documentation is now functional and deployed at https://qile0317.github.io/KmerGMA.jl/dev/ though still incomplete.
v0.1.0
Initial release. The barebones kmer-based gene-matching algorithm is implemented correctly, which was the initial purpose of publishing the package. However, the docs are incomplete. The pre-print for the algorithm will be up before the next few releases are made.
Thanks to @murrellb for massive guidance with this project.