perf: Add bounds for haplotype matrix #61

Merged
merged 2 commits into from
Dec 22, 2023
perf: Add bounds for haplotype matrix
We know that the value of any cell in the occurrence matrix can be at most
the number of reads. Leverage that knowledge to pick the narrowest unsigned
integer element type and so shrink the arrays that are needed.
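
The type selection described above can be sketched in isolation. This is a minimal illustration rather than the code in this commit: `smallest_uint` is a hypothetical helper name, and it compares with `<=` (a count equal to `typemax` still fits), whereas the diff uses a strict `<`.

```julia
# Hypothetical helper, not part of this PR: return the narrowest unsigned
# integer type that can hold every count from 0 to n inclusive.
function smallest_uint(n::Integer)
    n >= 0 || throw(ArgumentError("n must be non-negative"))
    for T in (UInt8, UInt16, UInt32, UInt64, UInt128)
        # Julia compares signed and unsigned integers by value, so this is safe
        n <= typemax(T) && return T
    end
    # Reached only when n exceeds typemax(UInt128), e.g. for a large BigInt
    error("Too many reads to represent in memory")
end

smallest_uint(200)     # UInt8
smallest_uint(70_000)  # UInt32
```

Returning from inside the loop keeps the error on the path where no candidate type fits, so it cannot fire while wider types are still untried.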
MillironX committed Dec 22, 2023
commit 4928e714c9a99a3bb67c87669ae207b47ddf641e
11 changes: 10 additions & 1 deletion src/haplotypecalling.jl
@@ -253,7 +253,16 @@ dimensional matrix.
 function occurrence_matrix(
     haplotype::AbstractArray{Variation{S,T}}, reads::AbstractArray{Haplotype{S,T}}
 ) where {S<:BioSequence,T<:BioSymbol}
-    hapcounts = SparseArray{UInt}(undef, Tuple(repeat([2], length(haplotype))))
+    Q = nothing # narrowest unsigned type that can count every read
+    for int_type in [UInt8, UInt16, UInt32, UInt64, UInt128]
+        if length(reads) < typemax(int_type)
+            Q = int_type
+            break
+        end #if
+    end #for
+    isnothing(Q) && error("Too many reads to represent in memory") # only after every type was tried
+
+    hapcounts = SparseArray{Q}(undef, Tuple(repeat([2], length(haplotype))))

     for read in reads
         coordinates = zeros(Int, size(haplotype))