Description
I have indexed a fairly large barcoded BAM file (~220 GB) with LRez and now want to query it for barcodes. I have several lists of barcodes, each with about 2000 entries. Unfortunately, querying is very slow. If I read the paper correctly, queries of about 1000 barcodes took at most 10 minutes. In my case, a query with a list of 2000 barcodes has been running for almost 3 hours without finishing.
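As a rough check, assuming query time scales about linearly with the number of barcodes (the paper's datasets may differ in size and barcode depth from this 220 GB BAM, so this is only indicative), the expected and observed times compare as follows:

$$
t_{2000} \approx \frac{2000}{1000} \times 10\ \text{min} = 20\ \text{min},
\qquad
\frac{t_{\text{observed}}}{t_{2000}} \gtrsim \frac{180\ \text{min}}{20\ \text{min}} = 9
$$

That is, the queries are already roughly an order of magnitude slower than this extrapolation, and still running.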
Commands
```shell
# Index
LRez index bam -b file.bam -o file.bam.bci -f -t 10
# Query
LRez query bam -b file.bam -i file.bam.bci -l list.bxu -o list.bam -t 10 -H
```
Below is the memory/CPU usage. I am running two query commands in parallel, with 10 threads each.
I guess the initial sharp rise in memory comes from loading the index (about 55 GB on disk), which seems to take about 10 minutes. After that, it is presumably doing index lookups for the list of barcodes, which takes much longer than I would expect. Any idea why this is so slow?
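For scale, assuming the ~10 minutes is spent mainly reading the index from disk, the implied sustained read rate (decimal units) is:

$$
\frac{55\ \text{GB}}{10\ \text{min}} = \frac{55\,000\ \text{MB}}{600\ \text{s}} \approx 92\ \text{MB/s}
$$

So loading the index accounts for only a small fraction of the nearly 3 hours; most of the time goes into the lookup phase.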
As a side note, core utilisation seems quite poor: only about one core per process is being used, despite `-t 10`.
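Regarding the single-core usage, one possible workaround (an assumption on my part, not something LRez documents) is to split each barcode list into chunks and run one `LRez query bam` process per chunk. The sketch below assumes the `.bxu` list holds one barcode per line; the helper names (`split_barcodes`, `build_query_command`, `prepare_chunks`) and the `.part<k>` file naming are hypothetical, and only flags from the original query command are reused.

```python
from pathlib import Path


def split_barcodes(barcodes: list[str], n_chunks: int) -> list[list[str]]:
    """Distribute barcodes round-robin over n_chunks lists (empty lists dropped)."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    chunks: list[list[str]] = [[] for _ in range(n_chunks)]
    for i, barcode in enumerate(barcodes):
        chunks[i % n_chunks].append(barcode)
    # With fewer barcodes than chunks, some lists stay empty; skip them.
    return [chunk for chunk in chunks if chunk]


def build_query_command(bam: str, index: str, chunk_list: str, threads: int) -> str:
    """Build one query command line, reusing only flags from the original issue."""
    return (
        f"LRez query bam -b {bam} -i {index} -l {chunk_list} "
        f"-o {chunk_list}.bam -t {threads} -H"
    )


def prepare_chunks(list_path: str, n_chunks: int, bam: str, index: str,
                   threads: int = 1) -> list[str]:
    """Write hypothetical chunk files <list_path>.part<k> and return their commands."""
    # Assumption: the barcode list holds one barcode per line.
    lines = Path(list_path).read_text().splitlines()
    barcodes = [line.strip() for line in lines if line.strip()]
    commands = []
    for k, chunk in enumerate(split_barcodes(barcodes, n_chunks)):
        chunk_path = f"{list_path}.part{k}"
        Path(chunk_path).write_text("\n".join(chunk) + "\n")
        commands.append(build_query_command(bam, index, chunk_path, threads))
    return commands
```

For example, `prepare_chunks("list.bxu", 4, "file.bam", "file.bam.bci", threads=2)` would write four chunk files and return four command lines to launch side by side; the per-chunk BAMs would then need merging, e.g. with `samtools merge`. One caveat: if every process loads the full index, as the initial memory rise suggests, N processes need roughly N times the index's memory, so N is bounded by the available RAM.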
