You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
We present a multithreaded implementation of SortingCollection.
The main idea of this improvements is that data is being sorted and spilled in the sorting collection asynchronously by submitting tasks to the executor service.
Compared to the current implementation it looks as follows:
As we can see above, parallel version enables continuous sorting work without breaks for writing to the temporary file.
Method doneAdding() is blocking, since we have to wait until all spill tasks are completed.
What is the effect?
Below are results of the SortingCollection benchmark. It is based on the random generation of data for SortingCollection, we just create random int, transform it to string and put into the sorting collection.
Here is log output of SortSam metric for both SortingCollection implementattions:
AsyncSpillSortingCollection with 2 spilling threads:
INFO 2017-06-14 15:58:47 SortSam Read 10,000,000 records. Elapsed time: 00:00:22s. Time for last 10,000,000: 22s. Last read position: 1:148,346,706
INFO 2017-06-14 15:59:14 SortSam Read 20,000,000 records. Elapsed time: 00:00:49s. Time for last 10,000,000: 26s. Last read position: X:149,840,001
INFO 2017-06-14 15:59:41 SortSam Read 30,000,000 records. Elapsed time: 00:01:16s. Time for last 10,000,000: 27s. Last read position: 12:31,820,558
INFO 2017-06-14 16:00:09 SortSam Read 40,000,000 records. Elapsed time: 00:01:44s. Time for last 10,000,000: 27s. Last read position: 11:3,746,487
INFO 2017-06-14 16:00:36 SortSam Read 50,000,000 records. Elapsed time: 00:02:11s. Time for last 10,000,000: 27s. Last read position: 17:27,904,169
INFO 2017-06-14 16:00:47 SortSam Finished reading inputs, merging and writing to output now.
Total elapsed time = 129 sec - performance gain is 72%
Standard SortingCollection:
INFO 2017-06-14 16:05:36 SortSam Read 10,000,000 records. Elapsed time: 00:00:40s. Time for last 10,000,000: 40s. Last read position: 1:148,346,706
INFO 2017-06-14 16:06:20 SortSam Read 20,000,000 records. Elapsed time: 00:01:25s. Time for last 10,000,000: 44s. Last read position: X:149,840,001
INFO 2017-06-14 16:07:05 SortSam Read 30,000,000 records. Elapsed time: 00:02:10s. Time for last 10,000,000: 45s. Last read position: 12:31,820,558
INFO 2017-06-14 16:07:51 SortSam Read 40,000,000 records. Elapsed time: 00:02:55s. Time for last 10,000,000: 45s. Last read position: 11:3,746,487
INFO 2017-06-14 16:08:40 SortSam Read 50,000,000 records. Elapsed time: 00:03:44s. Time for last 10,000,000: 48s. Last read position: 17:27,904,169
INFO 2017-06-14 16:08:58 SortSam Finished reading inputs, merging and writing to output now.
Total elapsed time = 222 sec
The parallel processing is turned ON by setting JVM option -Dsamjdk.sort_col_threads=number_of_threads, if number_of_threads > 0, any metric/utility that uses SortingCollection will use the parallel implementation.
@SilinPavel Thanks for doing this. I'll do a full review later but wanted to share some thoughts up front:
I'm surprised at the gains you see in SortSam. I looked into this a while back and it seemed to me that so much time was spent in the GC with all those objects in memory that parallelizing wouldn't help. In fact, I'm concerned that the speedup you observe has more to do with the shrinking of the number of records in RAM (I see it's set to MAX / (threads+1)) than the parallelism. In my own testing just now, simply running the existing SortSam with MAX_RECORDS_IN_RAM=166666 vs. MAX_RECORDS_IN_RAM=500000 gave me a performance gain almost identical to that you see in SortSam. Your test shows 129/222 = 0.581, mine on 20m reads was 82/129 = 0.635.
I'm also concerned about the up-tick in the number of files used with this approach. I.e. if I'm sorting a file with 1bn records and set MAX_RECORDS_IN_RAM=1000000, I'd expect it to use 1000 files, which is within most systems limit for open filehandles, but if set to use two threads it'll create 3000 files which will cause problems for many people.
I've been thinking about parallelism in htsjdk in general and am keen to introduce it in a way that can easily be re-used across multiple pieces of the code, e.g. BAM block compression. As a user I think what we want is to be able to set the number of threads available for all parallel work, not per object using parallelism. E.g. in Picard's IlluminaBasecallsToSam there are multiple sorting collections being used in parallel - it would be great to allow say 8 threads used it total regardless of the total number of sorting collections. To do this I was thinking that we would create a singleton object in htsjdk that holds a single ExecutorService and has methods to allow users to set/override the ExecutorService in use. The service could default to a subclass of AbstractExecutorService that performs synchronous execution, so that the default behaviour is effectively serial. Then if the number of threads is set to > 0 it could be substituted with a ForkJoinPool or similar. This would obviously change what you've done a fair bit, including e.g. renaming the parameter/Defaults entry to be something like htsjdk.parallel_threads.
If (3) were done, I think the new AsyncSortingCollection could be reduced to a fairly small set of changes in SortingCollection instead of a new class.
Tests are going to be very important for this. I would suggest that SortingCollectionTest would need a) to have all test cases run for serial and parallel and b) new tests added to ensure that failures in sorting/writing are propagated correctly in parallel mode.
@tfenne thanks for your response.
Here are our answers:
Performance gain is not a surprise for us, since:
A. We witness it in one of our customers' pipelines running in AWS, were we are evaluating this updated SortingCollection and actually this pull request has a clear goal A - to satisfy that customer and let him get the important results faster.
B. the proposed changes lead to distribution of work between two independent process: outside work with SortingCollection (f.i. add()) and inside processes such as sorting of a buffer and writing it to disk.
So performance gain seems quite natural. Below are some tests that should prove it. Also we are attaching GC jvm logs and we can see that in our case GC activity is not so high and affects on performance not so much as you mentioned before.
java -XX:+UnlockCommercialFeatures -XX:+FlightRecorder -XX:StartFlightRecording=dumponexit=true,filename=sc.jfr -jar picard-2.7.1-87-g7a85c95-SNAPSHOT-all.jar SortSam I=NA12878.mapped.ILLUMINA.bwa.CEU.exome.20121211.bam O=sort_sc.bam SORT_ORDER=queryname VALIDATION_STRINGENCY=SILENT
INFO 2017-06-26 16:58:24 SortSam Read 10,000,000 records. Elapsed time: 00:00:49s. Time for last 10,000,000: 49s. Last read position: 1:119,650,743
INFO 2017-06-26 16:59:14 SortSam Read 20,000,000 records. Elapsed time: 00:01:39s. Time for last 10,000,000: 49s. Last read position: 1:248,374,319
INFO 2017-06-26 17:00:07 SortSam Read 30,000,000 records. Elapsed time: 00:02:32s. Time for last 10,000,000: 53s. Last read position: 2:154,501,231
INFO 2017-06-26 17:01:00 SortSam Read 40,000,000 records. Elapsed time: 00:03:25s. Time for last 10,000,000: 52s. Last read position: 3:49,897,096
INFO 2017-06-26 17:01:49 SortSam Read 50,000,000 records. Elapsed time: 00:04:14s. Time for last 10,000,000: 49s. Last read position: 4:9,702,042
INFO 2017-06-26 17:02:39 SortSam Read 60,000,000 records. Elapsed time: 00:05:04s. Time for last 10,000,000: 49s. Last read position: 5:9,019,531
INFO 2017-06-26 17:03:28 SortSam Read 70,000,000 records. Elapsed time: 00:05:53s. Time for last 10,000,000: 49s. Last read position: 5:176,089,346
INFO 2017-06-26 17:04:18 SortSam Read 80,000,000 records. Elapsed time: 00:06:43s. Time for last 10,000,000: 49s. Last read position: 6:143,265,238
INFO 2017-06-26 17:05:07 SortSam Read 90,000,000 records. Elapsed time: 00:07:32s. Time for last 10,000,000: 49s. Last read position: 7:127,343,317
INFO 2017-06-26 17:05:56 SortSam Read 100,000,000 records. Elapsed time: 00:08:22s. Time for last 10,000,000: 49s. Last read position: 8:133,732,360
INFO 2017-06-26 17:06:46 SortSam Read 110,000,000 records. Elapsed time: 00:09:11s. Time for last 10,000,000: 49s. Last read position: 10:3,118,522
INFO 2017-06-26 17:07:35 SortSam Read 120,000,000 records. Elapsed time: 00:10:01s. Time for last 10,000,000: 49s. Last read position: 11:9,164,230
INFO 2017-06-26 17:08:24 SortSam Read 130,000,000 records. Elapsed time: 00:10:50s. Time for last 10,000,000: 48s. Last read position: 12:6,800,342
INFO 2017-06-26 17:09:14 SortSam Read 140,000,000 records. Elapsed time: 00:11:39s. Time for last 10,000,000: 49s. Last read position: 13:25,887,091
INFO 2017-06-26 17:10:03 SortSam Read 150,000,000 records. Elapsed time: 00:12:28s. Time for last 10,000,000: 49s. Last read position: 14:94,527,360
INFO 2017-06-26 17:10:52 SortSam Read 160,000,000 records. Elapsed time: 00:13:18s. Time for last 10,000,000: 49s. Last read position: 16:20,823,053
INFO 2017-06-26 17:11:42 SortSam Read 170,000,000 records. Elapsed time: 00:14:07s. Time for last 10,000,000: 49s. Last read position: 17:40,003,660
INFO 2017-06-26 17:12:31 SortSam Read 180,000,000 records. Elapsed time: 00:14:56s. Time for last 10,000,000: 49s. Last read position: 19:11,138,565
INFO 2017-06-26 17:13:20 SortSam Read 190,000,000 records. Elapsed time: 00:15:46s. Time for last 10,000,000: 49s. Last read position: 20:57,487,759
INFO 2017-06-26 17:14:10 SortSam Read 200,000,000 records. Elapsed time: 00:16:35s. Time for last 10,000,000: 49s. Last read position: X:57,813,555
INFO 2017-06-26 17:15:01 SortSam Read 210,000,000 records. Elapsed time: 00:17:26s. Time for last 10,000,000: 51s. Last read position: hs37d5:19,940,511
INFO 2017-06-26 17:15:14 SortSam Finished reading inputs, merging and writing to output now.
Elapsed time: 1059 secs
Value
Young Collection Total Time
Old Collection Total Time
All Collections Total Time
GC Count
237
2
239
Average GC Time
45 ms
115 ms
46 ms
Maximum GC Time
194 ms
128 ms
194 ms
Total GC Time
10 s
230 ms
11 s
java -XX:+UnlockCommercialFeatures -XX:+FlightRecorder -XX:StartFlightRecording=dumponexit=true,filename=sc_less_max+record.jfr -jar picard-2.7.1-87-g7a85c95-SNAPSHOT-all.jar SortSam I=NA12878.mapped.ILLUMINA.bwa.CEU.exome.20121211.bam O=sort_sc.bam SORT_ORDER=queryname VALIDATION_STRINGENCY=SILENT MAX_RECORDS_IN_RAM=166666
INFO 2017-06-26 19:38:23 SortSam Read 10,000,000 records. Elapsed time: 00:00:49s. Time for last 10,000,000: 49s. Last read position: 1:119,650,743
INFO 2017-06-26 19:39:14 SortSam Read 20,000,000 records. Elapsed time: 00:01:40s. Time for last 10,000,000: 50s. Last read position: 1:248,374,319
INFO 2017-06-26 19:40:03 SortSam Read 30,000,000 records. Elapsed time: 00:02:29s. Time for last 10,000,000: 48s. Last read position: 2:154,501,231
INFO 2017-06-26 19:40:52 SortSam Read 40,000,000 records. Elapsed time: 00:03:17s. Time for last 10,000,000: 48s. Last read position: 3:49,897,096
INFO 2017-06-26 19:41:41 SortSam Read 50,000,000 records. Elapsed time: 00:04:06s. Time for last 10,000,000: 49s. Last read position: 4:9,702,042
INFO 2017-06-26 19:42:30 SortSam Read 60,000,000 records. Elapsed time: 00:04:55s. Time for last 10,000,000: 48s. Last read position: 5:9,019,531
INFO 2017-06-26 19:43:18 SortSam Read 70,000,000 records. Elapsed time: 00:05:44s. Time for last 10,000,000: 48s. Last read position: 5:176,089,346
INFO 2017-06-26 19:44:07 SortSam Read 80,000,000 records. Elapsed time: 00:06:33s. Time for last 10,000,000: 48s. Last read position: 6:143,265,238
INFO 2017-06-26 19:44:55 SortSam Read 90,000,000 records. Elapsed time: 00:07:21s. Time for last 10,000,000: 48s. Last read position: 7:127,343,317
INFO 2017-06-26 19:45:44 SortSam Read 100,000,000 records. Elapsed time: 00:08:10s. Time for last 10,000,000: 48s. Last read position: 8:133,732,360
INFO 2017-06-26 19:46:32 SortSam Read 110,000,000 records. Elapsed time: 00:08:58s. Time for last 10,000,000: 47s. Last read position: 10:3,118,522
INFO 2017-06-26 19:47:20 SortSam Read 120,000,000 records. Elapsed time: 00:09:46s. Time for last 10,000,000: 48s. Last read position: 11:9,164,230
INFO 2017-06-26 19:48:08 SortSam Read 130,000,000 records. Elapsed time: 00:10:34s. Time for last 10,000,000: 48s. Last read position: 12:6,800,342
INFO 2017-06-26 19:48:56 SortSam Read 140,000,000 records. Elapsed time: 00:11:22s. Time for last 10,000,000: 48s. Last read position: 13:25,887,091
INFO 2017-06-26 19:49:45 SortSam Read 150,000,000 records. Elapsed time: 00:12:11s. Time for last 10,000,000: 48s. Last read position: 14:94,527,360
INFO 2017-06-26 19:50:33 SortSam Read 160,000,000 records. Elapsed time: 00:12:59s. Time for last 10,000,000: 48s. Last read position: 16:20,823,053
INFO 2017-06-26 19:51:21 SortSam Read 170,000,000 records. Elapsed time: 00:13:47s. Time for last 10,000,000: 48s. Last read position: 17:40,003,660
INFO 2017-06-26 19:52:09 SortSam Read 180,000,000 records. Elapsed time: 00:14:35s. Time for last 10,000,000: 48s. Last read position: 19:11,138,565
INFO 2017-06-26 19:52:57 SortSam Read 190,000,000 records. Elapsed time: 00:15:22s. Time for last 10,000,000: 47s. Last read position: 20:57,487,759
INFO 2017-06-26 19:53:45 SortSam Read 200,000,000 records. Elapsed time: 00:16:11s. Time for last 10,000,000: 48s. Last read position: X:57,813,555
INFO 2017-06-26 19:54:35 SortSam Read 210,000,000 records. Elapsed time: 00:17:00s. Time for last 10,000,000: 49s. Last read position: hs37d5:19,940,511
INFO 2017-06-26 19:54:47 SortSam Finished reading inputs, merging and writing to output now.
Elapsed time: 1032 secs
Value
Young Collection Total Time
Old Collection Total Time
All Collections Total Time
GC Count
538
1
539
Average GC Time
15 ms
64 ms
15 ms
Maximum GC Time
55 ms
55 ms
64 ms
Total GC Time
8 s
64 ms
8 s
java -Dsamjdk.sort_col_threads=2 -XX:+UnlockCommercialFeatures -XX:+FlightRecorder -XX:StartFlightRecording=dumponexit=true,filename=awsc.jfr -jar picard-2.7.1-104-g4cf8f75-SNAPSHOT-all.jar SortSam I=NA12878.mapped.ILLUMINA.bwa.CEU.exome.20121211.bam O=sort_sc.bam SORT_ORDER=queryname VALIDATION_STRINGENCY=SILENT
INFO 2017-06-26 17:56:07 SortSam Read 10,000,000 records. Elapsed time: 00:00:27s. Time for last 10,000,000: 27s. Last read position: 1:119,650,743
INFO 2017-06-26 17:56:33 SortSam Read 20,000,000 records. Elapsed time: 00:00:52s. Time for last 10,000,000: 25s. Last read position: 1:248,374,319
INFO 2017-06-26 17:56:58 SortSam Read 30,000,000 records. Elapsed time: 00:01:17s. Time for last 10,000,000: 25s. Last read position: 2:154,501,231
INFO 2017-06-26 17:57:22 SortSam Read 40,000,000 records. Elapsed time: 00:01:41s. Time for last 10,000,000: 24s. Last read position: 3:49,897,096
INFO 2017-06-26 17:57:47 SortSam Read 50,000,000 records. Elapsed time: 00:02:06s. Time for last 10,000,000: 24s. Last read position: 4:9,702,042
INFO 2017-06-26 17:58:12 SortSam Read 60,000,000 records. Elapsed time: 00:02:31s. Time for last 10,000,000: 25s. Last read position: 5:9,019,531
INFO 2017-06-26 17:58:37 SortSam Read 70,000,000 records. Elapsed time: 00:02:56s. Time for last 10,000,000: 24s. Last read position: 5:176,089,346
INFO 2017-06-26 17:59:02 SortSam Read 80,000,000 records. Elapsed time: 00:03:22s. Time for last 10,000,000: 25s. Last read position: 6:143,265,238
INFO 2017-06-26 17:59:27 SortSam Read 90,000,000 records. Elapsed time: 00:03:46s. Time for last 10,000,000: 24s. Last read position: 7:127,343,317
INFO 2017-06-26 17:59:52 SortSam Read 100,000,000 records. Elapsed time: 00:04:11s. Time for last 10,000,000: 25s. Last read position: 8:133,732,360
INFO 2017-06-26 18:00:17 SortSam Read 110,000,000 records. Elapsed time: 00:04:36s. Time for last 10,000,000: 25s. Last read position: 10:3,118,522
INFO 2017-06-26 18:00:41 SortSam Read 120,000,000 records. Elapsed time: 00:05:00s. Time for last 10,000,000: 24s. Last read position: 11:9,164,230
INFO 2017-06-26 18:01:06 SortSam Read 130,000,000 records. Elapsed time: 00:05:25s. Time for last 10,000,000: 24s. Last read position: 12:6,800,342
INFO 2017-06-26 18:01:30 SortSam Read 140,000,000 records. Elapsed time: 00:05:49s. Time for last 10,000,000: 24s. Last read position: 13:25,887,091
INFO 2017-06-26 18:01:54 SortSam Read 150,000,000 records. Elapsed time: 00:06:13s. Time for last 10,000,000: 23s. Last read position: 14:94,527,360
INFO 2017-06-26 18:02:18 SortSam Read 160,000,000 records. Elapsed time: 00:06:38s. Time for last 10,000,000: 24s. Last read position: 16:20,823,053
INFO 2017-06-26 18:02:43 SortSam Read 170,000,000 records. Elapsed time: 00:07:02s. Time for last 10,000,000: 24s. Last read position: 17:40,003,660
INFO 2017-06-26 18:03:07 SortSam Read 180,000,000 records. Elapsed time: 00:07:26s. Time for last 10,000,000: 23s. Last read position: 19:11,138,565
INFO 2017-06-26 18:03:31 SortSam Read 190,000,000 records. Elapsed time: 00:07:50s. Time for last 10,000,000: 23s. Last read position: 20:57,487,759
INFO 2017-06-26 18:03:55 SortSam Read 200,000,000 records. Elapsed time: 00:08:14s. Time for last 10,000,000: 24s. Last read position: X:57,813,555
INFO 2017-06-26 18:04:20 SortSam Read 210,000,000 records. Elapsed time: 00:08:39s. Time for last 10,000,000: 24s. Last read position: hs37d5:19,940,511
INFO 2017-06-26 18:04:26 SortSam Finished reading inputs, merging and writing to output now.
Elapsed time: 525 secs
Value
Young Collection Total Time
Old Collection Total Time
All Collections Total Time
GC Count
204
3
207
Average GC Time
39 ms
145 ms
41 ms
Maximum GC Time
94 ms
205 ms
205 ms
Total GC Time
8 s
437 ms
8 s
Here is the file we used for our benchmarks: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/NA12878/exome_alignment/NA12878.mapped.ILLUMINA.bwa.CEU.exome.20121211.bam
You are welcome to try it for you own benchmarks. Also could you provide information about your environment ?
Here is our configuration: Ubuntu 16.04, i5-3570K, 3.40 GHz, 16 GB RAM, HDD
(what would you say about running some tests on a shared AWS instance to exclude the environment differences?)
We agree that this is potentially a problem (though we haven't met it in our tests), but it is potentially a problem for any SortingCollection implementation (our implementation only multiplies the number of files by 3).
And as soon the proposed changes should be turned ON explicitly, this side effect won't be a surprise. We'll think about extending information in the javaDoc, and maybe in the readme.
We have also thought and discussed such approaches a lot; as we see it, there is no ideal solution.
For example, we foresee a problem with serial executor service: htsjdk already has some parallel feature implementation, AbstractAsyncWriter as example and approach this class uses can't be implemented by serial way, and all cases with using synchronization point such as BlockingQueue, Semaphore an so on will be restricted because it can't be implemented in serial way.
We are fully ready to discuss a unified approach to parallelism in htsjdk, however we hope that this discussion won't block useful custom solutions (which could be rewritten once we get a unified approach).
Agree, sorry about missing it, there is some test for AsyncWriteSortingCollection: 21eb42c
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
We present a multithreaded implementation of SortingCollection.
The main idea of this improvements is that data is being sorted and spilled in the sorting collection asynchronously by submitting tasks to the executor service.
Compared to the current implementation it looks as follows:
As we can see above, parallel version enables continuous sorting work without breaks for writing to the temporary file.
Method doneAdding() is blocking, since we have to wait until all spill tasks are completed.
What is the effect?
Below are results of the SortingCollection benchmark. It is based on the random generation of data for SortingCollection, we just create random int, transform it to string and put into the sorting collection.
Here is log output of SortSam metric for both SortingCollection implementattions:
The parallel processing is turned ON by setting JVM option -Dsamjdk.sort_col_threads=number_of_threads, if number_of_threads > 0, any metric/utility that uses SortingCollection will use the parallel implementation.