Improve performance of knn traversal #357

Merged
merged 2 commits into from
Aug 14, 2020

aprokop
Contributor

@aprokop aprokop commented Aug 10, 2020

No description provided.

@aprokop aprokop added the performance label (Something is slower than it should be) Aug 10, 2020
@aprokop
Contributor Author

aprokop commented Aug 10, 2020

Summit (e8bdf3a vs 2f9356a)

#neighbors = 10

Benchmark                                                                                        Time             CPU      Time Old      Time New       CPU Old       CPU New
BM_knn_search<ArborX::BVH<Serial>>/10000/10000/10/1/0/2/manual_time_median                    +0.0518         +0.0518         50620         53241         50615         53237
BM_knn_search<ArborX::BVH<Serial>>/100000/100000/10/1/0/2/manual_time_median                  +0.0600         +0.0600        529406        561165        529336        561099
BM_knn_search<ArborX::BVH<Serial>>/1000000/1000000/10/1/0/2/manual_time_median                +0.0674         +0.0674       5639548       6019605       5639067       6019046
BM_knn_search<ArborX::BVH<Serial>>/10000/10000/10/1/1/3/manual_time_median                    +0.0305         +0.0304         49874         51393         49871         51389
BM_knn_search<ArborX::BVH<Serial>>/100000/100000/10/1/1/3/manual_time_median                  +0.0302         +0.0302        671484        691765        671414        691682
BM_knn_search<ArborX::BVH<Serial>>/1000000/1000000/10/1/1/3/manual_time_median                +0.0256         +0.0256       9349252       9588227       9348455       9587383
BM_knn_search<ArborX::BVH<Serial>>/10000/10000/10/0/0/2/manual_time_median                    +0.0596         +0.0596         52304         55423         52301         55418
BM_knn_search<ArborX::BVH<Serial>>/100000/100000/10/0/0/2/manual_time_median                  +0.0736         +0.0735        561228        602507        561157        602423
BM_knn_search<ArborX::BVH<Serial>>/1000000/1000000/10/0/0/2/manual_time_median                +0.0828         +0.0828       6339808       6865046       6337788       6862662
BM_knn_search<ArborX::BVH<Serial>>/10000/10000/10/0/1/3/manual_time_median                    +0.0516         +0.0516         57236         60191         57233         60186
BM_knn_search<ArborX::BVH<Serial>>/100000/100000/10/0/1/3/manual_time_median                  +0.0557         +0.0557        806079        850961        805976        850842
BM_knn_search<ArborX::BVH<Serial>>/1000000/1000000/10/0/1/3/manual_time_median                +0.0569         +0.0569      11639045      12301496      11635387      12297295
BM_knn_search<ArborX::BVH<OpenMP>>/10000/10000/10/1/0/2/manual_time_median                    +0.0312         +0.0310          1778          1833          1780          1835
BM_knn_search<ArborX::BVH<OpenMP>>/100000/100000/10/1/0/2/manual_time_median                  +0.0483         +0.0463         14776         15489         14442         15111
BM_knn_search<ArborX::BVH<OpenMP>>/1000000/1000000/10/1/0/2/manual_time_median                +0.0558         +0.0555        151008        159429        149505        157809
BM_knn_search<ArborX::BVH<OpenMP>>/10000/10000/10/1/1/3/manual_time_median                    -0.0014         -0.0014          2132          2129          2135          2132
BM_knn_search<ArborX::BVH<OpenMP>>/100000/100000/10/1/1/3/manual_time_median                  +0.0159         +0.0100         25076         25475         23228         23462
BM_knn_search<ArborX::BVH<OpenMP>>/1000000/1000000/10/1/1/3/manual_time_median                +0.0187         +0.0085        369684        376615        315846        318522
BM_knn_search<ArborX::BVH<OpenMP>>/10000/10000/10/0/0/2/manual_time_median                    +0.0678         +0.0676          1522          1625          1524          1627
BM_knn_search<ArborX::BVH<OpenMP>>/100000/100000/10/0/0/2/manual_time_median                  +0.0924         +0.0957         14226         15541         13571         14869
BM_knn_search<ArborX::BVH<OpenMP>>/1000000/1000000/10/0/0/2/manual_time_median                +0.0767         +0.0847        192188        206929        182582        198048
BM_knn_search<ArborX::BVH<OpenMP>>/10000/10000/10/0/1/3/manual_time_median                    +0.0562         +0.0560          1658          1751          1661          1754
BM_knn_search<ArborX::BVH<OpenMP>>/100000/100000/10/0/1/3/manual_time_median                  +0.0750         +0.0749         20121         21629         20121         21629
BM_knn_search<ArborX::BVH<OpenMP>>/1000000/1000000/10/0/1/3/manual_time_median                +0.0793         +0.0796        313894        338772        312917        337820
BM_knn_search<ArborX::BVH<Cuda>>/10000/10000/10/1/0/2/manual_time_median                      +0.0081         +0.0076          1516          1528          1608          1620
BM_knn_search<ArborX::BVH<Cuda>>/100000/100000/10/1/0/2/manual_time_median                    -0.0687         -0.0711          7383          6876          7852          7294
BM_knn_search<ArborX::BVH<Cuda>>/1000000/1000000/10/1/0/2/manual_time_median                  -0.3313         -0.3238         46648         31194         47401         32055
BM_knn_search<ArborX::BVH<Cuda>>/10000/10000/10/1/1/3/manual_time_median                      -0.0020         -0.0023          1554          1551          1645          1641
BM_knn_search<ArborX::BVH<Cuda>>/100000/100000/10/1/1/3/manual_time_median                    -0.1901         -0.1830          9268          7507          9723          7943
BM_knn_search<ArborX::BVH<Cuda>>/1000000/1000000/10/1/1/3/manual_time_median                  -0.4841         -0.4790         93687         48331         94445         49210
BM_knn_search<ArborX::BVH<Cuda>>/10000/10000/10/0/0/2/manual_time_median                      +0.0058         +0.0050          1125          1132          1217          1223
BM_knn_search<ArborX::BVH<Cuda>>/100000/100000/10/0/0/2/manual_time_median                    -0.0775         -0.0722          5822          5371          6276          5823
BM_knn_search<ArborX::BVH<Cuda>>/1000000/1000000/10/0/0/2/manual_time_median                  -0.1611         -0.1583         70365         59033         71197         59929
BM_knn_search<ArborX::BVH<Cuda>>/10000/10000/10/0/1/3/manual_time_median                      +0.0065         +0.0055          1167          1175          1258          1265
BM_knn_search<ArborX::BVH<Cuda>>/100000/100000/10/0/1/3/manual_time_median                    -0.3226         -0.3087         11271          7635         11703          8091
BM_knn_search<ArborX::BVH<Cuda>>/1000000/1000000/10/0/1/3/manual_time_median                  -0.2990         -0.2969        167463        117386        168216        118276

#neighbors = 1

Benchmark                                                                                       Time             CPU      Time Old      Time New       CPU Old       CPU New
BM_knn_search<ArborX::BVH<Serial>>/10000/10000/1/1/0/2/manual_time_median                    +0.2957         +0.2957         11235         14557         11235         14556
BM_knn_search<ArborX::BVH<Serial>>/100000/100000/1/1/0/2/manual_time_median                  +0.2883         +0.2883        125404        161562        125391        161543
BM_knn_search<ArborX::BVH<Serial>>/1000000/1000000/1/1/0/2/manual_time_median                +0.2770         +0.2769       1447549       1848565       1447236       1847924
BM_knn_search<ArborX::BVH<Serial>>/10000/10000/1/1/1/3/manual_time_median                    +0.1262         +0.1262         18745         21110         18744         21109
BM_knn_search<ArborX::BVH<Serial>>/100000/100000/1/1/1/3/manual_time_median                  +0.1611         +0.1613        238875        277357        238813        277326
BM_knn_search<ArborX::BVH<Serial>>/1000000/1000000/1/1/1/3/manual_time_median                +0.1886         +0.1886       3143884       3736722       3143388       3736176
BM_knn_search<ArborX::BVH<Serial>>/10000/10000/1/0/0/2/manual_time_median                    +0.4477         +0.4476          9883         14307          9883         14307
BM_knn_search<ArborX::BVH<Serial>>/100000/100000/1/0/0/2/manual_time_median                  +0.4808         +0.4808        115275        170702        115263        170684
BM_knn_search<ArborX::BVH<Serial>>/1000000/1000000/1/0/0/2/manual_time_median                +0.5091         +0.5091       1439202       2171836       1438642       2171032
BM_knn_search<ArborX::BVH<Serial>>/10000/10000/1/0/1/3/manual_time_median                    +0.1968         +0.1968         22330         26726         22329         26724
BM_knn_search<ArborX::BVH<Serial>>/100000/100000/1/0/1/3/manual_time_median                  +0.2429         +0.2429        314535        390927        314501        390883
BM_knn_search<ArborX::BVH<Serial>>/1000000/1000000/1/0/1/3/manual_time_median                +0.2828         +0.2828       4390934       5632630       4389404       5630589
BM_knn_search<ArborX::BVH<OpenMP>>/10000/10000/1/1/0/2/manual_time_median                    +0.1069         +0.1060           749           829           751           830
BM_knn_search<ArborX::BVH<OpenMP>>/100000/100000/1/1/0/2/manual_time_median                  +0.2151         +0.2150          4351          5287          4352          5288
BM_knn_search<ArborX::BVH<OpenMP>>/1000000/1000000/1/1/0/2/manual_time_median                +0.2356         +0.2506         44983         55580         43907         54911
BM_knn_search<ArborX::BVH<OpenMP>>/10000/10000/1/1/1/3/manual_time_median                    +0.0495         +0.0492          1082          1136          1084          1137
BM_knn_search<ArborX::BVH<OpenMP>>/100000/100000/1/1/1/3/manual_time_median                  +0.1455         +0.1452          9737         11154          9054         10368
BM_knn_search<ArborX::BVH<OpenMP>>/1000000/1000000/1/1/1/3/manual_time_median                +0.1950         +0.1775        127244        152054        112360        132304
BM_knn_search<ArborX::BVH<OpenMP>>/10000/10000/1/0/0/2/manual_time_median                    +0.1810         +0.2706           592           699           550           699
BM_knn_search<ArborX::BVH<OpenMP>>/100000/100000/1/0/0/2/manual_time_median                  +0.3418         +0.7899          4198          5633          2245          4019
BM_knn_search<ArborX::BVH<OpenMP>>/1000000/1000000/1/0/0/2/manual_time_median                +0.3662         +0.9130         55514         75843         28111         53777
BM_knn_search<ArborX::BVH<OpenMP>>/10000/10000/1/0/1/3/manual_time_median                    +0.1550         +0.1542           794           917           796           919
BM_knn_search<ArborX::BVH<OpenMP>>/100000/100000/1/0/1/3/manual_time_median                  +0.2841         +0.2841          8041         10326          8043         10327
BM_knn_search<ArborX::BVH<OpenMP>>/1000000/1000000/1/0/1/3/manual_time_median                +0.3486         +0.3528        113253        152729        112857        152677
BM_knn_search<ArborX::BVH<Cuda>>/10000/10000/1/1/0/2/manual_time_median                      -0.0050         -0.0050          1307          1300          1398          1391
BM_knn_search<ArborX::BVH<Cuda>>/100000/100000/1/1/0/2/manual_time_median                    -0.0492         -0.0461          2541          2416          2638          2516
BM_knn_search<ArborX::BVH<Cuda>>/1000000/1000000/1/1/0/2/manual_time_median                  -0.1355         -0.1208         12679         10961         13415         11794
BM_knn_search<ArborX::BVH<Cuda>>/10000/10000/1/1/1/3/manual_time_median                      +0.0049         +0.0032          1322          1328          1413          1418
BM_knn_search<ArborX::BVH<Cuda>>/100000/100000/1/1/1/3/manual_time_median                    -0.0898         -0.0981          2932          2669          3029          2732
BM_knn_search<ArborX::BVH<Cuda>>/1000000/1000000/1/1/1/3/manual_time_median                  -0.3417         -0.3237         21099         13890         21753         14712
BM_knn_search<ArborX::BVH<Cuda>>/10000/10000/1/0/0/2/manual_time_median                      +0.0507         +0.0471           860           904           950           995
BM_knn_search<ArborX::BVH<Cuda>>/100000/100000/1/0/0/2/manual_time_median                    +0.0346         +0.0335          2083          2155          2172          2245
BM_knn_search<ArborX::BVH<Cuda>>/1000000/1000000/1/0/0/2/manual_time_median                  +0.0646         +0.0594         11958         12730         12761         13519
BM_knn_search<ArborX::BVH<Cuda>>/10000/10000/1/0/1/3/manual_time_median                      -0.0082         -0.0083           896           888           986           977
BM_knn_search<ArborX::BVH<Cuda>>/100000/100000/1/0/1/3/manual_time_median                    -0.1810         -0.1752          3674          3009          3756          3098
BM_knn_search<ArborX::BVH<Cuda>>/1000000/1000000/1/0/1/3/manual_time_median                  -0.1536         -0.1504         41752         35338         42561         36162
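
For reading the tables: the first two numeric columns appear to be relative changes in wall-clock and CPU time, computed from the raw old/new values in the four columns that follow. This is consistent with the rows above (for example, 50620 -> 53241 gives +0.0518). A minimal sketch of that arithmetic, with `relativeChange` being a hypothetical helper name and not part of ArborX or the benchmark tooling:

```cpp
#include <cassert>

// Relative change as it appears in the first two numeric columns:
// (new - old) / old. A negative value means the new revision is faster.
inline double relativeChange(double oldTime, double newTime)
{
  assert(oldTime > 0);
  return (newTime - oldTime) / oldTime;
}
```

So a value of -0.4841 for Cuda means the run time dropped by roughly 48%, while +0.0518 for Serial means it grew by roughly 5%.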

So it seems that for Serial it is quite important to check the distance of each node popped from the stack, so as not to do extra work. For Cuda, though, it seems preferable not to do that.

Host runs a lot faster with a (node, distance) stack, while Cuda runs faster
without one. Checking the distance really matters for small values of k, such
as nearest neighbor (k = 1).
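
To illustrate what "checking the distance" on pop means, here is a minimal sketch, not ArborX's actual traversal. It assumes a toy 1D tree where intervals stand in for bounding boxes; `Node`, `distanceToNode`, `build`, `knn`, and the `checkOnPop` flag are all hypothetical names. The stack stores (node, distance) pairs, and with `checkOnPop` enabled a popped node is discarded if its stored distance can no longer beat the current k-th best.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical 1D stand-in for a BVH node: covers the interval [lo, hi].
// Leaves hold a single point (lo == hi) and have no children.
struct Node
{
  double lo, hi;
  int left, right; // child indices into the node array, -1 for a leaf
};

// Distance from query q to the node's interval (0 if q lies inside it).
inline double distanceToNode(double q, Node const &n)
{
  return std::max(std::max(n.lo - q, q - n.hi), 0.0);
}

// Builds a balanced tree over sorted points in [first, last); returns the root index.
inline int build(std::vector<double> const &pts, int first, int last,
                 std::vector<Node> &nodes)
{
  int const id = static_cast<int>(nodes.size());
  nodes.push_back(Node{pts[first], pts[last - 1], -1, -1});
  if (last - first > 1)
  {
    int const mid = first + (last - first) / 2;
    int const l = build(pts, first, mid, nodes);
    int const r = build(pts, mid, last, nodes);
    nodes[id].left = l;
    nodes[id].right = r;
  }
  return id;
}

// Returns the k smallest point distances to q in ascending order.
// visited counts the nodes that were actually processed after being popped.
inline std::vector<double> knn(std::vector<Node> const &nodes, int root, double q,
                               std::size_t k, bool checkOnPop, int &visited)
{
  assert(k > 0);
  std::priority_queue<double> best; // max-heap holding the k best distances
  std::vector<std::pair<int, double> > stack; // (node, distance) entries
  stack.push_back(std::make_pair(root, distanceToNode(q, nodes[root])));
  visited = 0;
  // Current pruning radius: infinite until k candidates have been found.
  auto radius = [&best, k]() {
    return best.size() < k ? std::numeric_limits<double>::infinity() : best.top();
  };
  while (!stack.empty())
  {
    int const id = stack.back().first;
    double const d = stack.back().second;
    stack.pop_back();
    // The radius may have shrunk since this entry was pushed.
    if (checkOnPop && d >= radius())
      continue;
    ++visited;
    Node const &n = nodes[id];
    if (n.left < 0) // leaf: try to insert into the k best
    {
      if (d < radius())
      {
        if (best.size() == k)
          best.pop();
        best.push(d);
      }
      continue;
    }
    int nearChild = n.left, farChild = n.right;
    double dNear = distanceToNode(q, nodes[nearChild]);
    double dFar = distanceToNode(q, nodes[farChild]);
    if (dFar < dNear)
    {
      std::swap(nearChild, farChild);
      std::swap(dNear, dFar);
    }
    // Push the farther child first so the nearer one is popped next.
    if (dFar < radius())
      stack.push_back(std::make_pair(farChild, dFar));
    if (dNear < radius())
      stack.push_back(std::make_pair(nearChild, dNear));
  }
  std::vector<double> result(best.size());
  for (std::size_t i = result.size(); i-- > 0;)
  {
    result[i] = best.top();
    best.pop();
  }
  return result;
}
```

Both settings return the same neighbors. The difference is the trade-off discussed here: storing the distance costs extra stack space per entry, but lets stale entries be skipped without recomputing anything, which matters most when k is small and the pruning radius shrinks quickly.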
@aprokop
Contributor Author

aprokop commented Aug 12, 2020

Summit results (master e8bdf3a vs branch dd0bf48):

#neighbors = 1

Benchmark                                                                                       Time             CPU      Time Old      Time New       CPU Old       CPU New
BM_knn_search<ArborX::BVH<Serial>>/10000/10000/1/1/0/2/manual_time_median                    -0.0516         -0.0516         11235         10655         11235         10655
BM_knn_search<ArborX::BVH<Serial>>/100000/100000/1/1/0/2/manual_time_median                  -0.0749         -0.0749        125404        116010        125391        115999
BM_knn_search<ArborX::BVH<Serial>>/1000000/1000000/1/1/0/2/manual_time_median                -0.0844         -0.0844       1447549       1325382       1447236       1325071
BM_knn_search<ArborX::BVH<Serial>>/10000/10000/1/1/1/3/manual_time_median                    -0.0416         -0.0416         18745         17966         18744         17965
BM_knn_search<ArborX::BVH<Serial>>/100000/100000/1/1/1/3/manual_time_median                  -0.0572         -0.0571        238875        225206        238813        225179
BM_knn_search<ArborX::BVH<Serial>>/1000000/1000000/1/1/1/3/manual_time_median                -0.0686         -0.0686       3143884       2928365       3143388       2927884
BM_knn_search<ArborX::BVH<Serial>>/10000/10000/1/0/0/2/manual_time_median                    -0.0124         -0.0124          9883          9761          9883          9761
BM_knn_search<ArborX::BVH<Serial>>/100000/100000/1/0/0/2/manual_time_median                  -0.0182         -0.0182        115275        113178        115263        113166
BM_knn_search<ArborX::BVH<Serial>>/1000000/1000000/1/0/0/2/manual_time_median                -0.0120         -0.0120       1439202       1421920       1438642       1421420
BM_knn_search<ArborX::BVH<Serial>>/10000/10000/1/0/1/3/manual_time_median                    +0.0211         +0.0211         22330         22801         22329         22800
BM_knn_search<ArborX::BVH<Serial>>/100000/100000/1/0/1/3/manual_time_median                  +0.0198         +0.0198        314535        320771        314501        320740
BM_knn_search<ArborX::BVH<Serial>>/1000000/1000000/1/0/1/3/manual_time_median                +0.0090         +0.0090       4390934       4430421       4389404       4429010
BM_knn_search<ArborX::BVH<OpenMP>>/10000/10000/1/1/0/2/manual_time_median                    +0.0177         +0.0171           749           762           751           764
BM_knn_search<ArborX::BVH<OpenMP>>/100000/100000/1/1/0/2/manual_time_median                  +0.0484         +0.0483          4351          4561          4352          4563
BM_knn_search<ArborX::BVH<OpenMP>>/1000000/1000000/1/1/0/2/manual_time_median                +0.0430         +0.0469         44983         46918         43907         45966
BM_knn_search<ArborX::BVH<OpenMP>>/10000/10000/1/1/1/3/manual_time_median                    +0.0781         +0.0778          1082          1167          1084          1168
BM_knn_search<ArborX::BVH<OpenMP>>/100000/100000/1/1/1/3/manual_time_median                  +0.1255         +0.1372          9737         10960          9054         10296
BM_knn_search<ArborX::BVH<OpenMP>>/1000000/1000000/1/1/1/3/manual_time_median                +0.1593         +0.1320        127244        147515        112360        127188
BM_knn_search<ArborX::BVH<OpenMP>>/10000/10000/1/0/0/2/manual_time_median                    +0.0640         +0.0286           592           630           550           566
BM_knn_search<ArborX::BVH<OpenMP>>/100000/100000/1/0/0/2/manual_time_median                  +0.0843         +0.0794          4198          4552          2245          2424
BM_knn_search<ArborX::BVH<OpenMP>>/1000000/1000000/1/0/0/2/manual_time_median                +0.0689         +0.0680         55514         59338         28111         30022
BM_knn_search<ArborX::BVH<OpenMP>>/10000/10000/1/0/1/3/manual_time_median                    +0.0642         +0.0634           794           845           796           847
BM_knn_search<ArborX::BVH<OpenMP>>/100000/100000/1/0/1/3/manual_time_median                  +0.0895         +0.0895          8041          8761          8043          8762
BM_knn_search<ArborX::BVH<OpenMP>>/1000000/1000000/1/0/1/3/manual_time_median                +0.0741         +0.0716        113253        121643        112857        120934
BM_knn_search<ArborX::BVH<Cuda>>/10000/10000/1/1/0/2/manual_time_median                      +0.0061         +0.0056          1307          1315          1398          1406
BM_knn_search<ArborX::BVH<Cuda>>/100000/100000/1/1/0/2/manual_time_median                    -0.0488         -0.0456          2541          2417          2638          2517
BM_knn_search<ArborX::BVH<Cuda>>/1000000/1000000/1/1/0/2/manual_time_median                  -0.1354         -0.1211         12679         10962         13415         11790
BM_knn_search<ArborX::BVH<Cuda>>/10000/10000/1/1/1/3/manual_time_median                      +0.0235         +0.0219          1322          1353          1413          1444
BM_knn_search<ArborX::BVH<Cuda>>/100000/100000/1/1/1/3/manual_time_median                    -0.1048         -0.1043          2932          2625          3029          2714
BM_knn_search<ArborX::BVH<Cuda>>/1000000/1000000/1/1/1/3/manual_time_median                  -0.3667         -0.3477         21099         13363         21753         14190
BM_knn_search<ArborX::BVH<Cuda>>/10000/10000/1/0/0/2/manual_time_median                      +0.0674         +0.0622           860           918           950          1009
BM_knn_search<ArborX::BVH<Cuda>>/100000/100000/1/0/0/2/manual_time_median                    +0.0167         +0.0161          2083          2117          2172          2207
BM_knn_search<ArborX::BVH<Cuda>>/1000000/1000000/1/0/0/2/manual_time_median                  -0.0538         -0.0519         11958         11314         12761         12099
BM_knn_search<ArborX::BVH<Cuda>>/10000/10000/1/0/1/3/manual_time_median                      +0.0131         +0.0110           896           908           986           996
BM_knn_search<ArborX::BVH<Cuda>>/100000/100000/1/0/1/3/manual_time_median                    -0.2545         -0.2488          3674          2739          3756          2822
BM_knn_search<ArborX::BVH<Cuda>>/1000000/1000000/1/0/1/3/manual_time_median                  -0.3675         -0.3619         41752         26407         42561         27156

#neighbors = 10

Benchmark                                                                                        Time             CPU      Time Old      Time New       CPU Old       CPU New
BM_knn_search<ArborX::BVH<Serial>>/10000/10000/10/1/0/2/manual_time_median                    +0.0059         +0.0059         50620         50916         50615         50912
BM_knn_search<ArborX::BVH<Serial>>/100000/100000/10/1/0/2/manual_time_median                  -0.0010         -0.0010        529406        528874        529336        528809
BM_knn_search<ArborX::BVH<Serial>>/1000000/1000000/10/1/0/2/manual_time_median                -0.0062         -0.0062       5639548       5604712       5639067       5604287
BM_knn_search<ArborX::BVH<Serial>>/10000/10000/10/1/1/3/manual_time_median                    -0.0080         -0.0080         49874         49474         49871         49470
BM_knn_search<ArborX::BVH<Serial>>/100000/100000/10/1/1/3/manual_time_median                  -0.0172         -0.0172        671484        659930        671414        659849
BM_knn_search<ArborX::BVH<Serial>>/1000000/1000000/10/1/1/3/manual_time_median                -0.0319         -0.0319       9349252       9050951       9348455       9050208
BM_knn_search<ArborX::BVH<Serial>>/10000/10000/10/0/0/2/manual_time_median                    +0.0036         +0.0036         52304         52494         52301         52490
BM_knn_search<ArborX::BVH<Serial>>/100000/100000/10/0/0/2/manual_time_median                  +0.0009         +0.0009        561228        561732        561157        561679
BM_knn_search<ArborX::BVH<Serial>>/1000000/1000000/10/0/0/2/manual_time_median                -0.0016         -0.0016       6339808       6329461       6337788       6327540
BM_knn_search<ArborX::BVH<Serial>>/10000/10000/10/0/1/3/manual_time_median                    +0.0024         +0.0024         57236         57374         57233         57370
BM_knn_search<ArborX::BVH<Serial>>/100000/100000/10/0/1/3/manual_time_median                  -0.0045         -0.0045        806079        802450        805976        802356
BM_knn_search<ArborX::BVH<Serial>>/1000000/1000000/10/0/1/3/manual_time_median                -0.0158         -0.0158      11639045      11455350      11635387      11452029
BM_knn_search<ArborX::BVH<OpenMP>>/10000/10000/10/1/0/2/manual_time_median                    +0.0392         +0.0389          1778          1848          1780          1849
BM_knn_search<ArborX::BVH<OpenMP>>/100000/100000/10/1/0/2/manual_time_median                  +0.0546         +0.0522         14776         15582         14442         15197
BM_knn_search<ArborX::BVH<OpenMP>>/1000000/1000000/10/1/0/2/manual_time_median                +0.0535         +0.0543        151008        159082        149505        157628
BM_knn_search<ArborX::BVH<OpenMP>>/10000/10000/10/1/1/3/manual_time_median                    +0.0477         +0.0477          2132          2234          2135          2236
BM_knn_search<ArborX::BVH<OpenMP>>/100000/100000/10/1/1/3/manual_time_median                  +0.0774         +0.0871         25076         27016         23228         25251
BM_knn_search<ArborX::BVH<OpenMP>>/1000000/1000000/10/1/1/3/manual_time_median                +0.0956         +0.0837        369684        405036        315846        342274
BM_knn_search<ArborX::BVH<OpenMP>>/10000/10000/10/0/0/2/manual_time_median                    +0.0503         +0.0500          1522          1598          1524          1600
BM_knn_search<ArborX::BVH<OpenMP>>/100000/100000/10/0/0/2/manual_time_median                  +0.0600         +0.0566         14226         15079         13571         14339
BM_knn_search<ArborX::BVH<OpenMP>>/1000000/1000000/10/0/0/2/manual_time_median                +0.0476         +0.0476        192188        201337        182582        191275
BM_knn_search<ArborX::BVH<OpenMP>>/10000/10000/10/0/1/3/manual_time_median                    +0.0346         +0.0341          1658          1716          1661          1717
BM_knn_search<ArborX::BVH<OpenMP>>/100000/100000/10/0/1/3/manual_time_median                  +0.0414         +0.0410         20121         20954         20121         20945
BM_knn_search<ArborX::BVH<OpenMP>>/1000000/1000000/10/0/1/3/manual_time_median                +0.0363         +0.0367        313894        325300        312917        324398
BM_knn_search<ArborX::BVH<Cuda>>/10000/10000/10/1/0/2/manual_time_median                      +0.0091         +0.0074          1516          1529          1608          1619
BM_knn_search<ArborX::BVH<Cuda>>/100000/100000/10/1/0/2/manual_time_median                    -0.1332         -0.1253          7383          6400          7852          6868
BM_knn_search<ArborX::BVH<Cuda>>/1000000/1000000/10/1/0/2/manual_time_median                  -0.4237         -0.4143         46648         26882         47401         27763
BM_knn_search<ArborX::BVH<Cuda>>/10000/10000/10/1/1/3/manual_time_median                      +0.0199         +0.0181          1554          1585          1645          1675
BM_knn_search<ArborX::BVH<Cuda>>/100000/100000/10/1/1/3/manual_time_median                    -0.2093         -0.2115          9268          7329          9723          7667
BM_knn_search<ArborX::BVH<Cuda>>/1000000/1000000/10/1/1/3/manual_time_median                  -0.5321         -0.5271         93687         43834         94445         44664
BM_knn_search<ArborX::BVH<Cuda>>/10000/10000/10/0/0/2/manual_time_median                      +0.0276         +0.0245          1125          1156          1217          1246
BM_knn_search<ArborX::BVH<Cuda>>/100000/100000/10/0/0/2/manual_time_median                    -0.1942         -0.1874          5822          4691          6276          5100
BM_knn_search<ArborX::BVH<Cuda>>/1000000/1000000/10/0/0/2/manual_time_median                  -0.2315         -0.2283         70365         54078         71197         54944
BM_knn_search<ArborX::BVH<Cuda>>/10000/10000/10/0/1/3/manual_time_median                      +0.0347         +0.0315          1167          1208          1258          1297
BM_knn_search<ArborX::BVH<Cuda>>/100000/100000/10/0/1/3/manual_time_median                    -0.4028         -0.3848         11271          6731         11703          7199
BM_knn_search<ArborX::BVH<Cuda>>/1000000/1000000/10/0/1/3/manual_time_median                  -0.4287         -0.4260        167463         95665        168216         96552

#neighbors = 100

Benchmark                                                                                         Time             CPU      Time Old      Time New       CPU Old       CPU New
BM_knn_search<ArborX::BVH<Serial>>/10000/10000/100/1/0/2/manual_time_median                    -0.0006         -0.0006        356627        356411        356589        356361
BM_knn_search<ArborX::BVH<Serial>>/100000/100000/100/1/0/2/manual_time_median                  -0.0031         -0.0031       3757737       3746050       3757576       3745863
BM_knn_search<ArborX::BVH<Serial>>/1000000/1000000/100/1/0/2/manual_time_median                -0.0049         -0.0049      38728319      38539060      38726102      38536810
BM_knn_search<ArborX::BVH<Serial>>/10000/10000/100/1/1/3/manual_time_median                    -0.0187         -0.0187        301408        295780        301377        295749
BM_knn_search<ArborX::BVH<Serial>>/100000/100000/100/1/1/3/manual_time_median                  -0.0312         -0.0312       4356795       4220693       4356573       4220468
BM_knn_search<ArborX::BVH<Serial>>/1000000/1000000/100/1/1/3/manual_time_median                -0.0462         -0.0462      60861448      58052605      60856906      58048262
BM_knn_search<ArborX::BVH<Serial>>/10000/10000/100/0/0/2/manual_time_median                    -0.0246         -0.0246        369638        360538        369599        360492
BM_knn_search<ArborX::BVH<Serial>>/100000/100000/100/0/0/2/manual_time_median                  -0.0260         -0.0260       3945035       3842584       3944722       3842272
BM_knn_search<ArborX::BVH<Serial>>/1000000/1000000/100/0/0/2/manual_time_median                -0.0280         -0.0280      42196294      41015276      42186534      41006127
BM_knn_search<ArborX::BVH<Serial>>/10000/10000/100/0/1/3/manual_time_median                    -0.0282         -0.0282        320154        311117        320121        311083
BM_knn_search<ArborX::BVH<Serial>>/100000/100000/100/0/1/3/manual_time_median                  -0.0389         -0.0389       4609493       4430412       4609111       4430026
BM_knn_search<ArborX::BVH<Serial>>/1000000/1000000/100/0/1/3/manual_time_median                -0.0475         -0.0475      65430564      62323095      65416361      62309656
BM_knn_search<ArborX::BVH<OpenMP>>/10000/10000/100/1/0/2/manual_time_median                    +0.0109         +0.0071          9709          9814          9113          9177
BM_knn_search<ArborX::BVH<OpenMP>>/100000/100000/100/1/0/2/manual_time_median                  +0.0083         +0.0050         97368         98180         95532         96010
BM_knn_search<ArborX::BVH<OpenMP>>/1000000/1000000/100/1/0/2/manual_time_median                +0.0112         +0.0058        975441        986374        971838        977508
BM_knn_search<ArborX::BVH<OpenMP>>/10000/10000/100/1/1/3/manual_time_median                    +0.0262         +0.0262          8885          9118          8887          9119
BM_knn_search<ArborX::BVH<OpenMP>>/100000/100000/100/1/1/3/manual_time_median                  +0.0153         +0.0197        145534        147766        138768        141505
BM_knn_search<ArborX::BVH<OpenMP>>/1000000/1000000/100/1/1/3/manual_time_median                +0.0182         +0.0139       2259096       2300210       1955230       1982362
BM_knn_search<ArborX::BVH<OpenMP>>/10000/10000/100/0/0/2/manual_time_median                    +0.0110         +0.0110          9130          9231          9132          9232
BM_knn_search<ArborX::BVH<OpenMP>>/100000/100000/100/0/0/2/manual_time_median                  +0.0071         +0.0065         96477         97158         95954         96579
BM_knn_search<ArborX::BVH<OpenMP>>/1000000/1000000/100/0/0/2/manual_time_median                -0.0028         -0.0048       1164462       1161242       1158552       1153025
BM_knn_search<ArborX::BVH<OpenMP>>/10000/10000/100/0/1/3/manual_time_median                    +0.0118         +0.0133          7982          8077          7918          8023
BM_knn_search<ArborX::BVH<OpenMP>>/100000/100000/100/0/1/3/manual_time_median                  +0.0105         +0.0104        113949        115148        113430        114611
BM_knn_search<ArborX::BVH<OpenMP>>/1000000/1000000/100/0/1/3/manual_time_median                +0.0007         +0.0021       1727118       1728253       1716255       1719935
BM_knn_search<ArborX::BVH<Cuda>>/10000/10000/100/1/0/2/manual_time_median                      -0.0392         -0.0357          5786          5559          6257          6034
BM_knn_search<ArborX::BVH<Cuda>>/100000/100000/100/1/0/2/manual_time_median                    -0.1378         -0.1370        117534        101333        118032        101861
BM_knn_search<ArborX::BVH<Cuda>>/1000000/1000000/100/1/0/2/manual_time_median                  -0.1901         -0.1905       1269480       1028125       1270491       1028509
BM_knn_search<ArborX::BVH<Cuda>>/10000/10000/100/1/1/3/manual_time_median                      -0.0318         -0.0319          5668          5487          6139          5943
BM_knn_search<ArborX::BVH<Cuda>>/100000/100000/100/1/1/3/manual_time_median                    -0.1215         -0.1209        151830        133387        152327        133914
BM_knn_search<ArborX::BVH<Cuda>>/1000000/1000000/100/1/1/3/manual_time_median                  -0.1671         -0.1668       2410539       2007835       2411386       2009245
BM_knn_search<ArborX::BVH<Cuda>>/10000/10000/100/0/0/2/manual_time_median                      -0.0259         -0.0249          5696          5549          6169          6016
BM_knn_search<ArborX::BVH<Cuda>>/100000/100000/100/0/0/2/manual_time_median                    -0.1311         -0.1305        125090        108696        125621        109230
BM_knn_search<ArborX::BVH<Cuda>>/1000000/1000000/100/0/0/2/manual_time_median                  -0.1521         -0.1519       1404296       1190732       1405656       1192068
BM_knn_search<ArborX::BVH<Cuda>>/10000/10000/100/0/1/3/manual_time_median                      -0.0652         -0.0552          5713          5341          6152          5812
BM_knn_search<ArborX::BVH<Cuda>>/100000/100000/100/0/1/3/manual_time_median                    -0.1748         -0.1743        160875        132755        161398        133269
BM_knn_search<ArborX::BVH<Cuda>>/1000000/1000000/100/0/1/3/manual_time_median                  -0.2597         -0.2596       2567694       1900892       2569064       1902250

@aprokop aprokop marked this pull request as ready for review August 12, 2020 16:23
@aprokop
Contributor Author

aprokop commented Aug 12, 2020

OK, I'm reasonably happy with this PR now. Cuda is much faster, even for k = 1. Serial is about the same. The only downside is a slight slowdown for OpenMP at low k.

An interesting observation: POWER9 showed no difference, but my Intel workstation improved significantly. We should add non-POWER9 (Intel or AMD) performance testing to our toolbox. Maybe ALCF?

scramjet bvh_driver (knn_stack) $ for j in 10000 100000 1000000; do for i in ArborX_{master,knn}; do ./$i --benchmark_filter=knn.*Serial --values=$j --queries=$j --neighbors=1 2>/dev/null | grep knn; done;done
BM_knn_search<ArborX::BVH<Serial>>/10000/10000/1/1/0/0/manual_time       4432 us         4432 us          153
BM_knn_search<ArborX::BVH<Serial>>/10000/10000/1/1/0/0/manual_time       4371 us         4372 us          160
BM_knn_search<ArborX::BVH<Serial>>/100000/100000/1/1/0/0/manual_time      54129 us        54130 us           13
BM_knn_search<ArborX::BVH<Serial>>/100000/100000/1/1/0/0/manual_time      55425 us        55426 us           12
BM_knn_search<ArborX::BVH<Serial>>/1000000/1000000/1/1/0/0/manual_time     791331 us       793000 us            1
BM_knn_search<ArborX::BVH<Serial>>/1000000/1000000/1/1/0/0/manual_time     792710 us       794321 us            1
scramjet bvh_driver (knn_stack) $ for j in 10000 100000 1000000; do for i in ArborX_{master,knn}; do ./$i --benchmark_filter=knn.*Serial --values=$j --queries=$j --neighbors=5 2>/dev/null | grep knn; done;done
BM_knn_search<ArborX::BVH<Serial>>/10000/10000/5/1/0/0/manual_time      15926 us        15927 us           44
BM_knn_search<ArborX::BVH<Serial>>/10000/10000/5/1/0/0/manual_time      10552 us        10553 us           66
BM_knn_search<ArborX::BVH<Serial>>/100000/100000/5/1/0/0/manual_time     190554 us       190556 us            4
BM_knn_search<ArborX::BVH<Serial>>/100000/100000/5/1/0/0/manual_time     118262 us       118261 us            6
BM_knn_search<ArborX::BVH<Serial>>/1000000/1000000/5/1/0/0/manual_time    2280090 us      2281292 us            1
BM_knn_search<ArborX::BVH<Serial>>/1000000/1000000/5/1/0/0/manual_time    1609104 us      1610312 us            1
scramjet bvh_driver (knn_stack) $ for j in 10000 100000 1000000; do for i in ArborX_{master,knn}; do ./$i --benchmark_filter=knn.*Serial --values=$j --queries=$j --neighbors=10 2>/dev/null | grep knn; done;done
BM_knn_search<ArborX::BVH<Serial>>/10000/10000/10/1/0/0/manual_time      26511 us        26512 us           26
BM_knn_search<ArborX::BVH<Serial>>/10000/10000/10/1/0/0/manual_time      19418 us        19419 us           36
BM_knn_search<ArborX::BVH<Serial>>/100000/100000/10/1/0/0/manual_time     298613 us       298611 us            2
BM_knn_search<ArborX::BVH<Serial>>/100000/100000/10/1/0/0/manual_time     208581 us       208583 us            3
BM_knn_search<ArborX::BVH<Serial>>/1000000/1000000/10/1/0/0/manual_time    3597824 us      3600194 us            1
BM_knn_search<ArborX::BVH<Serial>>/1000000/1000000/10/1/0/0/manual_time    2564745 us      2567147 us            1
scramjet bvh_driver (knn_stack) $ for j in 10000 100000 1000000; do for i in ArborX_{master,knn}; do ./$i --benchmark_filter=knn.*Serial --values=$j --queries=$j --neighbors=100 2>/dev/null | grep knn; done;done
BM_knn_search<ArborX::BVH<Serial>>/10000/10000/100/1/0/0/manual_time     195456 us       195457 us            3
BM_knn_search<ArborX::BVH<Serial>>/10000/10000/100/1/0/0/manual_time     124134 us       124134 us            6
BM_knn_search<ArborX::BVH<Serial>>/100000/100000/100/1/0/0/manual_time    2331719 us      2333736 us            1
BM_knn_search<ArborX::BVH<Serial>>/100000/100000/100/1/0/0/manual_time    1518161 us      1520016 us            1
BM_knn_search<ArborX::BVH<Serial>>/1000000/1000000/100/1/0/0/manual_time   25050864 us     25076146 us            1
BM_knn_search<ArborX::BVH<Serial>>/1000000/1000000/100/1/0/0/manual_time   17300768 us     17325995 us            1
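To put a number on the Intel result, the timings above can be turned into speedups. The sketch below assumes that, within each pair of rows, the first is `ArborX_master` and the second is `ArborX_knn` (the order produced by the brace expansion in the loop). The `speedup` helper and the `Row` struct are illustrative names, not part of ArborX or the benchmark driver.

```cpp
#include <cassert>

// Ratio of baseline time to new time; values above 1 mean the new build is faster.
inline double speedup(double master_us, double branch_us)
{
  assert(master_us > 0 && branch_us > 0);
  return master_us / branch_us;
}

// Timings in microseconds copied from the 1000000/1000000 Serial rows above.
// Assumption: first row of each pair is master, second is the knn branch.
struct Row
{
  int k;
  double master_us;
  double knn_us;
};

constexpr Row rows[] = {
    {1, 791331, 792710},
    {5, 2280090, 1609104},
    {10, 3597824, 2564745},
    {100, 25050864, 17300768},
};
```

Under that assumption, the million-point Serial runs come out at roughly 1.4x for k = 5, 10, and 100, and at parity for k = 1.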

@aprokop aprokop changed the title Slightly modify the knn traversal Improve performance of knn traversal Aug 12, 2020
Contributor

@dalg24 dalg24 left a comment

Just out of curiosity. Did you try to play with the alignment of the underlying C-array before you went ahead and split the stack?

@aprokop
Contributor Author

aprokop commented Aug 13, 2020

> Just out of curiosity. Did you try to play with the alignment of the underlying C-array before you went ahead and split the stack?

No. You think that 4-byte alignment could be worse than 8-byte?

@dalg24
Contributor

dalg24 commented Aug 13, 2020

I meant

alignas(16) PairNodePtrDistance stack[64];

@aprokop
Contributor Author

aprokop commented Aug 13, 2020

> alignas(16) PairNodePtrDistance stack[64];

I see. I thought you meant adding padding to PairNodePtrDistance. But I still fail to see why you think that could help. The struct itself is 8-byte aligned.

In any case, there is a lot of tinkering that could be done in this PR. However, as any tinkering would require rerunning many performance tests, I would like to keep it to a minimum, and only for strictly necessary things.
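For reference, the difference between the two alignments discussed here can be checked with a small sketch. The `PairNodePtrDistance` definition below is an assumption (a node pointer plus a `float` distance); the actual struct in the PR may differ. `isAligned` and `alignedStackIs16ByteAligned` are hypothetical helpers. The sketch only shows what `alignas(16)` guarantees about the array's address and makes no claim about its effect on performance.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct Node; // opaque placeholder for the tree node type

// Assumed layout: pointer + float. The real struct may differ.
struct PairNodePtrDistance
{
  Node const *first;
  float second;
};

// The struct's natural alignment follows its most strictly aligned member,
// the pointer (8 bytes on common 64-bit platforms).
static_assert(alignof(PairNodePtrDistance) == alignof(Node const *),
              "alignment is dictated by the pointer member");

// Returns true when p sits on an `alignment`-byte boundary.
inline bool isAligned(void const *p, std::size_t alignment)
{
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// alignas(16) on the array forces its first element onto a 16-byte boundary;
// without it, only the struct's natural alignment is guaranteed.
inline bool alignedStackIs16ByteAligned()
{
  alignas(16) PairNodePtrDistance stack[64];
  stack[0].second = 0.f; // touch the array so it is not optimized away
  return isAligned(stack, 16);
}
```

With this assumed layout, the stricter alignment affects only the array's starting address. The layout of the elements within the array is unchanged.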

src/details/ArborX_DetailsTreeTraversal.hpp
    heap.popPush(leaf_pair);
    if ((int)heap.size() == k)
      radius = heap.top().second;
  }
Contributor

The duplication of code is unfortunate.

Contributor Author

Yeah, I was really annoyed by the duplication. The only way to clean it up would be to move part of the logic inside the heap itself, but that's another can of worms.
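As a rough illustration of what "moving the logic inside the heap" might look like, here is a minimal sketch of a bounded max-heap that owns both the push-or-replace decision and the radius update. It is built on `std::push_heap`/`std::pop_heap` rather than ArborX's own heap, and `KnnCandidates`, `offer`, and `radius` are hypothetical names, so it should be read as one possible shape, not as the code in this PR.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <utility>
#include <vector>

using IndexDistance = std::pair<int, float>; // (leaf index, distance)

// Orders by distance so the heap's front is the farthest kept candidate.
struct CompareDistance
{
  bool operator()(IndexDistance const &a, IndexDistance const &b) const
  {
    return a.second < b.second;
  }
};

// Hypothetical wrapper: keeps the k nearest candidates seen so far.
class KnnCandidates
{
public:
  explicit KnnCandidates(int k)
      : _k(k)
  {
    assert(k > 0);
    _heap.reserve(k);
  }

  // Offers a candidate and returns the current pruning radius.
  float offer(IndexDistance const &candidate)
  {
    if ((int)_heap.size() < _k)
    {
      // Heap not full yet: plain push.
      _heap.push_back(candidate);
      std::push_heap(_heap.begin(), _heap.end(), CompareDistance{});
    }
    else if (candidate.second < _heap.front().second)
    {
      // Heap full and candidate is closer: replace the farthest ("popPush").
      std::pop_heap(_heap.begin(), _heap.end(), CompareDistance{});
      _heap.back() = candidate;
      std::push_heap(_heap.begin(), _heap.end(), CompareDistance{});
    }
    return radius();
  }

  // Infinite until k candidates are stored, then the k-th nearest distance.
  float radius() const
  {
    return (int)_heap.size() == _k ? _heap.front().second
                                   : std::numeric_limits<float>::infinity();
  }

private:
  int _k;
  std::vector<IndexDistance> _heap;
};
```

With a wrapper like this, each call site in the traversal would reduce to `radius = candidates.offer(leaf_pair);`. The trade-off is that the heap then carries knowledge of `k` and of the pruning radius.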

@aprokop aprokop merged commit 9a8ff40 into arborx:master Aug 14, 2020
@aprokop aprokop deleted the knn_stack branch August 14, 2020 13:44