
[registration] Add OMP based Multi-threading to SampleConsensusPrerejective #4433

Open · wants to merge 12 commits into base: master
Conversation

@koide3 (Contributor) commented Oct 5, 2020

This PR adds OMP-based multi-threading capability to SampleConsensusPrerejective.

Changes are as follows:

  • const qualifiers are added to several methods to clarify that they are thread-safe
  • getFitness() is modified to use a per-thread transformation variable
  • similar_features are precomputed for concurrency
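As an illustration of the third point, here is a minimal sketch of precomputing per-point neighbour lists in a parallel loop. The function name precomputeSimilarFeatures and the index arithmetic are stand-ins, not the actual PCL code; in the real implementation each query would go to the feature k-d tree. The pattern is safe because each iteration writes only its own slot of the outer vector:

```cpp
#include <cstddef>
#include <vector>

// Stand-in for precomputing similar_features: each point i gets its own
// list of k candidate indices, filled independently of all other points.
std::vector<std::vector<int>> precomputeSimilarFeatures(std::size_t n, int k) {
  std::vector<std::vector<int>> similar_features(n);
#pragma omp parallel for
  for (long i = 0; i < static_cast<long>(n); ++i) {
    // Each thread touches only similar_features[i], so no synchronization
    // is needed. A real implementation would run a k-d tree query here.
    std::vector<int>& neighbors = similar_features[i];
    for (int j = 0; j < k; ++j)
      neighbors.push_back(static_cast<int>((i + j) % static_cast<long>(n)));
  }
  return similar_features;
}
```

Because the result is independent of thread count, the precomputed table can then be read concurrently from the main loop without locks.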

@kunaltyagi (Member)

@koide3 Could you please tag the PR appropriately?

PS: Failures on Mac

@koide3 (Contributor, Author) commented Oct 5, 2020

@kunaltyagi It seems I don't have permission to edit tags...
I'll take a look at the errors on Mac later.

@kunaltyagi (Member) left a comment

Most of the changes LGTM. Haven't reviewed the code deeply

@kunaltyagi added labels: changelog: ABI break, changelog: deprecation, module: registration, needs: more work (Oct 12, 2020)
@kunaltyagi added the needs: code review label and removed the needs: more work label (Oct 12, 2020)
@kunaltyagi previously approved these changes Oct 12, 2020
@koide3 (Contributor, Author) commented Nov 3, 2020

I rebased and squashed to resolve the conflict.

@kunaltyagi (Member)

@SunBlack could you please take a look at the OMP pragma sites?

@SunBlack (Contributor) commented Nov 4, 2020

@SunBlack could you please take a look at the OMP pragma sites?

It took a while until both compilers built the branch.

@kunaltyagi After removing Ubuntu 16.04 from Azure: do we still have a GCC 6-8 anywhere, so we can still be sure to have OPENMP_LEGACY_CONST_DATA_SHARING_RULE where necessary?

@koide3 Are you sure about the usage of firstprivate? A copy of the vectors is made for each thread.

@kunaltyagi (Member)

do we still have a GCC 6-8 anywhere?

Yes, 18.04 has GCC 7 on it (though we force install GCC 8 on it for now due to ... reasons)

@koide3 (Contributor, Author) commented Nov 5, 2020

@SunBlack

Are you sure about the usage of firstprivate? A copy of the vectors is made for each thread.

Yes, one copy of them must be created for each thread.
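For reference, a self-contained sketch of what firstprivate does (plain OpenMP, not PCL code): each thread receives its own copy of scratch, initialized from the original, so writes inside the loop cannot race with other threads:

```cpp
#include <vector>

// firstprivate demo: `scratch` is copied once per thread, so each thread
// may freely overwrite its copy while iterating.
int sumWithFirstprivate(const std::vector<int>& base, int iters) {
  std::vector<int> scratch = base;  // the original each thread copies from
  int total = 0;
#pragma omp parallel for firstprivate(scratch) reduction(+ : total)
  for (int i = 0; i < iters; ++i) {
    scratch[0] = i;  // write goes to the thread-private copy only
    total += scratch[0] + scratch[1];
  }
  return total;
}
```

With base = {0, 10} and iters = 4 this returns 46 for any thread count: scratch[1] is never modified, and the reduction combines the per-thread partial sums deterministically.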

@mvieth (Member) commented Nov 17, 2020

Do you have an estimate of which parts of computeTransformation take the most time? I am asking to determine whether it really makes sense to parallelize both for loops as you did, or whether only parallelizing the first loop is worthwhile because the second loop is fast anyway and the overhead of the parallelization is too large - just a thought.
It would also be interesting to see how large the speedup actually is, e.g. if you run the OpenMP version with 2 or 4 threads, how much faster is it than with one thread? That could give hints as to whether the OpenMP implementation still has opportunities for optimization.

@koide3 (Contributor, Author) commented Nov 18, 2020

Here is a brief benchmark result. The second for loop, which is the main RANSAC loop, is the most expensive part of the algorithm, while the first for loop is comparatively light. The overhead of OpenMP appears small in this case, and both loops get much faster with it.

15950 pts vs 15772 pts (FPFH)

with OMP
threads   1st[msec] 2nd[msec]  total[msec]
1         261.082   22999.672  23260.755
2         118.717   12271.585  12390.303
3          79.286    8156.884   8236.170
4          63.650    7475.746   7539.397
5          53.186    5391.872   5445.058
6          44.846    4408.987   4453.833
7          43.878    3903.800   3947.678
8          36.810    4014.925   4051.735
9          33.360    3960.413   3993.773

without OMP
threads   1st[msec] 2nd[msec]  total[msec]
1          263.646  23792.844  24056.490
1          226.914  24037.704  24264.618
1          226.266  22705.109  22931.375
1          221.195  25475.866  25697.061

@mvieth (Member) left a comment

Your benchmark looks really promising!
Do you think it would make sense to add a unit test, e.g. to copy the one in test_sac_ia.cpp and run it again with 4 threads?

@kunaltyagi kunaltyagi self-requested a review November 19, 2020 08:03
@koide3 (Contributor, Author) commented Nov 19, 2020

Can I rebase and force-push to resolve the conflicts?

@JStech (Contributor) left a comment

Looks good--just a couple minor things

@koide3 (Contributor, Author) commented Nov 24, 2020

Conflicts are resolved.

@kunaltyagi (Member)

@JStech @mvieth Please resolve the conversations that have reached a conclusion. It becomes a bit difficult to see whether there's work left.

@kunaltyagi kunaltyagi dismissed their stale review November 24, 2020 11:46

Loads of updates

@kunaltyagi kunaltyagi self-requested a review November 24, 2020 11:47
@JStech (Contributor) commented Nov 25, 2020

Somehow my comment threads don't have "Resolve" buttons--maybe because I made them comments and not change requests in the first place? I do consider them all resolved.

@mvieth previously approved these changes Feb 3, 2021
@mvieth (Member) left a comment

Sorry for the long delay, the changes look good.
Only the Windows CIs are failing, but I seem to be unable to rerun them, probably because the runs are too old? Perhaps closing and reopening this PR would help?

@kunaltyagi kunaltyagi added the needs: author reply Specify why not closed/merged yet label May 26, 2021
Hopefully this fixes compile error on windows: "only a variable or static data member can be used in a data-sharing clause"
@mvieth (Member) commented Aug 23, 2022

I checked again, and the drawing of the random samples (selectSamples) should not be done in parallel (it calls rand()). That either has to be wrapped in omp critical, or, maybe a better approach, the for-loop in getFitness could be made parallel while the main for-loop stays sequential. That would be safer, but I haven't checked yet how the two would compare in terms of speed-up.
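One way to keep the main loop parallel while serializing only the sampling is a named omp critical section around the draw, as suggested. The sketch below is self-contained and uses a trivial loop body in place of the PCL code; drawSamples and the modulo mapping are stand-ins:

```cpp
#include <cstdlib>
#include <vector>

// Sketch: parallel loop where only the call to the non-thread-safe
// std::rand() is serialized via a named critical section.
std::vector<int> drawSamples(int iters) {
  std::vector<int> samples(iters);
#pragma omp parallel for
  for (int i = 0; i < iters; ++i) {
    int s;
    // std::rand() uses hidden shared state and is not guaranteed to be
    // thread-safe, so only one thread at a time may call it.
#pragma omp critical(sample_rng)
    s = std::rand();
    samples[i] = s % 100;  // rest of the iteration runs fully in parallel
  }
  return samples;
}
```

The critical section keeps the calls well-defined, at the cost of contention if the serialized part is a large fraction of each iteration.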

@larshg (Contributor) commented Aug 24, 2022

I checked again, and the drawing of the random samples (selectSamples) should not be done in parallel (it calls rand()). That either has to be wrapped in omp critical, or, maybe a better approach, the for-loop in getFitness could be made parallel while the main for-loop stays sequential. That would be safer, but I haven't checked yet how the two would compare in terms of speed-up.

Ahh, yes, that seems to cause problems. At least it's not as random as it should be, and according to some reading, calling rand() in a multithreaded manner can lead to UB.
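An alternative to guarding rand() is to avoid the shared generator entirely, e.g. with a deterministically seeded &lt;random&gt; engine per iteration. This is a sketch, not the PR's code; the function name and seeding scheme are assumptions, and per-thread engines seeded once would amortize the construction cost better:

```cpp
#include <random>
#include <vector>

// Sketch: no shared RNG state at all. Each iteration constructs its own
// engine from a deterministic seed, so the output is identical for any
// thread count and there is nothing to serialize.
std::vector<int> drawSamplesNoSharedState(int iters, unsigned seed) {
  std::vector<int> samples(iters);
#pragma omp parallel for
  for (int i = 0; i < iters; ++i) {
    std::mt19937 rng(seed + static_cast<unsigned>(i));
    std::uniform_int_distribution<int> dist(0, 99);
    samples[i] = dist(rng);
  }
  return samples;
}
```

Reproducible output per seed is also handy for unit tests, since a parallel run can be compared against a sequential reference.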

@larshg (Contributor) left a comment

Look into calling rand() from multiple threads.

@larshg (Contributor) commented Aug 24, 2022

I guess the estimateRigidTransformation is also somewhat compute-intensive, so the better option is to guard the random sampling?

PointCloudSource input_transformed;
input_transformed.resize(input_->size());
transformPointCloud(*input_, input_transformed, final_transformation_);
transformPointCloud(*input_, input_transformed, transformation);

// For each point in the source dataset

nn_indices and nn_dists could be moved outside the for loop to reduce (de)allocations.
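A minimal sketch of the suggested hoisting, with a stand-in nearestSearch in place of the PCL radius search (nearestSearch, sumNearestDists, and the 1-D distance metric are all assumptions for illustration). In the parallel version each thread would need its own pair of buffers, e.g. declared inside the parallel region or marked private:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the nearest-neighbour query in getFitness():
// writes the best match for `query` into the caller-provided buffers.
static void nearestSearch(const std::vector<float>& targets, float query,
                          std::vector<int>& nn_indices,
                          std::vector<float>& nn_dists) {
  int best = 0;
  float best_d = std::fabs(targets[0] - query);
  for (std::size_t j = 1; j < targets.size(); ++j) {
    float d = std::fabs(targets[j] - query);
    if (d < best_d) { best_d = d; best = static_cast<int>(j); }
  }
  nn_indices[0] = best;
  nn_dists[0] = best_d;
}

float sumNearestDists(const std::vector<float>& queries,
                      const std::vector<float>& targets) {
  // Buffers hoisted out of the loop: every iteration reuses the same
  // storage instead of allocating and freeing two vectors per point.
  std::vector<int> nn_indices(1);
  std::vector<float> nn_dists(1);
  float sum = 0.f;
  for (float q : queries) {
    nearestSearch(targets, q, nn_indices, nn_dists);
    sum += nn_dists[0];
  }
  return sum;
}
```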

@mvieth (Member) commented Aug 24, 2022

I guess the estimateRigidTransformation is also somewhat compute intensive, so the better option is to guard the random sampling?

I can't confirm that: In my tests, less than 1% of the time of computeTransformation was spent in estimateRigidTransformation. Almost all of the time was spent in getFitness and finding similar features. Another reason to not parallelize the main loop: currently, the loop runs exactly max_iterations_ times, but it could make sense to stop earlier if a good solution has been found. With a parallel main loop, that would be more difficult to implement.
I will check how well getFitness can be parallelized.
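A sketch of what a parallel getFitness inner loop could look like: counting inliers and accumulating error with an OpenMP reduction over precomputed nearest-neighbour distances. fitnessSketch and its inputs are stand-ins, not the PCL implementation, which searches the target cloud per point rather than taking distances as an argument:

```cpp
#include <vector>

// Sketch: parallelising the per-point loop inside getFitness() with a
// reduction, instead of parallelising the outer RANSAC loop. The main
// loop can then stay sequential, which also makes early termination easy.
double fitnessSketch(const std::vector<float>& nn_dists, float thresh,
                     int& inliers) {
  double error = 0.0;
  int count = 0;
#pragma omp parallel for reduction(+ : error, count)
  for (long i = 0; i < static_cast<long>(nn_dists.size()); ++i) {
    if (nn_dists[i] < thresh) {
      ++count;             // per-thread partial, combined by the reduction
      error += nn_dists[i];
    }
  }
  inliers = count;
  return error;
}
```

Since each iteration is independent and the reduction handles the accumulators, no sampling or shared RNG state is touched inside the parallel region.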

@larshg (Contributor) commented Aug 24, 2022

Yeah okay, that doesn't sound like much. Let's see what your tests show 👍
