
Avoid huge index jumps in RandomSample. Remove io dependency from 2d. #2141

Merged — 3 commits merged into PointCloudLibrary:master on Dec 19, 2017

Conversation

@ThorstenHarter (Contributor)

I have noticed that using pcl::RandomSample to reduce a cloud of 125,000 points to one of 50,000 will sometimes produce a gap of several centimeters in the x-direction in my Lidar data.

The problem is located in random_sample.hpp:

  // Algorithm A: draw V ~ uniform (0, 1), then compute the number of
  // records S to skip before selecting the next sample.
  float V = unifRand ();
  unsigned S = 0;
  float quot = static_cast<float> (top) / static_cast<float> (N);
  while (quot > V)
  {
    S++;
    top--;
    N--;
    quot = quot * static_cast<float> (top) / static_cast<float> (N);
  }
  index += S;

When V is close to 0, the index makes a huge jump (e.g. > 20,000), resulting in the gap seen in the attached screenshot.
To fix this, I have set a lower limit of 0.01 for the random number V, and this has worked well in my tests.

[screenshot: resampled torus]
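
To illustrate, here is a minimal standalone sketch of the skip computation above (my own, not PCL code; the sizes match the test further down):

// Standalone sketch of the skip loop (not PCL code). With n = 1000 samples
// remaining out of N = 25344 points, top/N is ~0.96, so quot decays slowly:
// a tiny V forces a skip S of roughly 285 records, versus a mean skip of
// about N/n = 25. Clamping V at 0.01, as proposed, caps S at roughly 115
// for these starting sizes.
#include <iostream>

int main ()
{
  unsigned N = 25344;    // remaining input points
  unsigned n = 1000;     // remaining samples to draw
  unsigned top = N - n;  // records that may be skipped
  float V = 0.00001f;    // pathologically small uniform variate

  unsigned S = 0;
  float quot = static_cast<float> (top) / static_cast<float> (N);
  while (quot > V)
  {
    S++;
    top--;
    N--;
    quot = quot * static_cast<float> (top) / static_cast<float> (N);
  }
  std::cout << "skip S = " << S << std::endl;
  return 0;
}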

@SergioRAgostinho added the needs: code review label on Dec 11, 2017
@SergioRAgostinho (Member)

Thanks 👍

Although it seems like a minimal fix, I'll need to check the original paper first to make sure we're not introducing some statistical bias.

@SergioRAgostinho (Member) commented Dec 12, 2017

Just had a look at things and I think I figured out the problem. The issue is here:

quot = quot * static_cast<float> (top) / static_cast<float> (N);

Try modifying it to simply

quot = static_cast<float> (top) / static_cast<float> (N);

and please report back the results.

You'll get the added bonus of it becoming on average three times faster ;)

To fully confirm that this was the issue, I would still like to verify the mean and standard deviation of the differences between consecutive filtered indices (not the removed ones), before and after the correction. If you could output that from your test case and report the results, it would be awesome.

In the best scenario the mean will be close to N / sample_size and the std as close to 0 as possible.
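
Something like this minimal sketch would do for the statistics (it assumes the filtered indices come back sorted ascending, with at least two entries):

// Mean and (population) standard deviation of the differences between
// consecutive filtered indices. Assumes indices.size () >= 2, sorted ascending.
#include <cmath>
#include <cstddef>
#include <vector>

void diffStats (const std::vector<int> &indices, double &mean, double &stddev)
{
  const std::size_t m = indices.size () - 1;  // number of differences
  double sum = 0.0;
  for (std::size_t k = 1; k < indices.size (); ++k)
    sum += indices[k] - indices[k - 1];
  mean = sum / m;

  double sq = 0.0;
  for (std::size_t k = 1; k < indices.size (); ++k)
  {
    const double d = indices[k] - indices[k - 1] - mean;
    sq += d * d;
  }
  stddev = std::sqrt (sq / m);
}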

@SergioRAgostinho added the needs: author reply label and removed the needs: code review label on Dec 12, 2017
@ThorstenHarter (Contributor, Author)

Hi Sergio,
the original line matches the pseudo code in the paper:

quot := quot x top/N

I'm afraid the proposed change makes the problem even worse:

[screenshot: resampled torus 2]

With your change, quot no longer shrinks multiplicatively, so if V is very small (e.g. 0.00001) the while loop only stops when "top" (the number of remaining points) is also close to zero.

The first result I posted might even be valid from a statistical viewpoint, but common sense tells me it's not a valid sample.
So what I did was a visual inspection of a range of different sample sizes in the 3D viewer.
I'll see if I can get some numbers into a spreadsheet.

@ThorstenHarter (Contributor, Author)

I made a smaller example, picking 1000 from 25344 points and logging the index increment (S+1).

Here are the numbers:

value     master    branch
min       1         1
max       258       130
median    18        18
mean      25.258    25.253
std       25.853    24.619

My proposed fix prevents extreme runaway values of "max" and therefore also lowers "mean" and "std" somewhat.
In my opinion this is desirable.

@SergioRAgostinho added the needs: code review label and removed the needs: author reply label on Dec 12, 2017
@SergioRAgostinho (Member)

First of all thanks for all the trouble.

the original line matches the pseudo code in the paper

You're right. I completely misinterpreted the paper's "pseudo factorial/combinatorial symbol" as an index. Your visual results clearly show it.

Here are the numbers:

value     master    branch
min       1         1
max       258       130
median    18        18
mean      25.258    25.253
std       25.853    24.619

Your results are definitely encouraging, but the change you propose breaks an assumption from the paper which is plastered all around it: the uniform variate needs to be generated between 0 and 1. This is the main reason I haven't accepted your proposal.

I started to think about what exactly makes things fail specifically in your case, and I think it is the sequential nature of your point cloud, both in terms of indexing and spatial distribution. This means that when there are bigger index jumps you get visible holes.

This "algorithm A" was implemented due to it's speed advantage, since it only needs to instantiate n times the random number generator vs the ~N. Statistically it ensures an approximation to the uniform distribution and in most point clouds it will work just fine, but if there's an organization such as yours, it can produce very bad results.

My proposal is that we actually implement "Algorithm S" as an alternative. It is potentially slower (probably not by much) but less prone to statistical problems, and it would need to be explicitly selected by the user; for special requirements like yours it can still save the day.

The first thing now is to actually verify that the "Algorithm S" solves your problems.

@SergioRAgostinho added the needs: testing and needs: author reply labels and removed the needs: code review and needs: testing labels on Dec 12, 2017
@ThorstenHarter (Contributor, Author) commented Dec 15, 2017

Hi Sergio,
this sequential indexing is a feature shared by all sensor-generated point clouds I work with, because they are created either by a 2D line scan plus relative movement or by a 2D time-of-flight sensor.

I have tried Algorithm S and it provides a satisfying result in the same run time.

[screenshot: resampled torus, Algorithm S]

Here's the code I have written:

// Algorithm S
unsigned int i = 0;
unsigned int index = 0;
std::vector<bool> added;
if (extract_removed_indices_)
  added.resize (indices_->size (), false);
size_t n = sample_size;
while (n > 0)
{
  // Step 1: [Generate U.] Generate a random variate U that is uniformly distributed between 0 and 1.
  const float U = unifRand ();
  // Step 2: [Test.] If N * U > n, go to Step 4. 
  if ((N * U) <= n)
  {
    // Step 3: [Select.] Select the next record in the file for the sample, and set n : = n - 1 and N : = N - 1.
    // If n > 0, then return to Step 1; otherwise, the sample is complete and the algorithm terminates.
    if (extract_removed_indices_)
      added[index] = true;
    indices[i++] = (*indices_)[index++];
    --n;
    --N;
  }
  else
  {
    // Step 4: [Don't select.] Skip over the next record (do not include it in the sample), set N : = N - 1, and return to Step 1.
    ++index;
    --N;
  }
}

And here are the numbers for comparison:

value     Algo A    Algo A*   Algo S
min       1         1         1
max       258       130       181
median    18        18        18
mean      25.258    25.253    25.322
std       25.853    24.619    24.128

My proposal would be to replace Algorithm A with Algorithm S, because I consider the current implementation of Algorithm A "dangerous".

@SergioRAgostinho added the needs: code review label and removed the needs: author reply label on Dec 18, 2017
@SergioRAgostinho (Member)

Looks like Algorithm S is the way to go then. I only started to clock noticeable differences between the methods when dealing with tens of millions of points, and even then still on the order of 10 ms.

Let's go for it 👍

@SergioRAgostinho added the needs: more work label and removed the needs: code review label on Dec 19, 2017
@ThorstenHarter (Contributor, Author)

I have now committed the code snippet described above to random_sample.hpp and all tests have passed.

Please note that there is also a commit in this PR which has nothing to do with the issue (1c9259f); I forgot to create a new branch for that.
Should I revert the edit?

{
  // Step 4: [Don't select.] Skip over the next record (do not include it in the sample), set N : = N - 1, and return to Step 1.
  ++index;
  --N;
@SergioRAgostinho (Member)

Regardless of U, you always decrement N and you always increment index. I suggest you move these operations outside of the if/else block. Basically the else is unnecessary.
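
I.e., a sketch of the hoisted form, reusing the variables from your snippet:

// Sketch: N and index are updated unconditionally, so only the selection
// itself remains inside the if; the else disappears entirely.
while (n > 0)
{
  const float U = unifRand ();  // Step 1: U ~ uniform (0, 1)
  if ((N * U) <= n)             // Step 2/3: select this record
  {
    if (extract_removed_indices_)
      added[index] = true;
    indices[i++] = (*indices_)[index];
    --n;
  }
  // Step 4 happens either way: advance the read position, shrink the pool.
  ++index;
  --N;
}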

@ThorstenHarter (Contributor, Author)

How about an empty "else", which the compiler removes, for better readability?

while (n > 0)
{
  // Step 1: [Generate U.] Generate a random variate U that is uniformly distributed between 0 and 1.
  const float U = unifRand ();
  // Step 2: [Test.] If N * U > n, go to Step 4. 
  if ((N * U) <= n)
  {
    // Step 3: [Select.] Select the next record in the file for the sample, and set n : = n - 1.
    if (extract_removed_indices_)
      added[index] = true;
    indices[i++] = (*indices_)[index];
    --n;
  }
  else
  {
    // Step 4: [Don't select.] Skip over the next record (do not include it in the sample).
  }
  // Set N : = N - 1.
  --N;
  ++index;
  // If n > 0, then return to Step 1; otherwise, the sample is complete and the algorithm terminates.
}

@SergioRAgostinho (Member)

I wouldn't include it as actual code, but I'm ok with a commented version of the else in order to visualize the "step 4" in context.

@SergioRAgostinho added the needs: author reply label and removed the needs: more work label on Dec 19, 2017
@SergioRAgostinho (Member)

Please note that there is also a commit in this PR which has nothing to do with the issue (1c9259f); I forgot to create a new branch for that.
Should I revert the edit?

It's already a separate commit, so it's totally fine for me.

@@ -92,7 +92,7 @@ template<typename PointT>
 void
 pcl::RandomSample<PointT>::applyFilter (std::vector<int> &indices)
 {
-  unsigned N = static_cast<unsigned> (indices_->size ());
+  unsigned int N = static_cast<unsigned> (indices_->size ());
@SergioRAgostinho (Member) — Dec 19, 2017

One additional comment. Everything which is going to be used as an index needs to be of type size_t. Please apply that change all over this method.

Edit: As an index or as a maximum value for an index; in other words, just replace all your unsigned ints with size_t. We recently started having some "surprises" with indices overflowing.

@ThorstenHarter (Contributor, Author)

Good point, this also makes some of the static_casts obsolete.
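
For illustration, a quick sketch of what the change amounts to (hypothetical function, not the exact commit):

#include <cstddef>
#include <vector>

// With std::size_t for every index-like variable, the static_cast on
// size () disappears and, on typical 64-bit platforms, the index
// overflow risk goes away.
void sketch (const std::vector<int> &source_indices)
{
  std::size_t N = source_indices.size ();  // was: static_cast<unsigned> (...)
  std::size_t index = 0;                   // running read position
  std::size_t i = 0;                       // running write position
  (void) N; (void) index; (void) i;        // selection logic as above
}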

@SergioRAgostinho (Member) left a comment

Final step: can you squash the three commits which affect random_sample.hpp only? Basically, leave the "dead line removal" and "visualization dependency removal" commits untouched.

Thanks for your help and time. It was really cool to visualize the results and the corresponding "quality numbers".

@ThorstenHarter (Contributor, Author)

I'm afraid I don't know how to do this; I have no experience with git and used the web interface for editing.
I also have GitHub Desktop installed, but I assume you still need the console for advanced operations...
Could you please give me some hints?

@SergioRAgostinho (Member)

  • Have a read through this.
  • Invoke git rebase in interactive mode (git rebase -i <last-commit-before-your-changes>).
  • During the interactive rebase, reorder things so that your random_sample.hpp related commits are all sequential, one after the other.
  • Select squash on the latest 2, and all 3 will be condensed into a single one.
  • When you push to your repo, specify the option -f/--force (git push -f).

Try all this first in a copy of your fork to prevent accidents, until you're comfortable with the outcome.

ThorstenHarter and others added 2 commits December 19, 2017 19:40
If you try to do an algorithms-only build of PCL (without the "io" and "visualization" modules), you will stumble across this dependency, which shouldn't exist in my opinion.
The "2d" module contains only a handful of files which implement morphological filters and does not include any files from the "io" module.
@SergioRAgostinho added the needs: code review label and removed the needs: author reply and needs: code review labels on Dec 19, 2017
@SergioRAgostinho SergioRAgostinho merged commit 0a302d3 into PointCloudLibrary:master Dec 19, 2017
@SergioRAgostinho changed the title to "Avoid huge index jumps in RandomSample. Remove io dependency from 2d." on Aug 26, 2018