Avoid huge index jumps in RandomSample. Remove io dependency from 2d. #2141
Conversation
Thanks 👍 Although it seems like a minimal fix, I'll need to check the original paper first to make sure we're not introducing some statistical bias of some sort.
Just had a look at things and I think I figured out what the problem is. I think the issue is here:
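For context, the skip computation being referred to looks roughly like this (a paraphrased sketch of the Algorithm A code in random_sample.hpp, not the verbatim source; N, n and index follow the method's locals, and unifRand is the filter's uniform generator):

// Vitter's Algorithm A: draw one uniform variate V, then determine how
// many input records to skip before selecting the next sample member.
float V = unifRand ();
unsigned top = N - n;  // records that may still be skipped
// quot accumulates the probability of skipping the next record.
float quot = static_cast<float> (top) / static_cast<float> (N);
while (quot > V)
{
  ++index;             // skip one input record
  --N;
  --top;
  quot *= static_cast<float> (top) / static_cast<float> (N);
}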
Try to modify it to simply quot = static_cast<float> (top) / static_cast<float> (N); and report back the results, please. You'll get the added bonus of it becoming

To fully prove that this was the issue, I would still like to verify the mean and standard deviation of the difference of consecutive filtered indices (not the removed ones) before and after the correction. If you could output that from your test case and report the results, it would be awesome. In the best scenario the mean will be
Hi Sergio,
I'm afraid the proposed change makes the problem even worse: if V is very small (e.g. 0.00001), the while loop will only stop when "top" (the number of remaining points) is also close to zero. The first result I posted might even be valid from a statistical viewpoint, but my common sense tells me that it's not a valid sample.
I made a smaller example, picking 1000 from 25344 points and logging the index increment (S+1). Here are the numbers:
My proposed fix prevents extreme runaways of the "max" value and therefore also lowers "mean" and "std" somewhat.
First of all, thanks for all the trouble.
You're right. I completely misinterpreted the paper's "pseudo factorial/combinatorial symbol" as an index. Your visual results clearly show it.
Your results are definitely encouraging, but the change you propose completely breaks an assumption from the paper which is plastered all around: the uniform variate needs to be generated between 0 and 1. This is the main reason I haven't accepted your proposal.

I started to think about what exactly makes things fail specifically in your case, and I think it is the sequential nature of your point cloud, both in terms of indexing and spatial distribution. This means that when there are bigger index jumps you get visible holes. This "algorithm A" was implemented due to its speed advantage, since it only needs to instantiate

My proposal for your problem is that we actually implement "algorithm S" as an alternative. This one is potentially slower (probably not by much), but if a user has special requirements like yours, it can still save the day: it is slower, less prone to statistical problems, and it needs to be explicitly selected by the user. The first thing now is to actually verify that "Algorithm S" solves your problem.
Hi Sergio, I have tried Algorithm S and it provides a satisfying result in the same run time. Here's the code I have written:
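The snippet itself was lost in extraction; what follows is a sketch of the Algorithm S loop consistent with the review excerpt further down (n samples drawn from N remaining records; unifRand, indices_, added and extract_removed_indices_ come from random_sample.hpp):

// Vitter's Algorithm S (selection sampling): scan the input once and
// select each record with probability n / N, where n is the number of
// samples still needed and N the number of records not yet examined.
std::size_t i = 0;      // write position in the output indices
std::size_t index = 0;  // read position in the input indices
while (n > 0)
{
  const float U = unifRand ();  // uniform variate in [0, 1]
  if ((N * U) <= n)             // select with probability n / N
  {
    if (extract_removed_indices_)
      added[index] = true;
    indices[i++] = (*indices_)[index];
    --n;
  }
  --N;
  ++index;
}

Because the loop advances by exactly one record per iteration, the gap between consecutive selected indices stays bounded, which is what eliminates the huge jumps.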
And here are the numbers for comparison:
My proposal would be to replace Algorithm A with Algorithm S, because I consider the current implementation of Algorithm A "dangerous".
Looks like Algorithm S is the way to go then. I only started to be able to clock noticeable differences between the methods when dealing with tens of millions of points, and even then still in the 10 ms order. Let's go for it 👍
I have now committed the code snippet described above to random_sample.hpp and all tests have passed. Please note that there is also a commit in this PR which has nothing to do with the issue (1c9259f); I forgot to create a new branch for that.
{
  // Step 4: [Don't select.] Skip over the next record (do not include it in the sample), set N := N - 1, and return to Step 1.
  ++index;
  --N;
Regardless of U, you always decrement N and you always increment index. I suggest you move these operations outside of the if/else block. Basically the else is unnecessary.
How about an empty "else" which is removed by the compiler for better readability?
while (n > 0)
{
  // Step 1: [Generate U.] Generate a random variate U that is uniformly distributed between 0 and 1.
  const float U = unifRand ();
  // Step 2: [Test.] If N * U > n, go to Step 4.
  if ((N * U) <= n)
  {
    // Step 3: [Select.] Select the next record in the file for the sample, and set n := n - 1.
    if (extract_removed_indices_)
      added[index] = true;
    indices[i++] = (*indices_)[index];
    --n;
  }
  else
  {
    // Step 4: [Don't select.] Skip over the next record (do not include it in the sample).
  }
  // Set N := N - 1.
  --N;
  ++index;
  // If n > 0, then return to Step 1; otherwise, the sample is complete and the algorithm terminates.
}
I wouldn't include it as actual code, but I'm OK with a commented-out version of the else in order to visualize "Step 4" in context.
It's already a separate commit, so for me it's totally fine.
@@ -92,7 +92,7 @@ template<typename PointT>
 void
 pcl::RandomSample<PointT>::applyFilter (std::vector<int> &indices)
 {
-  unsigned N = static_cast<unsigned> (indices_->size ());
+  unsigned int N = static_cast<unsigned> (indices_->size ());
One additional comment: everything which is gonna be used as an index needs to be of size_t type. Please apply that change all over this method.
Edit: as an index or as a maximum value for an index, aka just replace all your unsigned ints with size_t. We started having some "surprises" recently with indices overflowing.
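To illustrate the requested change, a hypothetical before/after based on the line from the diff above:

// Before: a 32-bit unsigned is used both as a count and as an index,
// which can overflow on very large clouds and needs an explicit cast.
unsigned int N = static_cast<unsigned> (indices_->size ());

// After: size_t matches the container's size type, so the cast
// disappears and the full index range is representable.
std::size_t N = indices_->size ();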
Good point, this also makes some static_casts obsolete.
Final step: can you squash the three commits which affect random_sample.hpp only? Basically, leave the "dead line removal" and "visualization dependency removal" commits untouched.
Thanks for your help and time. It was really cool to visualize the results and the corresponding "quality numbers".
I'm afraid I don't know how to do this; I have no experience with git and used the web interface for editing.
Try all this first in a copy of your fork to prevent accidents, until you're comfortable with the outcome.
If you try to do an algorithms-only build of PCL (without the "io" and "visualization" modules), you will stumble across this dependency, which shouldn't exist in my opinion. The "2d" module contains only a handful of files which implement morphological filters and does not include any files from the "io" module.
Implemented "Algorithm S" from http://www.ittc.ku.edu/~jsv/Papers/Vit84.sampling.pdf to fix #2141
I have noticed that using pcl::RandomSample to reduce a cloud of 125,000 points to a cloud of 50,000 will sometimes result in a large gap of several centimeters in the x-direction in my Lidar data.
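For reference, a minimal usage along these lines (a sketch; the point type and exact cloud sizes are assumptions):

#include <pcl/filters/random_sample.h>
#include <pcl/point_types.h>

// Downsample a Lidar cloud from ~125,000 to 50,000 points.
pcl::PointCloud<pcl::PointXYZ>::Ptr cloud (new pcl::PointCloud<pcl::PointXYZ>);
// ... fill cloud from the Lidar source ...

pcl::RandomSample<pcl::PointXYZ> sampler;
sampler.setInputCloud (cloud);
sampler.setSample (50000);  // target size of the filtered cloud

pcl::PointCloud<pcl::PointXYZ> output;
sampler.filter (output);    // the gaps described above appear in this output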
The problem is located in random_sample.hpp:
When V is close to 0, the index will make a huge jump (e.g. > 20,000), resulting in the gap we see in the attached screenshot.
To fix this, I have set a lower limit of 0.01 for the random number V and have had good results with that.
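That workaround might look like this (a sketch; unifRand is the uniform generator already used in random_sample.hpp, and the 0.01 floor is the empirically chosen limit):

// Clamp the uniform variate away from zero so the Algorithm A skip
// loop cannot run away and produce huge index jumps.
float V = unifRand ();
if (V < 0.01f)
  V = 0.01f;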