
PCL point normals not always calculated correctly in sparse clouds #2403

Open
msy22 opened this issue Aug 29, 2018 · 65 comments
Labels
good first issue · kind: bug · kind: todo · module: features · needs: pr merge

Comments

@msy22

msy22 commented Aug 29, 2018

Your Environment

  • Operating System and version: Ubuntu 14.04
  • Compiler: GCC v4.8.4
  • PCL Version: 1.8.1
  • C++ Version: C++11

Context

I'm currently trying to find a point cloud registration method that can accurately match relatively sparse 3D point clouds from one Velodyne HDL-32E. At the moment I am using PCL's implementation of Generalized ICP to match several sequential scans similar to the one shown below.

cloud1

(This is one scan of a carpark outside an office building with floor-to-ceiling glass windows, hence the few LiDAR returns from the wall.)

That is, I take one scan like this and try to register it to a scan taken 1-2 cm further forward. Matching sequential scans does work in some areas but not in others. Since GICP depends on good normal calculations, I've been inspecting some clouds in the PCL visualizer (shown in this tutorial) and I noticed that in some places the normals are not being calculated correctly at all.
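
For reference, the registration step is essentially the stock GICP pipeline; a minimal sketch (assuming pcl::PointXYZ clouds and placeholder cloud pointers, not the exact code used here):

#include <pcl/point_types.h>
#include <pcl/registration/gicp.h>

// Align a new scan ("source") against the previous one ("target").
pcl::GeneralizedIterativeClosestPoint<pcl::PointXYZ, pcl::PointXYZ> gicp;
gicp.setInputSource (source_cloud);   // pcl::PointCloud<pcl::PointXYZ>::Ptr
gicp.setInputTarget (target_cloud);
pcl::PointCloud<pcl::PointXYZ> aligned;
gicp.align (aligned);                 // aligned = source transformed into the target frame
Eigen::Matrix4f transform = gicp.getFinalTransformation ();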

Expected Behavior

The normals in most parts of the clouds should be calculated correctly. Or, where they aren't, it should clearly be because the data is too sparse in that region of the point cloud.

Current Behavior

The normal directions are mostly correct, but in some places where there is plenty of data, they are not calculated correctly. The most obvious place is in this example, where the normals all point more-or-less upwards in the flower-bed part of the wall. This is incorrect; the normals should be orthogonal to the wall and point out horizontally:

normal_wall_example2

This is the case in this specific area pretty much regardless of whether I use a K-nearest or radius point search, and regardless of the values for K or for the radius.
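
(For reference, the two search modes I'm toggling between are just the standard NormalEstimation parameters; a minimal sketch with a placeholder cloud pointer, not my exact code:)

pcl::NormalEstimation<pcl::PointXYZ, pcl::Normal> ne;
ne.setInputCloud (cloud);
// Either a fixed number of neighbours...
ne.setKSearch (50);
// ...or (instead) everything within a radius:
// ne.setRadiusSearch (0.3);
pcl::PointCloud<pcl::Normal> normals;
ne.compute (normals);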

Code to Reproduce

Try the cloud in question: carpark_cloud.ply

in the PCL visualizer (shown in this tutorial)

Possible Solution

I have two theories on what the cause of the problem is.

  1. The data itself is the problem. Because the Velodyne has an error of +/- 2 cm, any given scan line is not a perfect line but has a thickness, with a maximum width of about 4 cm (assuming the full +/- 2 cm error). This means that each scan line actually generates its own plane of points, and at close range there may be enough points packed tightly together in one line that this overrules the plane-fitting method PCL uses by default, fitting a plane to the scan line rather than to the underlying topography. Hopefully this diagram accurately illustrates the problem:

lidar_issue

  2. There is a problem or bug with PCL's plane-fitting code. This was suggested by a colleague, who pointed out that because the normals are all pointing upwards, that might be the default output when the plane-fitting functions fail. I've tracked the normal calculation code down to pcl::eigen33 (sketched below), where the smallest eigenvalue is found and its associated eigenvector is set to the nx, ny and nz normal coordinates, but I haven't gotten any further than that yet. This theory is the reason I'm raising the issue here and not as a general question on the mailing list.
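
For reference, the chain I traced looks roughly like this (paraphrased from my reading of the source, assuming cloud and the neighbour indices nn_indices already exist; not verbatim PCL code):

#include <cmath>
#include <pcl/common/centroid.h>   // computeMeanAndCovarianceMatrix
#include <pcl/common/eigen.h>      // eigen33

Eigen::Matrix3f covariance_matrix;
Eigen::Vector4f xyz_centroid;
pcl::computeMeanAndCovarianceMatrix (*cloud, nn_indices, covariance_matrix, xyz_centroid);

// solvePlaneParameters -> eigen33: smallest eigenvalue and its eigenvector
float eigen_value;
Eigen::Vector3f eigen_vector;
pcl::eigen33 (covariance_matrix, eigen_value, eigen_vector);

float nx = eigen_vector[0], ny = eigen_vector[1], nz = eigen_vector[2];
float curvature = std::abs (eigen_value / (covariance_matrix (0, 0) +
                                           covariance_matrix (1, 1) +
                                           covariance_matrix (2, 2)));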

So my question is: Which of these two theories is likely to be correct, and are there any suggestions on what I can do to resolve it?

@SergioRAgostinho
Member

Hey. Here's a couple of comments/ideas.

This is incorrect; the normals should be orthogonal to the wall and point out horizontally

The points inside the highlighted region appear to be distributed as lines. There might be a chance that during normal estimation, your neighborhood search parameters only allow it to establish a neighborhood which is approximately a line, yielding unreliable normals. Can you randomly select a point used in the region you selected and highlight the points that are being used to estimate the normals?

  2. There is a problem or bug with PCL's plane-fitting code. This was suggested by a colleague, who pointed out that because the normals are all pointing upwards, that might be the default output when the plane-fitting functions fail. I've tracked the normal calculation code down to pcl::eigen33, where the smallest eigenvalue is found and its associated eigenvector is set to the nx, ny and nz normal coordinates, but I haven't gotten any further than that yet. This theory is the reason I'm raising the issue here and not as a general question on the mailing list.

There's a long-standing known problem caused by precision issues. You can read more about it in #560 and #1407. I believe we also pushed a couple of PRs to master which affect covariance and eigenvalue/vector computation. You should definitely try the master branch if you can, just to see whether you observe different behavior.

Give it a try and report results. :shipit:

@SergioRAgostinho added the needs: author reply label Aug 29, 2018
@msy22
Author

msy22 commented Aug 30, 2018

Thanks very much for the help and the suggestions, Sergio! I really appreciate it.

First off, the point selection: this is something I've been worried about, so I spent some time modifying my code to return the selected indices and colour them, which for this specific cloud yields:

point_selection_comparison

The result is more or less the same regardless of the input parameters. In the picture with the normals shown (the one where the cloud is green instead of white) I used a search radius of 0.3 m, and as you can see from the image just above, that should give more than enough points across multiple lines to produce an accurate normal.

@msy22
Author

msy22 commented Aug 30, 2018

As for the second suggestion...

Issue #560 talks about the issue and at the bottom, gcasey says that PR #1407 should fix it. But it looks like that PR hasn't been merged with the main branch? And appears to be failing a build check?

It also looks like both you and Logan Byers proposed solutions to the problem in the comments of PR #1407, and that in Logan's case he found the two-pass solution wasn't robust/accurate enough for him. So I'll try his solution and see if I can get it to work...

For the record I did (re-) install PCL 1.8.1 from source (the main trunk) less than two months ago, so it is relatively new.

@SergioRAgostinho
Member

I used a search radius of 0.3 m, and as you can see from the image just above, that should give more than enough points across multiple lines to produce an accurate normal.

That's our baseline then and I agree that that should work to produce a decent normal.

Issue #560 talks about the issue and at the bottom, gcasey says that PR #1407 should fix it. But it looks like that PR hasn't been merged with the main branch? And appears to be failing a build check?

It also looks like both you and Logan Byers proposed solutions to the problem in the comments of PR #1407, and that in Logan's case he found the two-pass solution wasn't robust/accurate enough for him. So I'll try his solution and see if I can get it to work...

You should now try to compute things manually for a single point. The normal is the eigenvector associated with the lowest eigenvalue of the covariance matrix. The most robust version of this algorithm I can think of involves the following steps (sketched below the list), performed separately with double precision:

  • compute the centroid
  • demean the points
  • compute covariance
  • compute eigenvalues/eigenvectors
  • retain the eigenvector corresponding to the lowest eigenvalue
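
A minimal sketch of those five steps done manually with Eigen in double precision (assuming neighborhood holds the neighbour coordinates of a single query point; an illustration, not library code):

#include <Eigen/Dense>
#include <vector>

// The eigenvector of the smallest eigenvalue of the neighborhood covariance is the plane normal.
Eigen::Vector3d estimateNormalDouble (const std::vector<Eigen::Vector3d> &neighborhood)
{
  // 1. compute the centroid
  Eigen::Vector3d centroid = Eigen::Vector3d::Zero ();
  for (const auto &p : neighborhood)
    centroid += p;
  centroid /= static_cast<double> (neighborhood.size ());

  // 2. demean the points and 3. accumulate the covariance
  Eigen::Matrix3d cov = Eigen::Matrix3d::Zero ();
  for (const auto &p : neighborhood)
  {
    const Eigen::Vector3d d = p - centroid;
    cov += d * d.transpose ();
  }
  cov /= static_cast<double> (neighborhood.size ());

  // 4. eigen-decomposition (SelfAdjointEigenSolver sorts eigenvalues in increasing order)
  Eigen::SelfAdjointEigenSolver<Eigen::Matrix3d> es (cov);

  // 5. retain the eigenvector of the smallest eigenvalue
  return es.eigenvectors ().col (0);
}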

Try to reproduce these steps with the highest precision available and validate the results with other libraries whenever possible, e.g. once you have the 3x3 covariance matrix, you can see what numpy outputs as eigenvalues/eigenvectors vs what the library is doing. Try to validate as many steps as possible.

If you reach a stable setup for the normals then report back and we'll start digging into where things are failing in the normal estimation algorithm.

@msy22
Author

msy22 commented Aug 30, 2018

OK, so I kind of raced ahead and went for a full-blown solution. I worked backwards down the chain of functions, figured out which ones are used during the normal calculation process, copied them into a modified version of normal_3d_omp.h/hpp and then just changed all the relevant instances of "float" to double. The result is REALLY hacky but it works:

corrected_normals

This looks a lot better than what I was getting before, and produces normals that look exactly as I would expect. This is using a search radius of 0.3 m.

@msy22
Author

msy22 commented Aug 30, 2018

I can go through and try to validate each stage individually like you suggested, but I think the root cause is definitely the use of floats rather than doubles. So is there value in trying to verify each stage in the process or would it be better to skip straight to a solution?

@taketwo
Member

taketwo commented Aug 31, 2018

We have functions in PCL that support both single and double precision through a template parameter, e.g. planeWithPlaneIntersection. Maybe the same can be done with the functions involved in your pipeline?
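
For illustration, a scalar-templated version of the plane-parameter step could look something like this (hypothetical name solvePlaneParametersT; a sketch, not the existing PCL signature):

#include <cmath>
#include <Eigen/Dense>

template <typename Scalar> void
solvePlaneParametersT (const Eigen::Matrix<Scalar, 3, 3> &covariance_matrix,
                       Eigen::Matrix<Scalar, 3, 1> &normal,
                       Scalar &curvature)
{
  Eigen::SelfAdjointEigenSolver<Eigen::Matrix<Scalar, 3, 3> > es (covariance_matrix);
  normal = es.eigenvectors ().col (0);            // eigenvalues sorted ascending, so col 0 = smallest
  const Scalar trace = covariance_matrix.trace ();
  curvature = trace > Scalar (0) ? std::abs (es.eigenvalues ()(0)) / trace : Scalar (0);
}

The caller would then pick the precision explicitly, e.g. solvePlaneParametersT<double> (cov, n, curvature), following the same pattern as the planeWithPlaneIntersection example.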

@msy22
Author

msy22 commented Aug 31, 2018

I guess that'd be the most straight-forward way to do it, then it's mostly automatic. That solution would essentially involve adding a new declaration of the important functions, at the very least the following:

  • computePointNormal
  • solvePlaneParameters

Which I assume would also require creating new definitions of the class variables covariance_matrix and xyz_centroid. They're defined in normal_3d.h, and I had to create double duplicates of those variables to get my hack to work.

In addition, what options would the user have to force the use of the float or double versions? The highest-level code the user is going to interact with is the NormalEstimation class, i.e. if this is all the user sees:

pcl::NormalEstimationOMP<PointT, PointT> ne; 
ne.setRadiusSearch(...);
ne.setViewPoint(...);
ne.setInputCloud(...);
ne.compute(...);

How does the underlying code know to use floats or doubles? The effect this bug/issue can have is pretty dramatic (as I'll show in a post below), so I feel like it'd be important to make the user aware of the difference and give them the option to force one case or the other.

@msy22
Author

msy22 commented Aug 31, 2018

So I've been testing matching sequences of 35 clouds, and just incrementally matching each of the 35 back to the first cloud. This bug/issue has a pretty major impact on how well the clouds are matched in some environments. So in all the pictures below, the left hand image is the match BEFORE I fixed this issue, and the right hand image is AFTER fixing it with my hacked code.

In some areas it doesn't make much difference, either because the match was already pretty good or because there isn't much that can improve it:

(NOTE: The pictures are actually pretty big, so click on them for a larger version, rather than squinting at what you see below)

bugfix_carpark_rear

bugfix_carpark_front

But in other areas it is the difference between a good match and a useless one:

bugfix_carpark_diligent

bugfix_office

bugfix_stockpiles

bugfix_marylands

bugfix_zone1

So for my application, using doubles rather than floats makes a world of difference.

@taketwo
Member

taketwo commented Aug 31, 2018

Tentatively, I'd propose the following:

  • template computeMeanAndCovarianceMatrix and solvePlaneParameters on the scalar type
  • template computePointNormal on the scalar type, at the same time making covariance_matrix_ and xyz_centroid_ local to that function
  • add a setter/getter to NormalEstimation to control desired precision
  • update computeFeature to take precision into account and use the corresponding version of computePointNormal (a rough sketch follows below).
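
To make the last two bullets concrete, here is a rough sketch of the setter/getter plus runtime dispatch idea (hypothetical names, heavily simplified; not a patch against the real NormalEstimation class):

#include <vector>
#include <Eigen/Dense>

enum class NormalPrecision { Single, Double };

class NormalEstimationSketch
{
  public:
    void setPrecision (NormalPrecision p) { precision_ = p; }
    NormalPrecision getPrecision () const { return precision_; }

    // computeFeature would call this once per point neighborhood and dispatch at runtime.
    Eigen::Vector3f computePointNormal (const std::vector<Eigen::Vector3f> &nbrs) const
    {
      return precision_ == NormalPrecision::Double ? computePointNormalT<double> (nbrs)
                                                   : computePointNormalT<float> (nbrs);
    }

  private:
    template <typename Scalar>
    Eigen::Vector3f computePointNormalT (const std::vector<Eigen::Vector3f> &nbrs) const
    {
      using Vec3 = Eigen::Matrix<Scalar, 3, 1>;
      using Mat3 = Eigen::Matrix<Scalar, 3, 3>;
      // Local (per-call) centroid and covariance, as proposed above.
      Vec3 mean = Vec3::Zero ();
      for (const auto &p : nbrs)
        mean += p.cast<Scalar> ();
      mean /= static_cast<Scalar> (nbrs.size ());
      Mat3 cov = Mat3::Zero ();
      for (const auto &p : nbrs)
      {
        const Vec3 d = p.cast<Scalar> () - mean;
        cov += d * d.transpose ();
      }
      Eigen::SelfAdjointEigenSolver<Mat3> es (cov);
      return es.eigenvectors ().col (0).template cast<float> ();  // smallest eigenvalue first
    }

    NormalPrecision precision_ = NormalPrecision::Single;
};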

@SergioRAgostinho
Member

I can go through and try to validate each stage individually like you suggested, but I think the root cause is definitely the use of floats rather than doubles. So is there value in trying to verify each stage in the process or would it be better to skip straight to a solution?

We should go straight for the solution. Your time is precious and we should use it efficiently.

How does the underlying code know to use floats or doubles? The effect this bug/issue can have is pretty dramatic (as I'll show in a post below), so I feel like it'd be important to make the user aware of the difference and give them the option to force one case or the other.

The user will need to explicitly request double precision through template parameters in case they desire it. In the planeWithPlaneIntersection example Sergey linked before, that is achieved with

planeWithPlaneIntersection<double>(...)

@taketwo we should consider actually defaulting to double, since we often have reports of bad normals. The user who wants speed can then explicitly take the risk of using single floating-point precision.

@taketwo
Member

taketwo commented Aug 31, 2018

The user will need to explicitly request double precision through template parameters in case they desire it.

I'd like to clarify that I propose template-parameter-based selection of precision only for free functions. For the NormalEstimation class I think it's better to have a setter/getter so that precision can be adjusted at runtime. Reasons: we don't want to double the number of instantiations of NormalEstimation due to an added class template parameter. Further, I believe it's nice to be able to control precision through a variable that can be changed at runtime (from command line input or a config file) versus altering code and recompiling.

we should consider actually defaulting to double

Do we understand in which cases single is okay, and in which double is needed? Does it have to do with point cloud extents? Maybe we can have a heuristic that selects precision automatically if it was not set explicitly.

@SergioRAgostinho
Member

Do we understand in which cases single is okay, and in which double is needed? Does it have to do with point cloud extents?

That's the current intuition. Whenever points stray away from 0 too much, we start having problems.

Maybe we can have a heuristic that selects precision automatically if it was not set explicitly.

Hmm. I think I'd rather keep it simple for now. But to provide an informed opinion I need to have an intuition of the time penalty we're incurring by jumping to double precision. Something that reports a time metric per normal computed and per local neighborhood size.
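
Something like the following would give that per-normal time metric as a first approximation (a rough sketch, not part of any existing benchmarking infrastructure):

#include <chrono>
#include <pcl/features/normal_3d.h>
#include <pcl/point_types.h>

// Seconds spent per normal for a given neighborhood size k on a given cloud.
template <typename PointT>
double secondsPerNormal (const typename pcl::PointCloud<PointT>::Ptr &cloud, int k)
{
  pcl::NormalEstimation<PointT, pcl::Normal> ne;
  ne.setInputCloud (cloud);
  ne.setKSearch (k);

  pcl::PointCloud<pcl::Normal> normals;
  const auto t0 = std::chrono::steady_clock::now ();
  ne.compute (normals);
  const auto t1 = std::chrono::steady_clock::now ();

  return std::chrono::duration<double> (t1 - t0).count () / static_cast<double> (cloud->size ());
}

Running it once with the single-precision path and once with the double-precision path, over a few values of k, would give the comparison being asked for here.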

@taketwo
Member

taketwo commented Aug 31, 2018

That's the current intuition. Whenever points stray away from 0 too much, we start having problems.

Well, if this is the only reason, then we should just de-mean point neighborhoods before computing covariance matrix. Otherwise, what if points stray even further away from 0? Provide quadruple precision implementation?
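
For what it's worth, de-meaning by any constant shift $c$ is mathematically free as far as the covariance is concerned, since covariance is translation invariant:

$$\operatorname{Cov}(X - c) = \mathbb{E}\!\left[(X-c)(X-c)^\top\right] - \mathbb{E}[X-c]\,\mathbb{E}[X-c]^\top = \mathbb{E}[XX^\top] - \mathbb{E}[X]\,\mathbb{E}[X]^\top = \operatorname{Cov}(X).$$

The only thing the shift changes is the size of the accumulated terms, which is exactly where single precision runs out of digits.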

@taketwo
Member

taketwo commented Aug 31, 2018

@msy22 can you do us a favor and run the PCD from the linked issue through your modified code? If you get good results then we can close that issue in favor of this one.

@SergioRAgostinho
Member

Well, if this is the only reason, then we should just de-mean point neighborhoods before computing covariance matrix. Otherwise, what if points stray even further away from 0? Provide quadruple precision implementation?

This brings in the second parallel discussion that happened in #1407, regarding a single vs double pass to compute the mean and the covariance matrix, specifically the precision errors incurred by the former.

@msy22 you might want to first try things with the double-pass mean and covariance estimation. Please take into account this proposed approach.

@msy22
Author

msy22 commented Sep 2, 2018

Ok, first things first - I've downloaded that .pcd file from issue #2372 and put it through the unmodified and modified versions of my normal estimation code (as per @taketwo's request).

I've included some quick results below. For this test I used a point search radius of 2.0 meters, since the point distribution is much sparser than my clouds and has an odd banded structure (I'm guessing this was taken using aerial LiDAR). For an indication of scale, here's a point with a 2 m radius highlighted in red:

xyz_2m_search_radius

Now as for the accuracy of point normals, here's a comparison. On the left is unmodified (i.e. floats) and on the right is modified (i.e. doubles). As before, click on the pictures for a bigger version.

xyz_test

However, it can still be a little hard to tell how much of an effect using doubles has had, so here's the same comparison as above, but zoomed in on the roof of the building so you can see the difference in normals a bit more clearly.

xyz_roof

There are a couple of things that stand out to me about this:

  1. Yes, the unmodified (floats) normal estimation is worse than when doubles are used. However the original images of the cloud on issue #2372 look a little worse and I suspect that this is because K=10 points were used, when maybe a radius or a larger K value would have been better.
  2. The modified (doubles) result is better, but still not as good as I would have expected given the results I'm now getting with my point clouds
  3. My point clouds are offset from the coordinate system center (XYZ = 0,0,0) by only a few hundred meters in either direction, whereas that point cloud is offset by XYZ ~= -971000, -19300, 0. So I think @taketwo is correct about this:

should just de-mean point neighborhoods before computing covariance matrix. Otherwise, what if points stray even further away from 0? Provide quadruple precision implementation?

@msy22
Author

msy22 commented Sep 3, 2018

So in short, we have three possible causes of poor normal calculations (assuming no other problems come to light):

  1. Use of floats rather than doubles
  2. Long distances from coordinate frame center
  3. Use of a single-pass calculation rather than a double-pass calculation

My results suggest that 1 is a significant contributing factor, but the results from issue #2372 suggest that resolving problem 1 may not be enough, and that problem 2 also needs to be resolved. And as @SergioRAgostinho has already pointed out, the method of doing a double-pass may also improve things.

Without knowing exactly which of the above problems (in any combination) we need to solve in order to get the best normal calculation, we can't confidently provide a proper solution. So I'll implement a de-meaning function and a double-pass calculation (following @SergioRAgostinho's suggestion), then put my clouds and the xyz.pcd cloud through everything to see which combination of fixes gives the best result.

Now at this stage I don't have any method of obtaining a "ground truth" for the normals... so aside from eye-balling the images I don't have any way of comparing results precisely (in degrees or some other metric). If either of you have any suggestions in this area, I'd be glad to hear them. Otherwise, depending on how some meetings go over the next two weeks, I may be able to convince my supervisors to let me spend more time on this problem, and I may be able to do something about obtaining a ground truth...

@taketwo
Member

taketwo commented Sep 3, 2018

@msy22 thanks for giving it a try and also for the detailed analysis.

My results suggest that 1 is a significant contributing factor, but the results from issue #2372 suggest that resolving problem 1 may not be enough, and that problem 2 also needs to be resolved

What would be interesting to see is whether resolving 2 alone is enough. In other words, is single precision sufficient given that the point cloud is at the origin?

If that is the case, I have an idea for how we can get away even with a single-pass, single-precision algorithm. We don't need to de-mean the point cloud exactly, we only need to bring it close enough to the origin. In the context of normal estimation, we are typically dealing with compact groups of neighboring points, as discovered by a spatial search backend. Thus we can approximate the centroid of such a neighborhood with the query point used for the search, or, equivalently, the point at which the normal is being estimated. (I would even speculate that any point in the neighborhood would be good enough.) This is trivial to implement and amounts to one additional SSE instruction per point (to subtract the "central" point), and then one more to adjust the computed centroid.

What do you think? Any reasons why this would not work?

If either of you have any suggestions in this area, I'd be glad to hear them.

I'd simply go with synthetic planar point clouds translated away from the origin. If time permits, we can add a unit test based on real data. For example, crop the ground in one of your LIDAR point clouds, fit a plane, and test whether the resulting normals are close enough to the plane normal. By the way, #1407 includes an additional unit test which can be re-used.
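
A possible shape for such a synthetic test (a rough sketch, not the unit test from #1407): sample a z = 0 plane with a little noise, translate it far from the origin, estimate normals, and report the worst angle to the known plane normal (0, 0, 1).

#include <algorithm>
#include <cmath>
#include <cstdio>
#include <random>
#include <pcl/features/normal_3d.h>
#include <pcl/point_cloud.h>
#include <pcl/point_types.h>

int main ()
{
  pcl::PointCloud<pcl::PointXYZ>::Ptr cloud (new pcl::PointCloud<pcl::PointXYZ>);
  std::mt19937 rng (42);
  std::uniform_real_distribution<float> uv (-1.0f, 1.0f);  // 2 x 2 m planar patch
  std::normal_distribution<float> noise (0.0f, 0.005f);    // ~5 mm of sensor-like noise

  const float offset_x = 100000.0f;                        // push the patch far from the origin
  for (int i = 0; i < 5000; ++i)
    cloud->push_back (pcl::PointXYZ (uv (rng) + offset_x, uv (rng), noise (rng)));

  pcl::NormalEstimation<pcl::PointXYZ, pcl::Normal> ne;
  ne.setInputCloud (cloud);
  ne.setKSearch (20);
  pcl::PointCloud<pcl::Normal> normals;
  ne.compute (normals);

  double worst_deg = 0.0;
  for (const auto &n : normals)
  {
    if (!std::isfinite (n.normal_z))
      continue;
    const double dot = std::min (1.0, std::abs (static_cast<double> (n.normal_z)));  // |cos(angle)|
    worst_deg = std::max (worst_deg, std::acos (dot) * 180.0 / M_PI);
  }
  std::printf ("worst angular error: %.2f deg\n", worst_deg);
  return 0;
}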

@msy22
Author

msy22 commented Sep 3, 2018

Thanks @taketwo!, I'll address that shortly, although I don't have much experience with the PCL code base and virtually none with SSE, so I may be less help than you think haha.

But before I get too far behind, here are the results of my testing.

Method

To de-mean my point clouds, I calculated the centroid and applied a transformation like so:

Eigen::Vector4d centroid;
Eigen::Affine3d transform = Eigen::Affine3d::Identity();
compute3DCentroid(*point_cloud_ptr, centroid);
transform.translation() << -centroid(0), -centroid(1), -centroid(2);
pcl::transformPointCloud(*point_cloud_ptr, *point_cloud_ptr, transform);

Although I've since realized there's a demeanPointCloud function in centroid.h that I could have used but never mind...
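
(For reference, the centroid.h route would have been roughly the following, reusing the centroid computed above; the same <pcl/common/centroid.h> header that provides compute3DCentroid provides it:)

PointCloudT demeaned;
pcl::demeanPointCloud (*point_cloud_ptr, centroid, demeaned);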

To implement the double-pass covariance calculation method, I used a slightly modified version of the code snippet @SergioRAgostinho posted:

  template <typename PointInT> inline int
  computeMeanAndCovarianceMatrixDoublePass (const pcl::PointCloud<PointInT> &cloud, const std::vector<int> &indices, Eigen::Matrix3f &covariance_matrix, Eigen::Vector4f &centroid, bool double_pass)
  {
      // Note: `indices` is unused here; the cloud passed in is already the neighborhood sub-cloud.
      if (double_pass)
      {
        // First pass: centroid. Second pass: covariance about that centroid.
        pcl::compute3DCentroid (cloud, centroid);
        return (pcl::computeCovarianceMatrixNormalized (cloud, centroid, covariance_matrix));
      }
      else
        // Original single-pass version.
        return (pcl::computeMeanAndCovarianceMatrix (cloud, covariance_matrix, centroid));
  }

To implement the double precision calculations I used the method I've already described (just a local copy-paste of a few functions redeclared with doubles instead of floats).

Results

Here's the visual result for the whole cloud:

xyz_normal_accuracy_comparison

And again zoomed in on the roof, which seems to be the clearest place in the cloud for inspecting normals.

xyz_roof_normal_accuracy_comparison

Conclusions

  1. To me the clear winner here is de-meaning the whole point cloud.
  2. This conclusively proves your point @taketwo that double precision is not enough by itself if the cloud is still far away from the origin.
  3. Interestingly, the double-pass calculation doesn't seem to have done much (provided I implemented it correctly), and it actually took 3-4 times longer to calculate the normals. So I think this solution can be ruled out.

@msy22
Author

msy22 commented Sep 3, 2018

Just to add to the above: de-meaning my point cloud solves the problem as well. The image below shows the unmodified cloud on the left and the de-meaned cloud, still calculated using single (float) precision, on the right.

cloud122_compared

For reference, the centroid I subtract from this cloud is XYZ = 583.75, 433.07, 16.7171. Also note that I've been de-meaning my point clouds before putting them into the visualizer for a while now, since the viewer is bloody impossible to control well if the cloud is far from the origin. But until now I'd never de-meaned the cloud before calculating the normals.

So to answer your question @taketwo:

is single precision sufficient given that the point cloud is at the origin?

Yes, it looks like it might be. I've run my code with both de-meaning and double precision, and visually I can't tell the difference between the result and just de-meaning. So there's probably no way to tell the difference between the two (if there is any) without a mathematical proof or a metric for measuring normal accuracy.

As for whether it'd be enough to de-mean the whole cloud or each cluster individually, that's another matter. And the synthetic planar clouds are a good suggestion: it'd be easy enough to create a basic environment and the associated points using the method shown in the PCL interactive ICP tutorial. It'd certainly be a nice way to more thoroughly validate everything we've discussed, and a good way to test other parameters that might affect normal calculation.

@SergioRAgostinho
Member

If either of you have any suggestions in this area, I'd be glad to hear them.

I'd simply go with synthetic planar point clouds translated away from the origin. If time permits, we can add a unit test based on real data. For example, crop the ground in one of your LIDAR point clouds, fit a plane, and test whether the resulting normals are close enough to the plane normal. By the way, #1407 includes an additional unit test which can be re-used.

If you feel like undergoing a more challenging synthetic example, computing normals from points densely sampled from a spherical surface should also do the trick.

Thanks @taketwo!, I'll address that shortly, although I don't have much experience with the PCL code base and virtually none with SSE, so I may be less help than you think haha.

@taketwo might have abstracted you from SSE details with #2247. Nevertheless, if you hit a problem, ask for opinions/options and we'll always chime in.

As for whether it'd be enough to de-mean the whole cloud or each cluster individually, that's another matter.

I would demean on a per cluster basis. (Assuming the demeaning operation to be practically inexpensive).

On a side note: you have my praise @msy22 . Great analysis and presentation. We don't get that often so... take my kudos.

@taketwo
Member

taketwo commented Sep 3, 2018

  3. Interestingly, the double-pass calculation doesn't seem to have done much (provided I implemented it correctly), and it actually took 3-4 times longer to calculate the normals. So I think this solution can be ruled out.

What you have done looks correct. But the result is surprising. For the record, #1407 includes a different take on two-pass algorithm. If you have time and interest, you can give it a try.

As for whether it'd be enough to de-mean the whole cloud or each cluster individually, that's another matter.

Exact de-meaning requires two passes and we don't want that. Approximate de-meaning (as per my suggestion) yields better and better approximations the smaller the cloud is. So it's best to perform it at the neighborhood cluster level.

I don't have much experience with the PCL code base and virtually none with SSE, so I may be less help than you think haha.

I did not mean that we need to code SSE explicitly, hopefully the compiler will do the job. I just wanted to stress it'll hopefully map to a single operation, thus won't be expensive.

Specifically, in this code:

point_count = indices.size ();
for (std::vector<int>::const_iterator iIt = indices.begin (); iIt != indices.end (); ++iIt)
{
//const PointT& point = cloud[*iIt];
accu [0] += cloud[*iIt].x * cloud[*iIt].x;
accu [1] += cloud[*iIt].x * cloud[*iIt].y;
accu [2] += cloud[*iIt].x * cloud[*iIt].z;
accu [3] += cloud[*iIt].y * cloud[*iIt].y;
accu [4] += cloud[*iIt].y * cloud[*iIt].z;
accu [5] += cloud[*iIt].z * cloud[*iIt].z;
accu [6] += cloud[*iIt].x;
accu [7] += cloud[*iIt].y;
accu [8] += cloud[*iIt].z;
}

Just compute "approximately" de-meaned point:

const Eigen::Vector4f point = cloud[*iIt].getVector4fMap() - cloud[indices[0]].getVector4fMap(); 

And use it further to update accu. In the end, adjust centroid to account for this shift.
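
To make that concrete, here is one way the loop above could be rewritten (a sketch of the idea using the names from the surrounding function, not a tested patch; the covariance terms then follow PCL's existing single-pass formula, and the centroid just gets the shift added back):

const Eigen::Vector4f shift = cloud[indices[0]].getVector4fMap ();
Eigen::Matrix<float, 1, 9, Eigen::RowMajor> accu = Eigen::Matrix<float, 1, 9, Eigen::RowMajor>::Zero ();

for (std::vector<int>::const_iterator iIt = indices.begin (); iIt != indices.end (); ++iIt)
{
  // Approximately de-meaned point: small numbers, so float has enough digits left.
  const Eigen::Vector4f pt = cloud[*iIt].getVector4fMap () - shift;
  accu [0] += pt[0] * pt[0];
  accu [1] += pt[0] * pt[1];
  accu [2] += pt[0] * pt[2];
  accu [3] += pt[1] * pt[1];
  accu [4] += pt[1] * pt[2];
  accu [5] += pt[2] * pt[2];
  accu [6] += pt[0];
  accu [7] += pt[1];
  accu [8] += pt[2];
}
accu /= static_cast<float> (indices.size ());

// Centroid in the original frame: shifted mean plus the shift (covariance itself is translation invariant).
centroid[0] = accu[6] + shift[0];
centroid[1] = accu[7] + shift[1];
centroid[2] = accu[8] + shift[2];
centroid[3] = 1.0f;

covariance_matrix (0, 0) = accu[0] - accu[6] * accu[6];
covariance_matrix (0, 1) = accu[1] - accu[6] * accu[7];
covariance_matrix (0, 2) = accu[2] - accu[6] * accu[8];
covariance_matrix (1, 1) = accu[3] - accu[7] * accu[7];
covariance_matrix (1, 2) = accu[4] - accu[7] * accu[8];
covariance_matrix (2, 2) = accu[5] - accu[8] * accu[8];
covariance_matrix (1, 0) = covariance_matrix (0, 1);
covariance_matrix (2, 0) = covariance_matrix (0, 2);
covariance_matrix (2, 1) = covariance_matrix (1, 2);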

@msy22
Author

msy22 commented Sep 3, 2018

If you feel like undergoing a more challenging synthetic example, computing normals from points densely sampled from a spherical surface should also do the trick.

That's a good suggestion. If I'm going to do this properly I should start with simple primitives like a cube, plane or sphere and work my way up through more complex objects to real data. But that will all have to wait until after this issue is resolved.

On a side note: you have my praise @msy22 . Great analysis and presentation. We don't get that often so... take my kudos.

Thanks! I appreciate that!

@taketwo might have abstracted you from SSE details with #2247. Nevertheless, if you hit a problem, ask for opinions/options and we'll always chime in.

I did not mean that we need to code SSE explicitly, hopefully the compiler will do the job. I just wanted to stress it'll hopefully map to a single operation, thus won't be expensive.

Great, thanks for the link! And I'll do my best to get it working.

What you have done looks correct. But the result is surprising. For the record, #1407 includes a different take on two-pass algorithm. If you have time and interest, you can give it a try.

That's why I was concerned that maybe I hadn't implemented it properly. I'll have a quick crack at the other method before moving on. Because the two-pass method was central to that PR, it would be good to be able to definitively rule that method in or out; otherwise it's a loose thread.

I would demean on a per cluster basis. (Assuming the demeaning operation to be practically inexpensive).

As for the subject of demeaning the whole cloud vs individual clusters, would demeaning just the cloud be faster and still provide the same benefit? I may end up editing this later today after I learn more, but my naive assumption is that if we de-mean each cluster that will result in more operations, depending on how it's implemented.

Say for the sake of argument that we de-mean each cluster around the point we're calculating the normal for, and that we're doing a K-nearest point search where K=10. We de-mean that point and its 10 nearest neighbors. Then we calculate the normal for the next point in the cloud, which is likely just the nearest neighbor of the first point. We repeat the same process and end up de-meaning basically the same cluster of points all over again. So we essentially end up de-meaning the same point multiple times in slightly different ways. My worry is that an implementation like this could result in orders of magnitude more calculations than simply de-meaning the whole cloud (i.e. each point once).

In the meantime, I'll look at the alternative double-pass calculation. Then I'll fork the PCL repo, start familiarizing myself with the necessary code, and write up a plan/to-do list which I'll post here so you can both sanity-check it.

@SergioRAgostinho
Member

As for the subject of demeaning the whole cloud vs individual clusters, would demeaning just the cloud be faster and still provide the same benefit?

Potentially not. If your entire point cloud is uniformly distributed, its centroid may already be at the origin while there are still points far away, and you'll end up with the same original problem. This is why "demeaning" needs to be done on a per-cluster basis.

So we essentially end up de-meaning the same point multiple times in slightly different ways. My worry is that an implementation like this could result in orders of magnitude more calculations than simply de-meaning the whole cloud (i.e. each point once).

You're correct. But the underlying intuition is that this operation is inexpensive. Of course we now need to benchmark the processing time to confirm this hypothesis.

@taketwo
Member

taketwo commented Sep 4, 2018

There are many factors involved and only benchmarking can show what is faster. But here is a speculative argument for why the "simply de-meaning the whole cloud" approach may be slower. For the sake of the argument, let's assume that we even know the centroid in advance. Now, note that we cannot de-mean the point cloud in-place, which means that a new cloud of the same size has to be allocated. Then we need to make a pass through all points (read access), perform the subtraction, and write back the results (write access). And then move on and compute covariance matrices for each cluster. So the overhead of "simple de-meaning" is an allocation and a read/write access to all points. Compared to that, per-cluster de-meaning needs a single pass and no additional memory. And we all know that memory operations are waaaay slower than arithmetics, even if we are lucky with cache hits.

@msy22
Author

msy22 commented Sep 4, 2018

So the overhead of "simple de-meaning" is an allocation and a read/write access to all points. Compared to that, per-cluster de-meaning needs a single pass and no additional memory. And we all know that memory operations are waaaay slower than arithmetics, even if we are lucky with cache hits.

Aaaah, that's really valuable to know. OK, it is less straightforward then. (For reference, only one third of my background is in software, and operation costs/code optimization is a gap in my knowledge I haven't had time to fill yet.)

You're correct. But the underlying intuition is that this operation is inexpensive. Of course we now need to benchmark the processing time to confirm this hypothesis.

OK, judging by PR #2247 it looks like you already have processes in place for testing the computation time of different bits of code. So what I need to do is implement both solutions quickly and roughly so you two can test them and prove with data which of the two options is faster.

It looks like the best place to de-mean the whole point cloud would be somewhere in NormalEstimation::computeFeature, before it starts iterating through the cloud. This means editing normal_3d.hpp and/or normal_3d_omp.hpp. As per your suggestion @taketwo, de-meaning each cluster looks like it should happen in centroid.hpp, and for now maybe just in the specific version of computeMeanAndCovarianceMatrix I know will be used. I'm not sure what you'd need for testing, but adding a boolean input to toggle de-meaning on/off would probably do for now.

So what would work best for you two? Would it be better for me to modify a fork of the PCL code base so you can clone, compile and test that? Or would it be easier and faster if I do something like what I've done to get this far - copying the relevant functions into a custom version of normal_3d_omp.h/.hpp and changing them appropriately? That way all you'd need is one extra .h and one .hpp file.

@taketwo
Member

taketwo commented Sep 4, 2018

OK, judging by PR #2247 it looks like you already have processes in place for testing the computation time of different bits of code.

Well, we have that one instance when we did extensive benchmarking. However, this is not an established process.

I have created a new project for benchmarking and testing mean/covariance computation. It's largely based on the repository from #2247. It's just a skeleton, but it should get you started. In the future there are a bunch of things that would need to be verified, e.g. whether the number of samples/iterations is adequate, whether we are indeed benchmarking what we want to benchmark (check the assembly), making the size of the neighborhoods parametrizable, etc.

@msy22
Author

msy22 commented Jan 30, 2019

In point 27774, second column, we have a negative variance! :'D Something which is theoretically impossible.

Argh, yeah, that is definitely not supposed to happen. That term specifically is being generated by this line in computeMeanAndCovariance, which mathematically is E[zz] - E[z]E[z]. The way it's implemented, though, the E[zz] term is calculated as a sum of z*z and then divided by the number of points. My guess is that the sum of z*z is becoming too big to be stored in a float and is being rounded to a significantly lower number, which after the division ends up just slightly lower than the E[z]E[z] term. I'll quickly pull this apart to see if that is indeed what's happening... but regardless, I think this is solid confirmation that the covariance matrices are plenty wrong before they get to eigen33.

EDIT: Yeah, that's exactly what seems to be happening. When I compute that line with floats I get:

E[zz] term:    1000000.000000000
E[z]E[z] term: 1000000.125000000

But when I calculate it with doubles I get:

E[zz] term:    1000000.000000000
E[z]E[z] term: 1000000.000000000

Whether eigen33 is computing anything wrong by itself is another matter... Thanks for linking me to the Eigen solver. I'll re-write my code to use that and put all the raw output into a text file for you, just in case I made a transcription error in the image above.
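
To see the cancellation in isolation, here is a tiny standalone example (made-up numbers, not the issue data): z values around 1000 with a millimetre-scale spread. In single precision the single-pass E[zz] - E[z]E[z] loses essentially all of its significant digits and can even come out negative, while double precision recovers the true variance of ~2e-06.

#include <cstdio>
#include <vector>

int main ()
{
  const std::vector<double> z = {999.998, 999.999, 1000.000, 1000.001, 1000.002};

  float  sum_f = 0.f,  sum_sq_f = 0.f;   // single-precision accumulators
  double sum_d = 0.0,  sum_sq_d = 0.0;   // double-precision accumulators
  for (double v : z)
  {
    sum_f += static_cast<float> (v);  sum_sq_f += static_cast<float> (v * v);
    sum_d += v;                       sum_sq_d += v * v;
  }
  const float  n_f = static_cast<float> (z.size ());
  const double n_d = static_cast<double> (z.size ());

  // Single-pass variance E[zz] - E[z]E[z], float vs double.
  std::printf ("float : %.9g\n",  sum_sq_f / n_f - (sum_f / n_f) * (sum_f / n_f));
  std::printf ("double: %.12g\n", sum_sq_d / n_d - (sum_d / n_d) * (sum_d / n_d));
  return 0;
}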

@msy22
Author

msy22 commented Jan 31, 2019

Here's a text file with the results from a dozen or so test points: covariance_testing.

And the source code I used, just in case:

Source

void
testCovariance(string cloud_filename, int idx, int k)
{
// Load one test point cloud
//cout << "Loading point clouds..." << endl;
PointCloudT::Ptr cloud_in (new PointCloudT);
cloud_in = LoadPointCloudFromFile(FILEPATH + cloud_filename);
Eigen::Vector4f n;
float curvature;
Eigen::Vector4f eVals;

// Find its k-nearest neighbours
std::vector<int> nn_indices(k);
std::vector<float> nn_dists(k);
pcl::search::KdTree<PointT>::Ptr kdtree (new pcl::search::KdTree<PointT>);
kdtree->setInputCloud(cloud_in);
kdtree->nearestKSearch(idx, k, nn_indices, nn_dists);

// Create a sub cloud containing just point i and its nearest neighbours
PointCloudT::Ptr tmp_cloud (new PointCloudT);
tmp_cloud->width = k + 1;
tmp_cloud->height = 1;
tmp_cloud->points.resize(k+1);
for (int j=0; j < nn_indices.size(); j++)
{
  tmp_cloud->points[j].x = cloud_in->points[nn_indices[j]].x;
  tmp_cloud->points[j].y = cloud_in->points[nn_indices[j]].y;
  tmp_cloud->points[j].z = cloud_in->points[nn_indices[j]].z;
}
tmp_cloud->points[k].x = cloud_in->points[idx].x;  // Also copy in the point we want the normal for at the end
tmp_cloud->points[k].y = cloud_in->points[idx].y;
tmp_cloud->points[k].z = cloud_in->points[idx].z;

// Calculate the covariance matrix and print it
cout << "\n### Testing covariance calculations of point " << idx
     << " at origin ### \n" << endl;
Eigen::Matrix3f covariance_float;  Eigen::Vector4f centroid_float;
computeMeanAndCovarianceMatrix(*tmp_cloud, covariance_float, centroid_float);
cout << "Covariance matrix: " << endl;
print3x3Matrix(covariance_float);
cout << "Eigen values: " << endl;
pcl::eigen33(covariance_float, eVals);
Eigen::SelfAdjointEigenSolver<Eigen::Matrix3f> es(covariance_float);
cout << "    eigen33     = [ " << eVals[0] << ", " << eVals[1] << ", " << eVals[2] << " ]" << endl;
cout << "    EigenSolver = [ " << es.eigenvalues()[0] << ", "
                           << es.eigenvalues()[1] << ", "
                           << es.eigenvalues()[2] << " ]" << endl;
cout << "Point Normal: " << endl;
pcl::computePointNormal(*tmp_cloud, n, curvature);
cout << "    n = [ " << n[0] << ", " << n[1] << ", " << n[2] << " ]" << endl;

// Shift cloud to be far from origin, re-calculate covariance matrix
cout << "\n### Testing covariance calculations of point " << idx
     << " at xyz=1000 (float) ### \n" << endl;
MoveCloudToCoordinate(tmp_cloud, 1000.0, 1000.0, 1000.0);
computeMeanAndCovarianceMatrix(*tmp_cloud, covariance_float, centroid_float);
cout << "Covariance matrix: " << endl;
print3x3Matrix(covariance_float);
cout << "Eigen values: " << endl;
pcl::eigen33(covariance_float, eVals);
es.compute(covariance_float);
cout << "    eigen33     = [ " << eVals[0] << ", " << eVals[1] << ", " << eVals[2] << " ]" << endl;
cout << "    EigenSolver = [ " << es.eigenvalues()[0] << ", "
                           << es.eigenvalues()[1] << ", "
                           << es.eigenvalues()[2] << " ]" << endl;
cout << "Point Normal: " << endl;
pcl::computePointNormal(*tmp_cloud, n, curvature);
cout << "    n = [ " << n[0] << ", " << n[1] << ", " << n[2] << " ]" << endl;

cout << "\n### Testing covariance calculations of point " << idx
     << " at xyz=1000 (double) ### \n" << endl;
Eigen::Matrix3d covariance_double;  Eigen::Vector4d centroid_double;
MoveCloudToCoordinate(tmp_cloud, 1000.0, 1000.0, 1000.0);
computeMeanAndCovarianceMatrix(*tmp_cloud, covariance_double, centroid_double);
cout << "Covariance matrix: " << endl;
print3x3Matrix(covariance_double);
cout << "Eigen values: " << endl;
pcl::eigen33(covariance_double, eVals);
Eigen::SelfAdjointEigenSolver<Eigen::Matrix3d> es2(covariance_double);
cout << "    eigen33     = [ " << eVals[0] << ", " << eVals[1] << ", " << eVals[2] << " ]" << endl;
cout << "    EigenSolver = [ " << es.eigenvalues()[0] << ", "
                           << es.eigenvalues()[1] << ", "
                           << es.eigenvalues()[2] << " ]" << endl;
cout << "Point Normal: " << endl;
pcl::computePointNormal(*tmp_cloud, n, curvature);
cout << "    n = [ " << n[0] << ", " << n[1] << ", " << n[2] << " ]" << endl;
}

@SergioRAgostinho
Member

👍 The LoS theory crippling cov matrices is confirmed.

Now it's time to dig into eigen33. Thanks for the files and code snippet. I'll try to spend some hours over it during the weekend.

@msy22
Author

msy22 commented Feb 1, 2019

The LoS theory crippling cov matrices is confirmed... Now it's time to dig into eigen33.

Agreed. Unfortunately this raises an ugly question: the results I have indicate that the normals calculated when LoS occurs are highly random. But if something is going wrong inside eigen33 as well, then there may be two major sources of error at play here, and normals affected only by error from the covariance calculations may be more deterministic than what my results currently show.

I'm going to re-write the code snippet I've got above and try to calculate the normal via eigen33 and via the EigenSolver, then report back.

@SergioRAgostinho
Member

SergioRAgostinho commented Feb 1, 2019

At this point I believe you need to assume there's something off in eigen33. I'm not sure how that fits your publication time frame. You have two options though:

  • The first involves minor work: you replace the inside of eigen33 with an invocation of Eigen's SelfAdjointEigenSolver (see the sketch after this list). This way, you'll be able to isolate the effects of LoS in the covariance matrices from the errors in eigen33. Generate all your plots again.
  • The second, extending your current work, adds a new section exploring the angular error caused by eigen33. I think the only assumption eigen33 makes is that the supplied matrix is symmetric. So randomly generate a load of them and compare the computed eigenvectors between both methods. This second point is roughly what I was planning on doing this weekend.
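
For the first option, a drop-in way to cross-check eigen33 (a sketch using Eigen's solver directly, rather than an actual patch to eigen33):

#include <Eigen/Eigenvalues>

// Returns the eigenvector of the smallest eigenvalue, which is what eigen33 is used for here.
inline Eigen::Vector3f smallestEigenvectorViaEigen (const Eigen::Matrix3f &covariance)
{
  Eigen::SelfAdjointEigenSolver<Eigen::Matrix3f> es (covariance);
  // SelfAdjointEigenSolver sorts eigenvalues in increasing order, so column 0 pairs with the smallest one.
  return es.eigenvectors ().col (0);
}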

@msy22
Author

msy22 commented Feb 2, 2019

Yeah... I'm gonna have to re-calculate my plots. No way around that unfortunately.

So here's the new file of results: covariance_results

The output of which looks like this:

### Testing covariance calculations of point 8207 at xyz= 0.0751,0.0606,-0.00308 ### 

Covariance matrix: 
    | 0.000000108 -0.000000072 -0.000000154 | 
    | -0.000000072 0.000000362 -0.000000081 | 
    | -0.000000154 -0.000000081 0.000000337 | 
Eigen values: 
    eigen33 (1st) = [ 2.22e-09, -, - ]
    eigen33 (2nd) = [ 2.22e-09, 3.61e-07, 4.43e-07 ]
    eigen33 (3rd) = [ 2.22e-09, 3.61e-07, 4.43e-07 ]
    EigenSolver   = [ 2.22e-09, 3.61e-07, 4.43e-07 ]
Point Normal: 
    computePointNormal   = [ 0.848, 0.272, 0.455 ]
    EigenSolver          = [ -0.848, -0.272, -0.455 ]
    EigenSolver (sorted) = [ -0.848, -0.272, -0.455 ]

### Testing covariance calculations of point 8207 at xyz= 100,100,100 ### 

Covariance matrix: 
    | 0.001448668 0.001464353 0.000298063 | 
    | 0.001464353 -0.000681671 -0.000815670 | 
    | 0.000298063 -0.000815670 -0.002085172 | 
Eigen values: 
    eigen33 (1st) = [ 0, -, - ]
    eigen33 (2nd) = [ 0, -0.00309, 0.00177 ]
    eigen33 (3rd) = [ 0, -0.00309, 0.00177 ]
    EigenSolver   = [ -0.00268, -0.000835, 0.0022 ]
Point Normal: 
    computePointNormal   = [ -0.626, 0.692, -0.36 ]
    EigenSolver          = [ 0.241, -0.513, -0.824 ]
    EigenSolver (sorted) = [ 0.241, -0.513, -0.824 ]
Normal Error (degrees):
    computePointNormal   = 59.6
    EigenSolver          = 72
    EigenSolver (sorted) = 72

And the file I used to generate it: test_eigenvalue_calculation

A bit of quick testing shows that the normal is still wrong, even when calculated via the EigenSolver. Sometimes the EigenSolver result is better... sometimes it's worse... it still seems pretty random, but when I recompute my plots that will hopefully make things a bit clearer. I also tested all three versions of eigen33, just in case the problem is limited to one version, and it doesn't seem to be; they all seem to be consistent at least.

@SergioRAgostinho
Member

I wrote an email to Matt about this already but forgot to post here. eigen33 is working fine, despite needing some serious refactoring. It just produces rubbish and diverges from Eigen's SelfAdjointEigenSolver when the data fed to it doesn't respect the required assumption of being a symmetric, positive semi-definite matrix. If that is respected, the results are similar to Eigen's SelfAdjointEigenSolver as far as the eigenvalues are concerned. The eigenvectors are negated but still correct.

The conclusion here is that bad covariance matrices are the root cause of the normals problem.

So to finally fix and close this issue, we need to implement that local neighborhood translation.

@Kaju-Bubanja
Contributor

Wow, what a journey of an issue, amazing. Thank you all for the hard work.

Do I see this right, that this issue still exists in the newest PCL version and no fix is available, besides doing per-cluster demeaning ourselves and rebuilding from source?

@SergioRAgostinho
Member

There's a Pull Request addressing this issue specifically for normal estimation. See #3169. It was reviewed and there are some minor changes that need to be applied to it, but as is, it is functional. This is such a critical issue with a pending PR that it will likely be merged for this next release.

@SergioRAgostinho added the kind: bug label and removed the needs: more work label Aug 12, 2019
@stale

stale bot commented May 19, 2020

Marking this as stale due to 30 days of inactivity. It will be closed in 7 days if no further activity occurs.

@themightyoarfish
Contributor

Is this issue still present? It seems kind of important.

@larshg
Contributor

larshg commented Feb 8, 2023

I'm pretty sure it has been solved by #4983

@mvieth
Member

mvieth commented Feb 10, 2023

The problem with the wrong normals (if the cloud is far from the origin) is indeed fixed by #4983 (we went with per-cluster demeaning and kept float instead of switching to double, for details please see the PR).
Whether ICP or GICP still has a problem (as briefly suspected in one comment) I will try to check when I find time.

Why are you asking @themightyoarfish ? Did you have an issue like this? If yes, what issue exactly and did you update PCL to the latest version 1.13.0?

@themightyoarfish
Contributor

No, I do not have a specific problem, I just stumbled upon this issue and I might run a similar use case in the future.
