Add PCLPointCloud2::operator+=() and update concatenation operation #3320

Merged: 3 commits, Sep 13, 2019

Conversation

@kunaltyagi (Member) commented Sep 2, 2019

Splits part of #3316

  • adds += and + operators to pcl::PCLPointCloud2
  • adds static concatenate functions (binary in-place and ternary by-copy) that hold the main logic
  • modifies concatenatePointCloud to use the static functions
  • adds an exception in += and + because they can't return an error value
  • improves the concatenation logic
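
Roughly, the intended usage looks like this (a minimal sketch based on the description above; exact signatures may differ from the final diff):

```cpp
#include <pcl/PCLPointCloud2.h>

void example (pcl::PCLPointCloud2 &cloud_a, const pcl::PCLPointCloud2 &cloud_b)
{
  // Binary in-place concatenation: appends cloud_b's points to cloud_a,
  // returning false if the field layouts are incompatible.
  bool ok = pcl::PCLPointCloud2::concatenate (cloud_a, cloud_b);

  // Ternary by-copy concatenation: the inputs stay untouched.
  pcl::PCLPointCloud2 cloud_out;
  ok = pcl::PCLPointCloud2::concatenate (cloud_a, cloud_b, cloud_out);

  // Operator forms; these throw instead of returning an error value.
  cloud_a += cloud_b;
  pcl::PCLPointCloud2 cloud_sum = cloud_a + cloud_b;
  (void) ok; (void) cloud_sum;
}
```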

Questions:

  • Add ~~nothrow~~ noexcept to functions moving forward?

Logic discrepancies in the original version:

  • header.stamp is not updated to the newer of the two stamps, as is done in PointCloud<T>
  • The case where both clouds are empty doesn't exit early
  • The field-name check differs between the strip and no-strip cases: in the former, names must match; in the latter, mismatched names are silently skipped
  • The number of fields is not checked when some fields would be stripped off

@taketwo (Member) commented Sep 2, 2019

Add nothrow to functions moving forward?

You mean noexcept? But we allocate inside, this may throw.

Edit: you forgot to copy over CMake changes.

@kunaltyagi (Member, Author)

you forgot to copy over CMake changes

Fixed that

@taketwo (Member) commented Sep 2, 2019

Commented out c++17 code

In principle, we can add a macro (PCL_FALLTHROUGH?) that is a no-op in pre-c++17 mode.

@kunaltyagi (Member, Author)

we can add a macro (PCL_FALLTHROUGH?)

That sounds good

@taketwo (Member) commented Sep 2, 2019

Something similar to this, perhaps.

@kunaltyagi (Member, Author) commented Sep 2, 2019

I decided to use the feature-test macro instead of switching on the standard version. That lets compilers provide it even under a lower standard (as clang and gcc do, with some warnings).

EDIT: Did you get time to review the changes to logic I made? Are they sensible?
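
Roughly, the feature-test-macro approach looks like this (a sketch; the actual pcl_macros.h change may differ):

```cpp
// Hypothetical sketch of the feature-test-macro approach, not the final definition.
#if defined(__has_cpp_attribute)
#  if __has_cpp_attribute(fallthrough)
#    define PCL_FALLTHROUGH [[fallthrough]];
#  endif
#endif
#ifndef PCL_FALLTHROUGH
#  define PCL_FALLTHROUGH
#endif
```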

@taketwo (Member) left a comment

I haven't had time to check the logic, will do later today or tomorrow.

@taketwo (Member) commented Sep 3, 2019

I had a closer look at the old code and your changes, and I like the new approach; the logic is solid.

Stepping back a bit though, I was wondering what is the right thing to do in the following situation. Imagine we have two point clouds with the same set of fields, including padding ("_") fields, e.g. something like:

Cloud 1: X Y Z _ RGB
Cloud 2: X Y Z _ RGB

Both the old and the new version will strip the padding field. But isn't it more logical to not do that? Firstly, it's faster (just one memcpy). Secondly, the function name is "concatenate" after all, not "squeeze".
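
In PCLPointCloud2 terms, such a layout is just a fields vector with an explicit padding entry; a minimal sketch (offsets and point_step are illustrative, not taken from this PR):

```cpp
#include <cstdint>
#include <string>
#include <pcl/PCLPointCloud2.h>

// Sketch only: what "X Y Z _ RGB" means as a PCLPointCloud2 field layout.
pcl::PCLPointField makeField (const std::string &name, std::uint32_t offset)
{
  pcl::PCLPointField f;
  f.name = name;
  f.offset = offset;
  f.datatype = pcl::PCLPointField::FLOAT32;
  f.count = 1;
  return f;
}

pcl::PCLPointCloud2 makePaddedCloud ()
{
  pcl::PCLPointCloud2 cloud;
  cloud.fields = { makeField ("x", 0),   makeField ("y", 4), makeField ("z", 8),
                   makeField ("_", 12),  // padding ("skip") field
                   makeField ("rgb", 16) };
  cloud.point_step = 32;                 // per-point stride, including trailing padding
  return cloud;
}
```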

What's your opinion on this?

Regarding deprecation, I don't have an answer. You wrote:

I added [[deprecated]] but this basically breaks uniform API in generic functions

I assume you are talking about having an overloaded function concatenatePointCloud that takes PCLPointCloud2 clouds as well as "normal" point clouds? We currently don't have this, but it's a fair point that this would be a better generic interface.

On the other hand, do we want so many functions doing the same thing? I don't know. We could use some more opinions, e.g. @SergioRAgostinho, but he seems to currently be on hiatus.

@kunaltyagi (Member, Author)

But isn't it more logical to not do that?

You'd know more about that based on the use-case. Do we have a platform where we can announce stuff like this? Opinion-based polls (on future direction and questions like this), even if just for maintainers. A GitHub team might be a good idea.

Regarding your two points:

  • Speed: Definitely, speed is impacted.
  • Function name isn't squeeze but it might help increase speed of other functions downstream. But we can explicitly provide a function to do that.

That's just a bool flag though, so it won't affect code complexity.

an overloaded function concatenatePointCloud

Yes. But I guess your point stands.

My opinion: I find free-function-based APIs better for libraries, since they let users keep adding functionality without inheriting from third-party classes. The standard library has both free and member functions for a lot of basic operations, so it's not unheard of to have multiple functions.

Just concatenate sounds better and more generic than concatenatePointCloud, since we now have concatenatePolygonMesh functionality too.

@taketwo (Member) commented Sep 3, 2019

Do we have a platform where we can announce stuff like this?

In the (very) old times the pcl-devs mailing list would have been the right place, but it's long dead. So I'm not sure how to reach the people who may be interested/affected.

Function name isn't squeeze but it might help increase speed of other functions downstream. But we can explicitly provide a function to do that.

I'd rather prefer that: explicit over implicit. Especially if you think about our original case of adding polygon meshes: you have two similar meshes from two files, you add them, and bam! the underlying point clouds have magically changed their internal layout.

My opinion: I find free-function-based APIs better for libraries, since they let users keep adding functionality without inheriting from third-party classes. The standard library has both free and member functions for a lot of basic operations, so it's not unheard of to have multiple functions.

Agreed.

Just concatenate sounds better.

Also agree.

Well, here is my proposal then:

  • Update static PCLPointCloud2::concatenate to not squeeze if field layout matches exactly. Document this behavior.
  • Add free pcl::concatenate function that redirects to the static one.
  • Leave the old free pcl::concatenatePointCloud function as-is, complete with its old implementation. Add a deprecation warning and a recommendation to use the new function, with a note about the slightly changed behavior (see the sketch below).
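
A rough sketch of the last two items in code (names and the exact deprecation mechanism are illustrative, not quoted from the final diff):

```cpp
#include <pcl/PCLPointCloud2.h>

namespace pcl
{
  // New free function that simply forwards to the static member function.
  inline bool
  concatenate (const PCLPointCloud2 &cloud1, const PCLPointCloud2 &cloud2,
               PCLPointCloud2 &cloud_out)
  {
    return PCLPointCloud2::concatenate (cloud1, cloud2, cloud_out);
  }

  // Old function kept with its old implementation, but marked deprecated.
  [[deprecated ("use pcl::concatenate instead; note the slightly changed behavior")]]
  bool
  concatenatePointCloud (const PCLPointCloud2 &cloud1,
                         const PCLPointCloud2 &cloud2,
                         PCLPointCloud2 &cloud_out);
}
```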

What do you think?

@kunaltyagi (Member, Author) commented Sep 3, 2019

So I'm not sure how to reach the people who may be interested/affected

For those active on github, we can have teams based on functions (@PointCloudLibrary/core for maintainers, @PointCloudLibrary/poll) and ping on issues/gitter

not squeeze if field layout matches exactly

So 'x, y, _, z, _' and 'x, y, _, _, z' will result in what? 'x, y, _, z' or 'x, y, z'?

Does "rgb" and "rgba" special case apply?

@taketwo (Member) commented Sep 3, 2019

For those active on github, we can have teams based on functions (@PointCloudLibrary/core for maintainers, @PointCloudLibrary/poll) and ping on issues/gitter

How would you imagine the procedure of "enrolling" people in such groups?

So 'x, y, _, z, _' and 'x, y, _, _, z' will result in what? 'x, y, _, z' or 'x, y, z'?

Neither. Here is my logic:

  1. The sets of non-pad fields are the same in both clouds -> generally concatenation is possible
  2. Layouts are not exactly the same -> memcpy-based "appending" not possible
  3. LHS point cloud is the "main" cloud, so the output layout is 'x, y, _, z, _'
  4. RHS point cloud is squeezed/unsqueezed as needed to match the LHS layout

@kunaltyagi (Member, Author) commented Sep 3, 2019

the procedure of "enrolling" people

For maintainers, it is just creating a team from the current list of people with some admin access. For the larger community, it can be a voluntary "signup" to a channel where you receive polls and updates about the future (not the current status). A separate repository can be created for the discussion. Issues can have expiry dates, people will get notifications (which they can switch on/off per issue and per repo), and people can participate asynchronously at different levels (important, since gitter more or less requires synchronous communication). Getting people on board will be slow, but it might be a good experiment to try to separate the two kinds of issues: current and meta/future.

Here is my logic:

Maybe a simpler approach would be better:

  1. if (cloud1.fields.size == cloud2.fields.size && all_of(cloud1.fields, cloud2.fields, string equality)), memcpy is possible
  2. Else, proceed with current method of dropping the skip fields
  3. memcpy field by field

Your idea instead involves:
  2. Else, proceed with moving the skip fields around to keep the layout the same
  3. memcpy field by field

IMO, this gives no advantage from user perspective especially since the fields are marked as skip. We still end up doing the memcpy part by part, and it adds complexity when the number of skip fields differs but the non-skip fields are the same.
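
A rough sketch of that decision, under the assumption that the field names fully describe the layout (illustrative helpers, not the code in this PR):

```cpp
#include <algorithm>
#include <cstring>
#include <pcl/PCLPointCloud2.h>

// Fast path: identical field layouts (same count, same names in the same order)
// allow appending the raw data blocks with a single memcpy. Otherwise fall back
// to per-field copying that drops the "_" (skip) fields.
bool
sameLayout (const pcl::PCLPointCloud2 &a, const pcl::PCLPointCloud2 &b)
{
  return a.fields.size () == b.fields.size () &&
         std::equal (a.fields.begin (), a.fields.end (), b.fields.begin (),
                     [] (const pcl::PCLPointField &f1, const pcl::PCLPointField &f2)
                     { return f1.name == f2.name; });
}

void
appendFast (pcl::PCLPointCloud2 &lhs, const pcl::PCLPointCloud2 &rhs)
{
  if (rhs.data.empty ())
    return;
  const auto old_size = lhs.data.size ();
  lhs.data.resize (old_size + rhs.data.size ());
  std::memcpy (lhs.data.data () + old_size, rhs.data.data (), rhs.data.size ());
  lhs.width += rhs.width;   // assumes both clouds are unorganized (height == 1)
  lhs.height = 1;
}
```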

@kunaltyagi (Member, Author)

Test 16 (concatenatePointCloud2) is failing.

@taketwo (Member) commented Sep 5, 2019

Thanks for the suggestions regarding organizational issues, sounds interesting. Unfortunately I am currently overwhelmed with other things, so I don't have time to implement these. Maybe you can create an issue and copy your thoughts there so that they are not lost.

IMO, this gives no advantage from user perspective especially since the fields are marked as skip.

Probably yes. Theoretically, a user may have skip fields on purpose (e.g. to ensure alignment for SSE instructions), but I agree it's far-fetched. So 👍 to your proposal.

Does "rgb" and "rgba" special case apply?

Yes, they should always be treated as the same field.

@kunaltyagi (Member, Author)

I don't know if the following is weird usage or misuse of the PCL library. There's a subtle bug that can affect users if their point clouds have the same field names but different datatypes. Based on the defaults, the field names and data types are sort of standardized, so it shouldn't be an issue. Just raising the point here in case I'm wrong.

@taketwo (Member) commented Sep 5, 2019

the field names and data types are sort of standardized

Indeed, sort of. However, it's not impossible to have a PCLPointCloud2 with, say, float64 coordinates. An easy way to obtain one is to read a PLY file where vertices are stored as float64s. In fact, there is a pending PR #3312 that is motivated by this possibility. (By the way, since you are deep in PCLPointCloud2s now, your opinion on that patch will be very welcome!)

Ideally, we should check that not only the names, but also the types match. But honestly, I'd leave this for the next time.
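
For the record, such a check could look roughly like this (a hypothetical helper, not part of this PR):

```cpp
#include <pcl/PCLPointCloud2.h>

// Hypothetical helper: fields are compatible only if both the name and the
// datatype match ("rgb" and "rgba" treated as the same field).
bool
fieldsCompatible (const pcl::PCLPointField &a, const pcl::PCLPointField &b)
{
  const bool rgb_alias = (a.name == "rgb" && b.name == "rgba") ||
                         (a.name == "rgba" && b.name == "rgb");
  return (a.name == b.name || rgb_alias) &&
         a.datatype == b.datatype &&
         a.count == b.count;
}
```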

@kunaltyagi (Member, Author)

But honestly, I'd leave this for the next time.

Let's create an issue with "Hey, check it over here" and leave it up for the future? If someone faces issues, search engines will lead them to the right place. This patch is already a bit bigger than required.

@kunaltyagi (Member, Author)

The squeeze function is wrong. Let me remove it from the PR and put it as a snippet in an issue, as a starting point for later. The last memcpy part is the issue. I can't build gtest on Mac, so it'll take me some time to debug the failing test for the concatenate function.

@kunaltyagi (Member, Author)

@taketwo Found the source of the failing test. My algorithm ensures that both clouds agree on what fields to skip.

The previous algorithm would

  • skip fields of cloud2 even if they were not marked as skip (provided cloud1.fields[i].name != "_" and cloud1.fields[i].name != cloud2.fields[j].name)
  • allow cloud2's field list to choose which fields from cloud1 and cloud2 are concatenated into the output

These behaviors fundamentally make the previous version of concatenate more of a selective_concatenate. I would view the first behavior as a bug. The second can be a feature (ease of use) or a bug. In the first case, we can deprecate the usage of concatenatePointCloud and instead ask people to choose between concatenate and concatenateFields. concatenateFields can either explicitly ask for the fields to choose or implicitly choose cloud2.fields (or both).

How should I proceed?

@taketwo (Member) commented Sep 8, 2019

In my opinion, the most reasonable approach is to deprecate the old function, update its documentation to highlight the problem in its behavior, and recommend using the new function. The implementation of the old function should be left as-is. Actually, I don't expect that the current behavior has legitimate use-cases or that someone is relying on it. But just in case there are such people, we will keep the old function in a deprecated state for a considerable time, so there is enough transition time for users to either reconsider their use-case or file a feature request for functions that achieve the old behavior.

@taketwo (Member) left a comment

Awesome, thank you for all the effort!

@Morwenn (Contributor) commented Sep 10, 2019

Concerning the [[fallthrough]] thing, I remember having to implement the same macro with more workarounds in another project, to avoid a bunch of warnings depending on compilers and flags: for example, Clang warned when using [[fallthrough]] in C++14, despite it being available, because it's a C++17 feature. I ended up with something like this to avoid the warnings:

// Prefer the standard C++17 attribute, but not on Clang in pre-C++17 mode,
// where it triggers a warning despite being available.
#if __has_cpp_attribute(fallthrough) && !(defined(__clang__) && __cplusplus < 201703L)
#   define PCL_FALLTHROUGH [[fallthrough]];
// Clang's vendor-specific spelling works from earlier standards.
#elif defined(__clang__)
#   define PCL_FALLTHROUGH [[clang::fallthrough]];
#elif defined(__GNUC__)
// GCC gained the attribute (and -Wimplicit-fallthrough) in version 7.
#   if __GNUC__ >= 7
#       define PCL_FALLTHROUGH [[gnu::fallthrough]];
#   else
#       define PCL_FALLTHROUGH ;
#   endif
#else
#   define PCL_FALLTHROUGH ;
#endif

@kunaltyagi (Member, Author)

@taketwo Do we have builds with different compilers and versions to confirm this? It's possible that compiler developers have since corrected this behavior.

@taketwo (Member) commented Sep 10, 2019

@Morwenn thanks for sharing.

@kunaltyagi no, we have a very limited coverage of compiler versions, just one per operating system. So we'll have to trust @Morwenn until proven otherwise.

Commits pushed:

  • Better PCL_FALLTHROUGH
  • Added += and + operators for PCLPointCloud2
  • Deprecated concatenatePointCloud function
  • Added free concatenate operation for PCLPointCloud
  • Added free and static member function concatenate for PointCloud
@kunaltyagi (Member, Author)

PTAL

@taketwo changed the title from "Adding PCLPointCloud2 operator += and update in concatenate operation" to "Adding PCLPointCloud2 operator += and update in concatenate ope…" on Sep 13, 2019
@taketwo merged commit 01c480a into PointCloudLibrary:master on Sep 13, 2019
@kunaltyagi deleted the pclcloud2_op_add branch on September 13, 2019 14:07
kunaltyagi added a commit to kunaltyagi/pcl that referenced this pull request on Sep 16, 2019
@taketwo added the "changelog: deprecation" and "module: common" labels on Sep 17, 2019
kunaltyagi added a commit to kunaltyagi/pcl that referenced this pull request Sep 19, 2019
kunaltyagi added a commit to kunaltyagi/pcl that referenced this pull request Sep 20, 2019
@jasjuang (Contributor)

@kunaltyagi @taketwo It seems to me that the += operator for pcl::PointCloud has become a lot slower than it used to be. Is this expected?

@kunaltyagi (Member, Author)

That's very unexpected. The only differences are a redirection through the concatenate call (which shouldn't cause delays) and a switch to reserve + insert instead of resize + copy-per-index, which are semantically the same operations.

Are both versions compiled with the same optimization flags?

@jasjuang (Contributor)

Yes, to elaborate a bit more with a minimal example:

pcl::PointCloud<pcl::PointXYZRGB> cld;

for (int i = 0; i < 1000000; i++) {
        pcl::PointCloud<pcl::PointXYZRGB> cld_single;

        cld_single.width    = 1;
        cld_single.height   = 1;
        cld_single.is_dense = false;
        cld_single.points.resize(cld_single.width * cld_single.height);

        cld_single.points[0].x = i;
        cld_single.points[0].y = i;
        cld_single.points[0].z = i;
        cld_single.points[0].b = 128;
        cld_single.points[0].g = 128;
        cld_single.points[0].r = 128;

        cld += cld_single;
}

used to have a similar run time as

pcl::PointCloud<pcl::PointXYZRGB> cld;

for (int i = 0; i < 1000000; i++) {
        pcl::PointXYZRGB pt;

        pt.x = i;
        pt.y = i;
        pt.z = i;
        pt.b = 128;
        pt.g = 128;
        pt.r = 128;
        
        cld.emplace_back(pt);
}

but now it becomes a lot slower.

@kunaltyagi (Member, Author) commented Sep 24, 2019

Thanks. I'm compiling PCL with "-O3" and will report my findings

BTW do you know how to maintain a per-commit speed benchmark? I've seen some projects do that, but I don't remember which ones and how

UPDATE:
You're correct. I've found the root cause but find it baffling. The issue is the use of reserve + insert for one element vs resize + copy (or a simple for loop). I'll update and send a patch soon. I'm still trying to understand why; the core reason is abnormally high branch misses as well as cache misses for insert vs copy and reserve vs resize.

| Benchmark   | Time (ns) | Iterations |
| ----------- | --------- | ---------- |
| Concat      | 83400     | 32600      |
| old         | 29        | 22791000   |
| new (patch) | 28        | 23538000   |
| emplace     | 22(.5)    | 2636000    |

For small sizes the raw for loop performs better: std::copy is faster above roughly 100 points, the raw for loop below 100 points. The difference shrinks linearly from 3% (1 ns) at 1 point to 0% at 100 points.
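
For concreteness, the variants being compared boil down to something like this (a sketch; names are illustrative):

```cpp
#include <algorithm>
#include <pcl/point_cloud.h>
#include <pcl/point_types.h>

using Cloud = pcl::PointCloud<pcl::PointXYZRGB>;

// Variant that regressed: reserve followed by insert. With libstdc++, reserving
// room for one extra element appears to force the reallocate-and-move path on
// every call instead of geometric growth.
void appendReserveInsert (Cloud &dst, const Cloud &src)
{
  dst.points.reserve (dst.points.size () + src.points.size ());
  dst.points.insert (dst.points.end (), src.points.begin (), src.points.end ());
}

// Proposed fix: plain insert, letting the vector grow geometrically.
void appendInsert (Cloud &dst, const Cloud &src)
{
  dst.points.insert (dst.points.end (), src.points.begin (), src.points.end ());
}

// Old behavior: resize followed by element-wise copy.
void appendResizeCopy (Cloud &dst, const Cloud &src)
{
  const auto old_size = dst.points.size ();
  dst.points.resize (old_size + src.points.size ());
  std::copy (src.points.begin (), src.points.end (), dst.points.begin () + old_size);
}
```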

EDIT:
Link to repo

EDIT2:
The issue seems to be GCC's libstdc++, which deallocates, moves and reallocates in reserve. Clang's libc++ doesn't do that.

@taketwo (Member) commented Sep 24, 2019

Here are my numbers for 10000 points on Ubuntu 18.04, GCC 7.4:

| Benchmark      | Time (ns)  |
| -------------- | ---------- |
| concat         | 19,349,013 |
| resize raw     | 222,214    |
| resize copy    | 226,296    |
| reserve raw    | 17,433,347 |
| reserve insert | 20,225,551 |
| insert         | 183,008    |

The last line is for the new version, which does insert without reserve (same as proposed in your patch).

I observe significant fluctuations in the measured time. However, insert is consistently the fastest.

@kunaltyagi (Member, Author)

I observe significant fluctuations in the measured time

Did you set the CPU frequency governor to performance? Otherwise the default scaling doesn't do well for benchmarking. You could also pin the process to one CPU (even if it's multi-threaded) using taskset

A consistent (preferably non-existent) non-benchmark load also helps to reduce noise.

same as proposed in your patch

Yeah, I forgot to update the pcl-bench repo

@taketwo (Member) commented Sep 24, 2019

sudo cpupower frequency-set --governor performance

This seems to help a lot against fluctuations.

@SergioRAgostinho (Member)

I learned a couple of new tricks with this thread. 👍

@taketwo changed the title from "Adding PCLPointCloud2 operator += and update in concatenate ope…" to "Add PCLPointCloud2::operator+= and update in concatenation operation" on Jan 14, 2020
@taketwo changed the title from "Add PCLPointCloud2::operator+= and update in concatenation operation" to "Add PCLPointCloud2::operator+=() and update concatenation operation" on Jan 18, 2020