Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Runtime errors with undefined behaviour sanitizer #3176

Closed
Morwenn opened this issue Jun 19, 2019 · 18 comments
Closed

Runtime errors with undefined behaviour sanitizer #3176

Morwenn opened this issue Jun 19, 2019 · 18 comments

Comments

@Morwenn
Copy link
Contributor

Morwenn commented Jun 19, 2019

Your Environment

  • Operating System and version: Ubuntu 18.04.2 LTS
  • Compiler: g++ (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
  • PCL Version: 1.9.1

Context

I tried to run a program using PCL with the undefined behaviour sanitizer and was hit with a runtime error. I don't know whether it's genuine or whether PCL does clever things that might warrant adding a suppression file.

Current Behavior

I inherited some C++ code using PCL and tried to run it with address sanitizers and undefined behaviour sanitizer, but after fixing errors on our side, I eventually got hit with this one which comes from PCL (triggered by the vptr check of ubsan):

[...]/include/pcl-1.9/pcl/visualization/impl/point_cloud_color_handlers.hpp:59:75: runtime error: member call on address 0x55d45ce544b0 which does not point to an object of type 'vtkGenericDataArray'
0x55d45ce544b0: note: object is of type 'vtkUnsignedCharArray'
 00 00 00 00  a0 7d 00 da 95 7f 00 00  00 23 dd 5c 01 00 00 00  00 00 00 00 00 00 00 00  00 00 fe ff
              ^~~~~~~~~~~~~~~~~~~~~~~
              vptr for 'vtkUnsignedCharArray'

The line from point_cloud_color_handlers.hpp which triggers the ubsan error is the following one:

reinterpret_cast<vtkUnsignedCharArray*>(&(*scalars))->SetNumberOfTuples (nr_points);

Considering that scalars is declared as vtkSmartPointer<vtkDataArray> & I wouldn't be surprised if the error came from the reinterpret_cast.

Code to Reproduce

I'm unfortunately not sure where it comes from in the code I inherited, but considering that the suspicious line contains a reinterpret_cast, I would expect the PCL code to be responsible for the runtime error.

Possible Solution

Either it can be fixed or PCL can ship a suppression file for the undefined behaviour sanitizer.

@taketwo
Copy link
Member

taketwo commented Jun 19, 2019

Looks like scalars are implicitly required to be a smart pointer to vtkUnsignedCharArray (or be uninitialized pointer). And from what I can see this requirement is fullfilled by calling code within PCL. This is ugly, but in the end reinterpret_cast is kind of legitimate.

Could you please clarify, does the sanitizer hit an error there, or just speculates that there might be one?

@Morwenn
Copy link
Contributor Author

Morwenn commented Jun 19, 2019

The sanitizer just says that it is an error (well, it says that this is undefined behaviour by the standard), but the code otherwise runs fine when I disable the sanitizer. It's probably one of those cases where the standard says it's UB but every compiler considers it to be ok.

It's not blocking for my project, but it means that I need to manually generate suppression files for code that is otherwise expected to work correctly if I want to leverage the power of ubsan to find actual bugs in the code I inherited.

@SergioRAgostinho
Copy link
Member

Looks like scalars are implicitly required to be a smart pointer to vtkUnsignedCharArray (or be uninitialized pointer). And from what I can see this requirement is fullfilled by calling code within PCL. This is ugly, but in the end reinterpret_cast is kind of legitimate.

From the inheritance diagram I'm not fully convinced the cast is required. SetNumberOfTuples is defined in vtkAbstractArray a parent class of vtkDataArray and vtkUnsignedCharArray and it is virtual function, so it will invoke the correct SetNumberOfTuples even without the cast. Will that suppress your error?

@Morwenn
Copy link
Contributor Author

Morwenn commented Jun 19, 2019

I tried removing the cast, and everything seems to work fine (without ubsan). When I compile with ubsan the error isn't triggered anymore, but it chokes on another reinterpret_cast somewhere else in the code (a cast to a vtkPolyDataMapper from a mapper type I'd have to figure out) but which apparently can't be removed.

It seems that using PCL with -fsanitize=vptr is a bit complicated currently. I will try to see whether I can exclude it from the ubsan checks I run.

@taketwo
Copy link
Member

taketwo commented Jun 19, 2019

From the inheritance diagram I'm not fully convinced the cast is required.

Unfortunately I few lines later we cast it again to call SetArray. And that one is not virtual as far as I can tell.

@SergioRAgostinho would you be open to a bit of refactoring here? I see two possibilities. First one is to change the signature to accept vtkSmartPointer<vtkUnsignedCharArray>, because this is an implicit requirement anyway. Second one is to keep the signature, but explicitly say that "we will always allocate new array into your smart pointer". Then inside the function we create vtkSmartPointer<vtkUnsignedCharArray> and work with it as such, assigning scalars to it before returning. I've checked all our call sites and in fact it's always empty smart pointer that is passed into getColor.

@SergioRAgostinho
Copy link
Member

SergioRAgostinho commented Jun 19, 2019

Unfortunately I few lines later we cast it again to call SetArray. And that one is not virtual as far as I can tell.

We can probably resort to vtk's casting mechanisms or the standard shipped dynamic_pointer_cast (or static_pointer_cast if appropriate) to perform that cast. It might not make any difference but we need to try and see. I just have this gut feeling that reinterpret_casts tend to "upset" compilers and verification tools.

Is it possible to "adopt both"? Change the signature of the current method to vtkSmartPointer<vtkUnsignedCharArray> and create a new deprecated method with the old signature and where we create a vtkSmartPointer<vtkUnsignedCharArray> and then invoke the method with the updated signature.

@Morwenn
Copy link
Contributor Author

Morwenn commented Jun 20, 2019

Maybe it's also worth adding jobs with sanitizers to the continuous integration?

@taketwo
Copy link
Member

taketwo commented Jun 20, 2019

We can probably resort to vtk's casting mechanisms

We can experiment with this on the further problems reported by ubsan.

Is it possible to "adopt both"?

Possible. The I would go further and in the new signature return smart pointer instead of taking it by reference.

@taketwo
Copy link
Member

taketwo commented Jun 20, 2019

Maybe it's also worth adding jobs with sanitizers to the continuous integration?

That's an interesting option. But would be very low on my todo list to be honest, just don't have time for this.

@taketwo
Copy link
Member

taketwo commented Jun 23, 2019

@Morwenn we've merged a PR that fixes the problem you reported. Could you have another try with UB?

@Morwenn
Copy link
Contributor Author

Morwenn commented Jun 24, 2019

I'm trying to build PCL with master but it might take longer than expected because I'm using pcl::io::OpenNI2Grabber in my code and for some reason it isn't recognized, as if the macro HAVE_OPENNI2 wasn't defined despite specifying WITH_OPENNI2 when calling CMake. I will report when I'm done with those issues.

@taketwo
Copy link
Member

taketwo commented Jun 24, 2019

despite specifying WITH_OPENNI2 when calling CMake

Please check the output of CMake, it may be that OpenNI2 is not found despite you enabling it.

@Morwenn
Copy link
Contributor Author

Morwenn commented Jun 24, 2019

OpenNI2 is correctly found (I'm using Conan, hence the long links):

-- Found OpenNI2: ~/.conan/data/openni/2.2.0-rev-958951f/local/stable/package/4d626afd70306e4b40f824761d8908742c20b7f4/lib/libOpenNI2.so
-- OpenNI2 found (include: ~/.conan/data/openni/2.2.0-rev-958951f/local/stable/package/4d626afd70306e4b40f824761d8908742c20b7f4/include/openni2, lib: ~/.conan/data/openni/2.2.0-rev-958951f/local/stable/package/4d626afd70306e4b40f824761d8908742c20b7f4/lib/libOpenNI2.so)

It worked correctly with PCL 1.9.1, so I'm surprised that it doesn't with the master branch (I'd point my finger at the CMake overhaul for 1.10, but I can't really be sure). Also I couldn't find the file responsible for defining HAVE_OPENNI2 in the library's files: there is a corresponding CMake variable set by PCL_ADD_GRABBER_DEPENDENCY, but I don't understand how it is supposed to propagate to the .h/.cpp files.

@SergioRAgostinho
Copy link
Member

SergioRAgostinho commented Jun 24, 2019

Also I couldn't find the file responsible for defining HAVE_OPENNI2 in the library's files: there is a corresponding CMake variable set by PCL_ADD_GRABBER_DEPENDENCY, but I don't understand how it is supposed to propagate to the .h/.cpp files.

It gets configured here.

configure_file("${pcl_config_h_in}" "${pcl_config_h}")

#cmakedefine HAVE_OPENNI2 1

This will also help. https://cmake.org/cmake/help/v3.5/command/configure_file.html

Is HAVE_OPENNI2 defined in pcl/common/include/pcl_config.h?

@Morwenn
Copy link
Contributor Author

Morwenn commented Jun 24, 2019

I just checked and it's correctly defined in pcl_config.h. However the inclusion of pcl_config.h in openni2_grabber.h is also guarded by HAVE_OPENNI2, and since openni2_grabber.h is the first PCL file I include it explains why it didn't work for me. Am I supposed to include pcl_config.h myself?

EDIT: found it, this commit is responsible for the issue: 9f7a246#diff-a7a89f9d2f1d1031a32f09984d839d29

@taketwo
Copy link
Member

taketwo commented Jun 24, 2019

That's a bug indeed, thanks for identifying and sorry for the trouble.

@Morwenn
Copy link
Contributor Author

Morwenn commented Jun 24, 2019

I can confirm that the fix for OpenNI2 works, thanks a lot for being so reactive :)

On the other hand, the undefined behaviour sanitizer still produces an error in the same function, albeit not at the same place:

[...]/include/pcl-1.9/pcl/visualization/impl/point_cloud_color_handlers.hpp:55:34: runtime error: member call on address 0x55eda0d64550 which does not point to an object of type 'vtkGenericDataArray'
0x55eda0d64550: note: object is of type 'vtkUnsignedCharArray'
 00 00 00 00  78 ad 70 7d 3d 7f 00 00  f0 86 a0 9f 01 00 00 00  00 00 00 00 00 00 00 00  00 00 fe ff
              ^~~~~~~~~~~~~~~~~~~~~~~
              vptr for 'vtkUnsignedCharArray'

I'm even more surprised than before considering that there is not cast involved and everything compiles properly. The affected line is the following one:

scalars->SetNumberOfComponents (3);

Unfortunately I probably won't have time to deal with those errors in my current project, so I will most likely give up on trying to use the undefined behaviour sanitizer for now. Thanks again for your time!

@taketwo
Copy link
Member

taketwo commented Jun 26, 2019

Hm, weird.

OK, I will close this for now. Feel free to re-open if you find time to investigate this further.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants