
Parallel Surface Reconstruction #2148

Open
yonkahlon opened this issue Dec 13, 2017 · 10 comments
Labels
needs: code review Specify why not closed/merged yet

Comments

@yonkahlon

yonkahlon commented Dec 13, 2017

Your Environment

  • Operating System and version: Windows 10
  • Compiler: Microsoft (R) C/C++ Optimizing Compiler Version 19.00.23506
  • PCL Version: 1.8.1

Expected Behavior

Running Poisson surface reconstruction on separate point clouds, each in a separate thread, should work without crashing.

Current Behavior

An access violation exception is thrown.

Possible Solution

Don't use static memory:

Set MEMORY_ALLOCATOR_BLOCK_SIZE to zero in both files where the following define appears:

#define MEMORY_ALLOCATOR_BLOCK_SIZE 1<<12

To test the effect on performance I ran a really basic test: a single point cloud, default Poisson reconstruction settings, running on a single thread, measuring how long the reconstruction takes. The results:

Debug build, static memory, duration: 42 483 037 μs
Debug build, dynamic memory, duration: 40 880 744 μs
Release build, static memory, duration: 4 066 210 μs
Release build, dynamic memory, duration: 4 177 797 μs

At first glance the performance effect seems minimal, so what is the rationale for using static memory?

Code to Reproduce

#define PCL_NO_PRECOMPILE

#include <thread>
#include <vector>

#include <pcl/point_types.h>
#include <pcl/surface/poisson.h>
#include <pcl/io/ply_io.h>

pcl::PolygonMesh mesh_1;
pcl::PolygonMesh mesh_2;
pcl::PointCloud<pcl::PointNormal>::Ptr cloud_in_1(new pcl::PointCloud<pcl::PointNormal>());
pcl::PointCloud<pcl::PointNormal>::Ptr cloud_in_2(new pcl::PointCloud<pcl::PointNormal>());

void thread1Process()
{
	pcl::Poisson<pcl::PointNormal> surf_con;

	surf_con.setInputCloud(cloud_in_1);
	surf_con.performReconstruction(mesh_1);
}

int main(int argc, char* argv[])
{
	//------------------ LOAD FILES ------------------//
	std::string path_dir = "C:\\Projects\\Snugg\\2.Development\\4.Repo\\temp\\MeasurementFinder\\build\\000Yon\\temp_m_f.ply";

	pcl::PLYReader reader;
	reader.read(path_dir, *cloud_in_1);
	reader.read(path_dir, *cloud_in_2);

	std::thread t1(thread1Process);

	pcl::Poisson<pcl::PointNormal> surf_con;

	surf_con.setInputCloud(cloud_in_2);
	surf_con.performReconstruction(mesh_2);
	
	t1.join();

	pcl::io::savePLYFile("mesh_1.ply", mesh_1);
	pcl::io::savePLYFile("mesh_2.ply", mesh_2);
}

@taketwo
Member

taketwo commented Dec 13, 2017

Running Poisson surface reconstruction on separate point clouds, each in a separate thread, should work without crashing.

👍 even if it causes a small runtime penalty

@SergioRAgostinho SergioRAgostinho added the needs: code review Specify why not closed/merged yet label Dec 13, 2017
@SergioRAgostinho
Member

At first glance the performance effect seems minimal, so what is the rationale for using static memory?

In general, for things you only want to compute once and that are used by all instances of your class, e.g. a cosine look-up table. Only the first instantiation pays the cost of generating it; every instance created afterwards can use it free of charge. In this particular case, we'll need to debug to understand the rationale behind it.
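As a hypothetical illustration of that pattern (not PCL code), a shared, build-once cosine look-up table might look like this; every caller uses the same data, and only the first call pays the construction cost. Since C++11, the initialization of a local static is itself thread-safe:

```cpp
#include <array>
#include <cmath>

// Build-once, shared look-up table: one copy for all users, constructed
// lazily on first access. C++11 guarantees this initialization runs once
// even if several threads hit it concurrently ("magic statics").
const std::array<double, 360>& cosTable()
{
  static const std::array<double, 360> table = [] {
    std::array<double, 360> t{};
    const double kPi = 3.14159265358979323846;
    for (int d = 0; d < 360; ++d)
      t[d] = std::cos(d * kPi / 180.0);
    return t;
  }();
  return table;
}
```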

👍 Thanks for submitting.

@yonkahlon
Author

yonkahlon commented Dec 13, 2017

An easy solution could be to turn the MEMORY_ALLOCATOR_BLOCK_SIZE define into a Poisson parameter, i.e. you would call surf_con.setMemoryBlockSize(0) to make the allocation dynamic.
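A minimal sketch of that proposed setting (setMemoryBlockSize is the name suggested above, not an existing PCL method; PoissonSettings is a hypothetical stand-in class):

```cpp
// Hypothetical per-instance replacement for the compile-time define.
// A block size of 0 would mean "allocate dynamically, no static pool".
class PoissonSettings
{
public:
  void setMemoryBlockSize(int block_size) { block_size_ = block_size; }
  int getMemoryBlockSize() const { return block_size_; }
  bool usesStaticPool() const { return block_size_ > 0; }

private:
  int block_size_ = 1 << 12; // mirrors the current compile-time default
};
```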

@markloyman

I encountered this issue with the original PoissonRecon repository, under a slightly different context - I got the exception by performing the reconstruction twice (in serial in a single thread).

It was first mentioned at: mkazhdan/PoissonRecon#22

I initially suggested (in the forums) changing the MEMORY_ALLOCATOR_BLOCK_SIZE define, as it was the simplest way to confirm that this is indeed related. In my case there was no need to sacrifice any performance. The following fixes worked for me (with subsequent calls to PoissonRecon within the same thread). I'm not sure whether this will also work for the multi-threaded issue, but it is definitely worth a try.

The original author's suggestion is at: mkazhdan/PoissonRecon#22 (comment)

I believe the problem can be fixed by moving the line:
OctNode< TreeNodeData >::SetAllocator( MEMORY_ALLOCATOR_BLOCK_SIZE );

before the line declaring the tree:
Octree< Real > tree;

This did work for me, but in my own code I chose to apply a slightly different fix (in use for a couple of months now):

In Octree.inl: check to see if memory was already allocated (MemoryAlreadyAllocated is new variable that needs to be added):

template< class NodeData >
void OctNode< NodeData >::SetAllocator(int blockSize)
{
	if( (blockSize>0) && (false==MemoryAlreadyAllocated))
	{
		UseAlloc=1;
		MemoryAlreadyAllocated = true;
		NodeAllocator.set(blockSize);
	}
	else{UseAlloc=0;}
}

I think it is a safer approach.

@yonkahlon
Author

In Octree.inl: check to see if memory was already allocated (MemoryAlreadyAllocated is new variable that needs to be added):

I gave it a go, but it didn't work. The PCL implementation looks slightly different, but I don't think it's causing this issue. The more likely culprit is the fact that SetAllocator is a static function, hence MemoryAlreadyAllocated must be static too. The first thread will have memory allocated, but the second thread won't.

When I run on a single thread it works fine.
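One way around the static-state problem described above would be to make the guard state thread_local, so each thread sets up its own pool instead of trusting a flag that another thread flipped. This is a hypothetical, heavily simplified sketch (NodeAllocatorState is not a PCL type), not a patch against the real Octree code:

```cpp
#include <thread>

// With plain static members, the first thread allocates and flips the
// flag, and the second thread then skips allocation entirely. With
// thread_local members, each thread gets its own flag and its own pool.
struct NodeAllocatorState
{
  static thread_local bool memory_already_allocated;
  static thread_local int use_alloc;

  static void SetAllocator(int block_size)
  {
    if (block_size > 0 && !memory_already_allocated)
    {
      memory_already_allocated = true;
      use_alloc = 1; // this thread now owns its own pool
    }
  }
};
thread_local bool NodeAllocatorState::memory_already_allocated = false;
thread_local int NodeAllocatorState::use_alloc = 0;
```

Each thread that calls SetAllocator sees use_alloc == 1 for itself, independent of what the other threads did.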

@yonkahlon
Copy link
Author

My proposed solution to set MEMORY_ALLOCATOR_BLOCK_SIZE to 0 causes a memory leak.

After some profiling and digging around I found the issue to be Octree::LaplacianMatrixIteration

memory is freed using internalAllocator.reset() , but this does nothing if MEMORY_ALLOCATOR_BLOCK_SIZE is zero.

Could I try to implement a fix and create a pull request?

I'm thinking of doing something like this:

  • Make all Allocator classes non-static

  • Create instances of all required Allocator classes in the Poisson constructor

  • Any objects that use an Allocator object, receive it via the constructor
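The three steps above can be sketched as follows, with hypothetical simplified types (NodeAllocator, Octree, and Poisson here are stand-ins, not the real PCL classes): each Poisson instance owns its allocator and injects it into the objects that need one, so two instances on two threads never share allocator state.

```cpp
// Non-static allocator, one per Poisson instance.
class NodeAllocator
{
public:
  explicit NodeAllocator(int block_size) : block_size_(block_size) {}
  int blockSize() const { return block_size_; }

private:
  int block_size_;
};

class Octree
{
public:
  // Receives the allocator via the constructor instead of using a static one.
  explicit Octree(NodeAllocator& alloc) : alloc_(alloc) {}
  NodeAllocator& allocator() { return alloc_; }

private:
  NodeAllocator& alloc_;
};

class Poisson
{
public:
  Poisson() : node_alloc_(1 << 12), tree_(node_alloc_) {} // per-instance pool
  Octree& tree() { return tree_; }

private:
  NodeAllocator node_alloc_; // created in the constructor, non-static
  Octree tree_;
};
```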

@SergioRAgostinho
Member

At the moment my hands are full reviewing another PR, and this particular issue requires me to spend some time on it to fully understand the implications of what you're proposing, especially since it's code I don't normally look at.

That being said, I would suggest deferring any action until we provide some feedback.

@johnmarianhoffman

Just wanted to chime in to add that I'm also experiencing this issue on Linux (Ubuntu 16.04, PCL 1.8.1). Super helpful comments. I'm going to dig in and take a closer look on Monday, since my application critically requires lots of threading and PSR in parallel. Will report back if I find anything.

@yonkahlon
Author

@captnjohnny1618 Did you manage to sort it out? I can dig around and find my solution. I'm not 100% sure it will work with the latest version of the repo though.

@johnmarianhoffman

@yonkahlon I spent about half a day trying to track down the memory leak when not using the allocators (i.e. MEMORY_ALLOCATOR_BLOCK_SIZE set to 0), but I wasn't able to find the exact circumstances under which it occurs. Unfortunately I don't have time to keep debugging and must move on to our other projects. For the time being my workaround has been to use the allocators (i.e. the original code with MEMORY_..._BLOCK_SIZE 1<<12), but with a mutex so that only one thread can perform a Poisson surface reconstruction at a time. For us, we get some parallelism after a few minutes, but this seems to peak at about 10 cores (out of the 32 we have available).
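The mutex workaround just described can be sketched like this; runPoisson() is a placeholder for the real pcl::Poisson reconstruction call, and the point is simply that one global mutex serializes every use of the static pool:

```cpp
#include <mutex>
#include <thread>

// One global lock guarding the (thread-unsafe) static allocator state.
std::mutex poisson_mutex;

void runPoisson(int& out) { out = 1; } // placeholder for the real reconstruction

void reconstructGuarded(int& out)
{
  // Only one thread is ever inside the reconstruction at a time, so the
  // static allocator is never touched concurrently.
  std::lock_guard<std::mutex> lock(poisson_mutex);
  runPoisson(out);
}
```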

Here is what I was able to find though:

When NOT using allocators, the memory leak seems to originate from an improperly freed block of memory in sparse_matrix.hpp. Specifically, the lost blocks are allocated in the SetRowSize() method of the sparse matrix class. My guess is that there is an edge case where a sparse matrix goes out of scope without being properly cleaned up, or that the Resize() method frees m_ppElements (a pointer to pointers) without first freeing each of the rows. As I mentioned above though, I don't have time to debug this any further.
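A minimal, hypothetical sketch of that suspected leak pattern (greatly simplified, not the real sparse_matrix.hpp code): if Resize() freed only the array of row pointers, every row allocated by SetRowSize() would be lost. Here each row is released before the pointer array, which is the fix the guess above implies:

```cpp
#include <cstdlib>

struct RowStorage
{
  int rows = 0;
  double** m_ppElements = nullptr;

  void Resize(int r)
  {
    for (int i = 0; i < rows; ++i)
      std::free(m_ppElements[i]); // free each row first...
    std::free(m_ppElements);      // ...then the array of row pointers
    rows = r;
    m_ppElements = r > 0
        ? static_cast<double**>(std::calloc(r, sizeof(double*)))
        : nullptr;
  }

  void SetRowSize(int row, int n)
  {
    std::free(m_ppElements[row]); // row pointers start zeroed by calloc
    m_ppElements[row] = static_cast<double*>(std::calloc(n, sizeof(double)));
  }

  ~RowStorage() { Resize(0); } // destruction cleans up every row too
};
```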

Looking under the hood (pcl/surface/3rdparty/poisson4/sparse_matrix.hpp), that class is a little bit chaotic (lots of C-style raw pointer management mixed with C++-style code, etc.). I think the memory allocator tools were implemented to avoid some of this chaos, but the non-allocator code no longer works, and it's a fairly substantial memory leak. It may be better to remove it altogether, or re-implement it with thread-safe, C++-style memory management (i.e. vectors).
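The vector-based re-implementation suggested above could look like this (a sketch with hypothetical names, not a drop-in replacement for the PCL class): letting std::vector own each row removes the manual free bookkeeping, so rows can never leak when the matrix is resized or goes out of scope.

```cpp
#include <vector>

// Each inner vector owns its row's memory; resizing or destroying the
// matrix releases everything automatically, with no shared static state.
struct VectorRowStorage
{
  std::vector<std::vector<double>> rows;

  void Resize(std::size_t r) { rows.assign(r, {}); }
  void SetRowSize(std::size_t row, std::size_t n) { rows[row].assign(n, 0.0); }
};
```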

Finally, the last option would be to reimplement the allocator class to not use static allocations. I tried just going through and removing all of the "static" designations, however as expected, this didn't even sort of work! haha. Just thought it was worth a shot on the off chance that it might succeed.

Thanks to all the PCL maintainers for all of their hard work! Sorry I wasn't able to figure out what was causing the issue, but I hope that the above sheds a little bit more light on the problem!

I can't post the entirety of my valgrind output, but I have attached the output of the worst-offending losses.
valgrind_chunk.txt
