PCL RFC 0003: Unified API for Algorithms
Kunal Tyagi edited this page Feb 10, 2020 · 2 revisions
- Title: PCL-RFC-0003: Unified API for Algorithms
- Author: kunaltyagi
- Gitter room: PointCloudLibrary/PCL-RFC-03
The current design in several modules (filters, features, gpu) has several flaws:
- Multiple classes with independent APIs and independent implementations for OpenMP, GPU and CPU code
- Inextensible model for
  - adding support for thread_pools
  - multi-GPU support
  - using SIMD in OpenMP code
Function overloading in C++ makes a "Unified API" possible. The design is forward-compatible with the proposed API for executors (the API has been standardized since C++17; the implementation is still being standardized). The benefits are:
- Adding a SIMD or OpenMP implementation of an algorithm automatically makes it available to the other
- Ability to use thread_pools through the future model proposed for C++
- Ability to encapsulate multi-GPU support and provide it alongside single-GPU support
- The API remains static, allowing users to switch from CPU to OpenMP to GPU with minor changes
A prototype with OpenMP, SIMD and CPU versions can be found here. A simpler prototype is also available. Please try to change the compile flags (for SSE, AVX and OpenMP) to verify that the code adapts to different choices.
The basic details are:
- A "tag" (empty struct) as the first parameter for a function call to enable overloading
- Omitting the tag lets PCL choose the best option
- Tags can inherit from one another, allowing overload resolution to choose the best option without run-time checks
- A missing implementation for a tag raises a compile-time error
- No use of macros beyond hiding the implementation for lack of supported platform. constexpr + static_assert or SFINAE machinery can be used to eliminate macros
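The points above can be illustrated with a minimal sketch. It is not the linked prototype: the `sketch` namespace, the tag names, the `Result` struct and the `mean` function are all hypothetical and chosen only for illustration. It assumes tags are empty structs, that `simd` and `openmp` derive from `cpu` so overload resolution can fall back to the sequential version, and that the only macro use is hiding the OpenMP overload when the compiler lacks OpenMP support.

```cpp
#include <cassert>
#include <string>
#include <vector>

namespace sketch {
namespace executor {
// Tags are empty structs: they carry no data, only a type for overload resolution.
struct cpu {};           // sequential baseline
struct simd : cpu {};    // derives from cpu, so it falls back to the cpu overload
struct openmp : cpu {};  // falls back to cpu when no openmp overload is compiled
struct gpu {};           // unrelated tag: no overload below accepts it

// Macro use is limited to selecting what the platform supports.
#ifdef _OPENMP
using best = openmp;
#else
using best = cpu;
#endif
}  // namespace executor

struct Result {
  double value;
  std::string path;  // which overload ran, recorded only for illustration
};

// Sequential implementation.
inline Result mean(executor::cpu, const std::vector<double>& v) {
  double sum = 0.0;
  for (double x : v) sum += x;
  return {v.empty() ? 0.0 : sum / static_cast<double>(v.size()), "cpu"};
}

#ifdef _OPENMP
// OpenMP implementation, hidden when the compiler has no OpenMP support.
inline Result mean(executor::openmp, const std::vector<double>& v) {
  double sum = 0.0;
  const long n = static_cast<long>(v.size());
#pragma omp parallel for reduction(+ : sum)
  for (long i = 0; i < n; ++i) sum += v[i];
  return {n == 0 ? 0.0 : sum / static_cast<double>(n), "openmp"};
}
#endif

// No tag: the library picks the best available option at compile time.
inline Result mean(const std::vector<double>& v) {
  return mean(executor::best{}, v);
}
}  // namespace sketch
```

In this sketch, `sketch::mean(sketch::executor::simd{}, v)` resolves to the `cpu` overload through the derived-to-base conversion, with no run-time check, while `sketch::mean(sketch::executor::gpu{}, v)` fails to compile because no overload accepts that tag.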
For more details on the implementation of executors, please see:
- Implementation of executors by the champion of the executor proposal
- C++17 API reference
- More freedom for the user to choose the execution details
- Allows the unified API to forward the executor to C++ algorithms
- Allows reuse between SIMD, CPU and OpenMP code
- Allows single-GPU code to avoid paying for the overhead of a multi-GPU implementation
- Greater ease in testing
- Extensible for users: Bring Your Own Executors (for tags unsupported in PCL)
- None so far except API redesign
- None in the beginning. The idea is to extend current API not break it
- Complete break after deprecation
Minor to Medium
For implementation:
- Add new functionality using executors
- Original functionality will be rerouted to use the new implementation
- OpenMP: use temporary tags for redirection (no executors are available yet; proposals are expected soon)
- CUDA: use temporary tags for redirection and, in the future, stream executors (TensorFlow has an implementation of stream executors for CUDA and OpenCL)
- Deprecate original functionality
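The implementation steps above could look roughly like the following sketch. The classes `SquaredNorm` and `SquaredNormOMP`, the `Result` struct and the tags are hypothetical stand-ins, not PCL classes. To keep the sketch buildable without OpenMP, both overloads share one sequential loop; a real `openmp` overload would parallelize it.

```cpp
#include <cassert>
#include <string>
#include <vector>

namespace sketch {
namespace executor {
struct cpu {};
struct openmp : cpu {};  // temporary tag, used here only for redirection
}  // namespace executor

struct Result {
  float value;
  std::string path;  // which overload ran, recorded only for illustration
};

// Step 1: the original class gains overloads that take an executor tag.
class SquaredNorm {
public:
  // The class stores a pointer, so the input must outlive the estimator.
  void setInput(const std::vector<float>& input) { input_ = &input; }

  Result compute(executor::cpu) const { return {accumulate(), "cpu"}; }
  // Shares the sequential loop in this sketch; a real overload would use OpenMP.
  Result compute(executor::openmp) const { return {accumulate(), "openmp"}; }

  // Step 2: the original tag-less call is rerouted to the new implementation.
  Result compute() const { return compute(executor::cpu{}); }

private:
  float accumulate() const {
    float sum = 0.0f;
    if (input_ != nullptr) {
      for (float x : *input_) sum += x * x;
    }
    return sum;
  }
  const std::vector<float>* input_ = nullptr;
};

// Step 2: the legacy OpenMP class becomes a thin wrapper around the new overload.
// Step 3 would mark it [[deprecated("use SquaredNorm with executor::openmp")]];
// the attribute is left out here so the sketch compiles without warnings.
class SquaredNormOMP : public SquaredNorm {
public:
  Result compute() const { return SquaredNorm::compute(executor::openmp{}); }
};
}  // namespace sketch
```

The legacy class keeps its old call signature during the deprecation period, while all work is done by a single implementation selected through the tag.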
For users:
- On deprecation warnings, replace the specialized class with the original class and pass an executor as the first argument
```cpp
// current
pcl::NormalEstimationOMP<T1, T2> est;
est.setViewPoint(vx, vy, vz);
est.setInputCloud(cloud);
est.computePointNormal(*normal_cloud, indices, nx, ny, nz, curvature);

// proposed
pcl::NormalEstimation<T1, T2> est;
est.setViewPoint(vx, vy, vz);
est.setInputCloud(cloud);
est.computePointNormal(pcl::executor::openmp{}, *normal_cloud, indices, nx, ny, nz, curvature);
```
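The proposed call passes the tag by value as the first argument. The following self-contained sketch shows one way a class could accept such a tag and reject unsupported ones with `static_assert` instead of macros. Everything in it is an assumption made for illustration: `Centroid1D`, the `is_supported` trait and the tags are hypothetical and not part of PCL, and every supported tag runs the same sequential loop where a real implementation would dispatch on the tag.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

namespace sketch {
namespace executor {
struct cpu {};
struct openmp : cpu {};
struct cuda {};  // declared, but this sketch provides no implementation for it

// A tag is usable if it is, or derives from, a tag that has an implementation.
template <typename Tag>
struct is_supported : std::is_base_of<cpu, Tag> {};
}  // namespace executor

class Centroid1D {
public:
  // The class stores a pointer, so the cloud must outlive the estimator.
  void setInputCloud(const std::vector<float>& cloud) { cloud_ = &cloud; }

  // The executor tag is the first argument, mirroring the proposed call.
  template <typename Executor>
  float compute(Executor) const {
    static_assert(executor::is_supported<Executor>::value,
                  "no implementation available for this executor tag");
    if (cloud_ == nullptr || cloud_->empty()) return 0.0f;
    float sum = 0.0f;
    for (float x : *cloud_) sum += x;  // same sequential loop for every tag
    return sum / static_cast<float>(cloud_->size());
  }

private:
  const std::vector<float>* cloud_ = nullptr;
};
}  // namespace sketch
```

Under these assumptions, a user-defined tag such as `struct my_pool : sketch::executor::cpu {};` is accepted without any change to the class, while `est.compute(sketch::executor::cuda{})` stops compilation with the `static_assert` message.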