
PCL RFC 0003: Unified API for Algorithms



Motivation

The current design in several modules (filters, features, gpu) has multiple flaws:

  • Multiple classes, each with an independent API and an independent implementation, for OpenMP, GPU, and CPU code
  • An inextensible model for
    • adding support for thread pools
    • multi-GPU support
    • using SIMD in OpenMP code

It is possible to use C++ function overloading to achieve a "Unified API". This design is forward-compatible with the proposed executor API (the parallel algorithm API has been part of the standard since C++17; the executor machinery itself is still being standardized); a standard-library illustration follows the list below. The benefits are:

  • Adding a SIMD or OpenMP implementation of an algorithm automatically allows the other to use it
  • Ability to use thread pools via the futures model proposed for standard C++
  • Ability to encapsulate multi-GPU support and provide it alongside single-GPU support
  • The API remains stable, allowing users to switch from CPU to OpenMP to GPU with minor changes
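
For comparison, the standard library already uses this call-site shape for its parallel algorithms: the same algorithm name takes an optional execution policy as its first argument, and the rest of the call stays the same. A minimal standard C++17 illustration, independent of PCL and shown only to make the pattern concrete:

#include <algorithm>
#include <execution>
#include <vector>

int main()
{
  std::vector<int> values {3, 1, 2};

  // Sequential by default ...
  std::sort(values.begin(), values.end());

  // ... or parallel, selected by passing a policy as the first argument.
  std::sort(std::execution::par, values.begin(), values.end());
}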

Detail

A prototype with OpenMP, SIMD, and CPU versions can be found here; a simpler prototype is also available. Please try changing the compile flags (for SSE, AVX, and OpenMP) to verify that the code adapts to the different choices.

The basic details are as follows (a code sketch follows the list):

  • A "tag" (empty struct) as the first parameter for a function call to enable overloading
  • Lack of tag implies allowing PCL to choose the best option
  • Tags can be inherited allowing overload resolution to choose the best option without run-time checks
  • Missing implementation for a tag raises compile-time errors
  • No use of macros beyond hiding the implementation for lack of supported platform. constexpr+static_assert or SFINAE machinery can be used to eliminate macros
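
A minimal sketch of the tag-dispatch idea. The tag names and the scale function are hypothetical placeholders, not the prototype's actual identifiers:

#include <cstddef>
#include <vector>

namespace pcl { namespace executor {

// Empty "tag" structs; inheritance lets overload resolution fall back
// to a more general implementation when a specialised one is missing.
struct cpu_tag {};
struct simd_tag : cpu_tag {};
struct openmp_tag : cpu_tag {};

// Tag picked when the caller does not pass one; the macro only hides
// the OpenMP option on platforms that lack it.
#ifdef _OPENMP
using best_tag = openmp_tag;
#else
using best_tag = cpu_tag;
#endif

}} // namespace pcl::executor

// Plain CPU implementation; also the fallback for any derived tag.
inline void
scale(pcl::executor::cpu_tag, std::vector<float>& data, float factor)
{
  for (auto& value : data)
    value *= factor;
}

#ifdef _OPENMP
// OpenMP implementation, compiled only when the platform supports it.
inline void
scale(pcl::executor::openmp_tag, std::vector<float>& data, float factor)
{
#pragma omp parallel for
  for (std::ptrdiff_t i = 0; i < static_cast<std::ptrdiff_t>(data.size()); ++i)
    data[i] *= factor;
}
#endif

// No tag: PCL chooses the best available option at compile time.
inline void
scale(std::vector<float>& data, float factor)
{
  scale(pcl::executor::best_tag {}, data, factor);
}

With this in place, scale(data, 2.0f) lets the library pick the best backend, scale(pcl::executor::simd_tag {}, data, 2.0f) falls back to the CPU overload through the tag hierarchy until a dedicated SIMD overload is added, and requesting a tag with no reachable overload fails at compile time.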

For more details on the implementation of executors, please see

Pros

  • More freedom for users to choose execution details
  • Allows a unified API to forward the executor to standard C++ algorithms
  • Allows code reuse between SIMD, CPU, and OpenMP implementations
  • Allows single-GPU code to avoid paying the overhead of a multi-GPU implementation
  • Greater ease of testing
  • Extensible by users: Bring Your Own Executors (for tags not supported in PCL), as sketched below
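
As a sketch of the "Bring Your Own Executors" point, reusing the hypothetical scale function and tag hierarchy from the sketch above, a user could plug in their own backend without touching PCL:

// A user-defined tag, unknown to PCL, deriving from the generic CPU tag.
struct my_thread_pool_tag : pcl::executor::cpu_tag {};

// The user provides an overload for their own tag (here it simply
// delegates; a real one would hand chunks to the user's thread pool).
inline void
scale(my_thread_pool_tag, std::vector<float>& data, float factor)
{
  scale(pcl::executor::cpu_tag {}, data, factor);
}

// Used exactly like a built-in tag:
//   scale(my_thread_pool_tag {}, data, 2.0f);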

Cons

  • None so far, apart from the API redesign itself

ABI/API Breakage

  • None at first: the idea is to extend the current API, not break it
  • A complete break follows after the deprecation period

Effort Required

Minor to Medium

Migration Path

For implementation:

  1. Add the new, executor-based functionality
  2. Reroute the original functionality to the new implementation (a sketch follows this list)
    • OpenMP: use temporary tags for redirection (no OpenMP executors are available yet; proposals are expected soon)
    • CUDA: use temporary tags for redirection and, in the future, stream executors (TensorFlow has an implementation of stream executors for CUDA and OpenCL)
  3. Deprecate the original functionality
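
Continuing the hypothetical scale example from above, step 2 for OpenMP could look roughly like this (a sketch of the temporary-tag rerouting, not PCL's actual code):

// Old OpenMP-specific entry point, kept temporarily and rerouted to the
// new tag-based implementation until it is removed after deprecation.
[[deprecated("pass an OpenMP tag/executor to scale() instead")]]
inline void
scale_omp(std::vector<float>& data, float factor)
{
  scale(pcl::executor::openmp_tag {}, data, factor);
}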

For users:

  • On a deprecation warning, switch back to the original class and pass an executor as the first argument:
// current
pcl::NormalEstimationOMP<T1, T2> est;
est.setViewPoint(vx, vy, vz);
est.setInputCloud(cloud);
est.computePointNormal(*normal_cloud, indices, nx, ny, nz, curvature);

// proposed
pcl::NormalEstimation<T1, T2> est;
est.setViewPoint(vx, vy, vz);
est.setInputCloud(cloud);
est.computePointNormal(pcl::executor::openmp {}, *normal_cloud, indices, nx, ny, nz, curvature);