Parallelize portable ops if threadpool is available, with fallback to parallel_for-as-for-loop #8932

Closed
@swolchok

🚀 The feature, motivation and pitch

It seems suboptimal to me that we have to create separate optimized ops just to get basic stuff like parallelization (and vectorization, but let's start with parallelization). Here's what I'd like to do: (The timeline here is "ASAP", but I'm opening an issue because this got too long for chat and so that I can point to this issue on the PRs.)

  1. Set up a proper CMake build for extension/parallel; right now it's free-riding on buck and getting automatically duplicated into 3 different targets per the generated executorch_srcs.cmake. (done; Add proper CMake build for extension_parallel #8938)
  2. Make extension_threadpool itself export the -DET_USE_THREADPOOL macro we already use and define somewhat ad-hoc. (done; Properly export ET_USE_THREADPOOL from the threadpool extension #8947)
  3. Move extension/parallel/thread_parallel.h to core (@larryliu0820 suggests runtime/kernel/thread_parallel.h), leaving a stub header behind for backward compatibility. Move thread_parallel.cpp to threadpool, since there will be no reason not to provide it when threads are available. If threadpool is not built (gated on ET_USE_THREADPOOL), provide a default implementation of parallel_for that is just an inlinable for loop. (Split & remove extension_parallel #8983)
  4. Use parallel_for in at least one portable op, either directly or via the workhorse "util" functions. (Add basic parallel_for support to reduce_util #8986)
  5. Verify that, because the optimized library is built with threadpool, it gets parallelization. Adjust build configuration for optimized ops lib if necessary. (Build optimized_portable_kernels if threadpool is enabled #8987)
  6. Roll out parallel_for across portable ops and workhorse "util" functions.

Thoughts? Blockers?

Alternatives

status quo -- slow portable ops

cc @larryliu0820 @manuelcandales

Labels: actionable, module: kernels, triaged
