Conversation

@pitrou (Member) commented Apr 26, 2018

  • A ThreadPool class with future-returning task submission and the ability to change the number of worker threads on the fly (a usage sketch follows this list)
  • Tests for the ThreadPool class, including stress tests
  • A singleton thread pool for CPU-bound tasks, configured based on hardware capacity
  • A public API to change the global thread pool capacity
  • Migrated the Arrow codebase to the global thread pool (except APIs taking an nthreads argument, see below)
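
For orientation, here is a minimal, self-contained sketch of the idea. It is illustrative only (not Arrow's implementation; the real class also supports capacity changes and a global CPU pool, omitted here): tasks are queued as type-erased std::function<void()> objects, and Submit() wraps the callable in a std::packaged_task and hands back the matching std::future.

#include <condition_variable>
#include <deque>
#include <functional>
#include <future>
#include <iostream>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

class ToyThreadPool {
 public:
  explicit ToyThreadPool(int num_threads) {
    for (int i = 0; i < num_threads; ++i) {
      workers_.emplace_back([this] { WorkerLoop(); });
    }
  }

  ~ToyThreadPool() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      done_ = true;
    }
    cv_.notify_all();
    for (auto& t : workers_) t.join();
  }

  // Future-returning task submission, as described above.
  template <typename Fn>
  auto Submit(Fn fn) -> std::future<decltype(fn())> {
    using R = decltype(fn());
    // packaged_task is move-only; holding it behind a shared_ptr lets the
    // queue store a copyable std::function (see the review discussion below).
    auto task = std::make_shared<std::packaged_task<R()>>(std::move(fn));
    std::future<R> fut = task->get_future();
    {
      std::lock_guard<std::mutex> lock(mutex_);
      queue_.emplace_back([task] { (*task)(); });
    }
    cv_.notify_one();
    return fut;
  }

 private:
  void WorkerLoop() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return done_ || !queue_.empty(); });
        if (queue_.empty()) return;  // shutting down and drained
        task = std::move(queue_.front());
        queue_.pop_front();
      }
      task();
    }
  }

  std::mutex mutex_;
  std::condition_variable cv_;
  std::deque<std::function<void()>> queue_;
  std::vector<std::thread> workers_;
  bool done_ = false;
};

int main() {
  ToyThreadPool pool(4);
  auto fut = pool.Submit([] { return 6 * 7; });
  std::cout << fut.get() << std::endl;  // prints 42
}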

Remaining open question:

  • what do we do with APIs that take a user-facing nthreads argument? (the Pandas conversion routines, which are able to convert/copy different columns in parallel)

@pitrou pitrou force-pushed the ARROW-2479-threadpool branch 4 times, most recently from 65ca6ce to 614644f on May 1, 2018 14:22
@pitrou (Member, Author) commented May 1, 2018

@pcmoritz

@pitrou pitrou changed the title from "[WIP] ARROW-2479: [C++] Add ThreadPool class" to "ARROW-2479: [C++] Add ThreadPool class" on May 1, 2018
@pitrou (Member, Author) commented May 1, 2018

There appears to be a failure in the symbol visibility check step:
https://travis-ci.org/apache/arrow/jobs/373487470#L792

I can't reproduce it here (Ubuntu 16.04). It might be related to the use of the static singleton pattern (the extraneous symbol is __once_proxy, which seems to implement std::call_once). @xhochy

@pcmoritz (Contributor) left a review comment

The Plasma modifications look good to me!

@xhochy (Member) commented May 1, 2018

It could be that the symbol depends on the version of the STL implementation. We can add that symbol to cpp/src/arrow/symbols.map to hide it always. If it's not there, it is silently ignored.
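
For reference, this is roughly what such an entry looks like in a GNU linker version script. The excerpt below is a hypothetical illustration, not the actual contents of cpp/src/arrow/symbols.map:

{
  local:
    # Hypothetical excerpt, not the real file: names placed under "local:" are
    # hidden from the shared library's dynamic symbol table, and a name listed
    # here that does not exist in the library is silently ignored.
    __once_proxy;
};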

@pitrou pitrou force-pushed the ARROW-2479-threadpool branch from 614644f to 3a54747 on May 2, 2018 13:37
@pitrou (Member, Author) commented May 2, 2018

There's a crash in the manylinux job; unfortunately, the core dump isn't found by our Travis-CI script.

@pitrou (Member, Author) commented May 2, 2018

I'm not sure I understand how the Docker container works. Apparently there are two copies of Arrow: one in /arrow/ and one in /io/arrow/. If I modify something under /io/arrow/cpp, nothing seems to change.

@xhochy (Member) commented May 2, 2018

The copy in /io/arrow is not relevant; only the one in /arrow is used. Once you change something in the sources, you must run the docker build command again. (Sadly, this is not incremental.)

@pitrou (Member, Author) commented May 2, 2018

Also, it seems libplasma.so is built only during the Python step?

@xhochy (Member) commented May 2, 2018

That is OK; the first build of Arrow is solely for building parquet-cpp once, rather than once per Python version.

@pitrou (Member, Author) commented May 2, 2018

So, I have the gdb backtrace:

(gdb) bt
#0  0x00007fca37db7377 in void std::__once_call_impl<std::_Bind_simple<std::_Mem_fn<void (std::__future_base::_State_base::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()()>&, bool&)> ()(std::__future_base::_State_base*, std::reference_wrapper<std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()()> >, std::reference_wrapper<bool>)> >() () from /io/arrow/python/pyarrow/libarrow.so.0
#1  0x00007fca410dc1d3 in pthread_once () from /lib64/libpthread.so.0
#2  0x00007fca372e6d94 in std::__future_base::_Task_state<std::_Bind<void (*()(unsigned char*, unsigned long, unsigned long*))(unsigned char const*, long, unsigned long*)>, std::allocator<int>, void ()()>::_M_run() () from /io/arrow/python/pyarrow/libplasma.so.0
#3  0x00007fca37dc39fe in arrow::internal::ThreadPool::WorkerLoop(std::_List_iterator<std::thread>) () from /io/arrow/python/pyarrow/libarrow.so.0
#4  0x00007fca37f5b560 in execute_native_thread_routine () from /io/arrow/python/pyarrow/libarrow.so.0
#5  0x00007fca410d683d in start_thread () from /lib64/libpthread.so.0
#6  0x00007fca407c1fdd in clone () from /lib64/libc.so.6

(there are other threads)

Unfortunately no line numbers...

@xhochy (Member) commented May 2, 2018

As it's in __once_call_impl, could you check whether it still happens if you remove __once_proxy from the symbols.map file again (and comment out the check)?

@pitrou (Member, Author) commented May 2, 2018

It looks worse in that case:

#0  0x0000000000000000 in ?? ()
#1  0x00007fe1712781d3 in pthread_once () from /lib64/libpthread.so.0
#2  0x00007fe1674825c4 in std::__future_base::_Task_state<std::_Bind<void (*()(unsigned char*, unsigned long, unsigned long*))(unsigned char const*, long, unsigned long*)>, std::allocator<int>, void ()()>::_M_run() () from /io/arrow/python/pyarrow/libplasma.so.0
#3  0x00007fe167f5fa3e in arrow::internal::ThreadPool::WorkerLoop(std::_List_iterator<std::thread>) () from /io/arrow/python/pyarrow/libarrow.so.0
#4  0x00007fe1680f75a0 in execute_native_thread_routine () from /io/arrow/python/pyarrow/libarrow.so.0
#5  0x00007fe17127283d in start_thread () from /lib64/libpthread.so.0
#6  0x00007fe17095dfdd in clone () from /lib64/libc.so.6

@pitrou (Member, Author) commented May 2, 2018

By the way, I'm curious about this comment in symbols.map:

      # devtoolset or -static-libstdc++ - the Red Hat devtoolset statically
      # links c++11 symbols into binaries so that the result may be executed on
      # a system with an older libstdc++ which doesn't include the necessary
      # c++11 symbols.

It looks like libarrow.so still links to libstdc++ dynamically?

@xhochy (Member) commented May 2, 2018

Yes, we still link dynamically, and we probably never run on systems older than CentOS 5.

@pitrou (Member, Author) commented May 2, 2018

Does that mean the comment above about devtoolset and static linking is obsolete?
Also, how do we ensure the libstdc++ ABI is OK if we compile against the devtoolset's header files but dynamically link with the system-wide libstdc++?

@xhochy (Member) commented May 2, 2018

Does that mean the comment above about devtoolset and static linking is obsolete?

Yes.

Also, how do we ensure the libstdc++ ABI is OK if we compile against the devtoolset's header files but dynamically link with the system-wide libstdc++?

We hope that it works. Sadly, this is not always the case; hence my so-far unsuccessful attempt to fix it: #1464

@pitrou (Member, Author) commented May 2, 2018

OK, for the record: the devtoolset doesn't link statically automatically, but it has a linker script that fetches missing symbols from the statically-compiled version:

# cat /opt/rh/devtoolset-2/root/usr/lib/gcc/x86_64-CentOS-linux/4.8.2/libstdc++.so
/* GNU ld script
   Use the shared library, but some functions are only in
   the static library, so try that secondarily.  */
OUTPUT_FORMAT(elf64-x86-64)
INPUT ( /usr/lib64/libstdc++.so.6 -lstdc++_nonshared )

And the thread-local "once_call" symbols (used by the std::call_once implementation) are only in the static library:

# nm -C --defined-only /opt/rh/devtoolset-2/root/usr/lib/gcc/x86_64-CentOS-linux/4.8.2/libstdc++_nonshared.a | grep once_call
0000000000000000 W void std::__once_call_impl<std::_Bind_simple<std::_Mem_fn<void (std::thread::*)()> (std::reference_wrapper<std::thread>)> >()
0000000000000000 B std::__once_call
0000000000000000 B std::__once_callable

Here's the std::call_once implementation from the devtoolset:
https://gist.github.com/pitrou/b8df9ffbcfcdf413b4b10b66f477b1a8

@pitrou (Member, Author) commented May 2, 2018

If I look back at the gdb traceback in #1953 (comment), my impression is the following (counting frames from 1, i.e. gdb's #0 is frame 1 here):

  • in frame 3, libplasma.so calls std::call_once: this sets up a thread-local __once_callable with the actual callable, then calls pthread_once with a helper __once_call_impl
  • in frame 2, pthread_once calls an internal helper __once_proxy inside libstdc++, which probably redirects to __once_call_impl
  • in frame 1, __once_call_impl calls the thread-local __once_callable

Since frame 1 executes inside libarrow.so, and the *once_call* symbols have been statically linked (as they don't exist in the system libstdc++.so), it sounds plausible that frame 1 doesn't get the right function pointer and crashes.
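
To make that failure mode concrete, here is a small self-contained sketch of the hand-off pattern described above. It only mimics the shape of the libstdc++ mechanism (names like once_callable and my_call_once are illustrative, not the library's source): if the thread-local pointer is duplicated per shared object because it was linked in statically, the trampoline can end up reading an unset copy and calling a null pointer.

#include <pthread.h>

#include <functional>

// One copy of this thread-local exists *per DSO* if it is linked statically
// into each shared object instead of coming from a single libstdc++.so.
thread_local std::function<void()>* once_callable = nullptr;

// Fixed trampoline handed to pthread_once(); it can only read back whichever
// copy of `once_callable` is visible from the DSO it was compiled into
// (in libstdc++ this dispatch goes through __once_proxy).
static void once_trampoline() {
  (*once_callable)();  // crashes if this DSO's copy was never set (nullptr)
}

void my_call_once(pthread_once_t* flag, std::function<void()> fn) {
  once_callable = &fn;                  // set *this* DSO's copy
  pthread_once(flag, once_trampoline);  // run the callable exactly once
  once_callable = nullptr;
}

int main() {
  static pthread_once_t flag = PTHREAD_ONCE_INIT;
  my_call_once(&flag, [] { /* runs exactly once */ });
  return 0;
}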

@pitrou (Member, Author) commented May 2, 2018

Some debug prints inside the call_once implementation seem to confirm that hypothesis:

-- call_once: callable = 0x7f13f347f050
-- once_call_impl: callable = (nil)

@pitrou (Member, Author) commented May 2, 2018

(By the way, this is because libstdc++ implements setting a future's result using std::call_once.)

@pitrou (Member, Author) commented May 2, 2018

By letting libarrow.so export the std::__once* symbols, we apparently allow libplasma.so to use the same instance of those symbols, which fixes the crash.

@xhochy (Member) left a review comment

+1, LGTM

@pitrou (Member, Author) commented May 3, 2018

The remaining question here is about the APIs that take an explicit nthreads argument. What should we do with them? @wesm

@xhochy (Member) commented May 8, 2018

Regarding the nthreads= argument, I would handle it as follows:

  • Make the ThreadPool accessible from Python, so that we can set the number of threads there.
  • Deprecate nthreads and, in some places, add a new use_threads argument. For example, in the Parquet reader one can only use threads if the underlying file handle is thread-safe; in those cases, the user should explicitly enable threading.

@xhochy (Member) commented May 8, 2018

It seems that by default we will start with a constant of 4 threads in the thread pool (looking at MakeThreadPool). It would be nice to use some other indication of how many threads to start; often the environment variable OMP_NUM_THREADS is used for this.

@pitrou (Member, Author) commented May 8, 2018

It seems that by default we will start with a constant of 4 threads in the thread pool

It uses the number returned by std::thread::hardware_concurrency, which should (IMHO) never fail on "normal" setups. 4 threads is only a conservative fallback in case the former fails.

It would be nice to use some other indication of how many threads to start; often the environment variable OMP_NUM_THREADS is used for this.

Yeah, that's something we'll have to do. There's some interesting discussion here about which heuristic to use exactly:
https://bugs.python.org/issue32986
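
For illustration, here is a sketch of the kind of heuristic discussed above. The function name and the exact precedence (environment variable first, hardware count second, constant last) are assumptions, not what was committed:

#include <cstdlib>
#include <stdexcept>
#include <string>
#include <thread>

// Hypothetical helper: pick a default pool capacity from OMP_NUM_THREADS if
// set, otherwise from std::thread::hardware_concurrency(), with a
// conservative fallback when the latter reports 0.
int DefaultThreadPoolCapacity() {
  if (const char* env = std::getenv("OMP_NUM_THREADS")) {
    try {
      int n = std::stoi(env);
      if (n > 0) return n;
    } catch (const std::exception&) {
      // fall through to the hardware-based default on parse errors
    }
  }
  unsigned int hw = std::thread::hardware_concurrency();
  return hw > 0 ? static_cast<int>(hw) : 4;  // 4 = conservative fallback
}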

@pitrou pitrou force-pushed the ARROW-2479-threadpool branch from 5d6f7bd to cea94b4 on May 10, 2018 15:03
@pitrou pitrou closed this in 2093f6e May 10, 2018
@pitrou pitrou deleted the ARROW-2479-threadpool branch May 10, 2018 15:48
@wesm (Member) commented May 23, 2018

Sorry I missed the ping here; I was traveling and had my GitHub notifications muted

@wesm (Member) commented May 23, 2018

I will do a round of code review; if you want to make any changes then this can go in a new patch

@wesm (Member) left a review comment

Thanks for doing this @pitrou! I've been wanting a global thread pool for a long time but hadn't gotten around to building it myself. I am curious whether the packaged task issue could be resolved by passing std::function<void()>&& task to SpawnReal instead (since task is being moved in that function anyway), hence sidestepping the requirement of being CopyConstructible.

uintptr_t block_size, int num_threads) {
std::vector<std::thread> threadpool(num_threads);
// XXX This function is really using `num_threads + 1` threads.
auto pool = CPUThreadPool();
@wesm (Member):

I commented in the latest PR about the name for this function -- I would either call it GetCpuThreadPool or cpu_thread_pool

@pitrou (Member, Author):

OK, will do. Should I open a separate PR for the fixes here, or do them as part of the latest PR?


ThreadPool::~ThreadPool() { ARROW_UNUSED(Shutdown(false /* wait */)); }

Status ThreadPool::SetCapacity(size_t threads) {
@wesm (Member):

Per my comment on the latest PR, I would recommend using int threads instead of size_t threads

@pitrou (Member, Author):

Fine :-)

cv_.notify_all();
cv_shutdown_.wait(lock, [this] { return workers_.empty(); });
if (!quick_shutdown_) {
DCHECK_EQ(pending_tasks_.size(), 0);
@wesm (Member):

I assume this failure mode is esoteric?

@pitrou (Member, Author):

This would be a bug, as it's checking an internal invariant.

: task_(std::make_shared<PackagedTask>(std::forward<PackagedTask>(task))) {}

void operator()(Args&&... args) { return (*task_)(std::forward<Args>(args)...); }
std::shared_ptr<PackagedTask> task_;
@wesm (Member):

Could this be unique_ptr?

@pitrou (Member, Author):

Unfortunately not, since this must be copyable.
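
For readers wondering why copyability matters here, a small self-contained illustration (independent of Arrow's code): std::function requires a CopyConstructible target, std::packaged_task is move-only, and a shared_ptr wrapper bridges the two.

#include <functional>
#include <future>
#include <memory>
#include <utility>

int main() {
  std::packaged_task<int()> task([] { return 42; });
  std::future<int> fut = task.get_future();

  // A closure owning the packaged_task directly would be move-only, and
  // std::function<void()> requires a CopyConstructible target, so it cannot
  // hold such a closure.  Wrapping the task in a shared_ptr gives the closure
  // a copyable handle to the same move-only object.
  auto shared = std::make_shared<std::packaged_task<int()>>(std::move(task));
  std::function<void()> copyable = [shared] { (*shared)(); };

  copyable();                      // runs the task
  return fut.get() == 42 ? 0 : 1;  // the future is fulfilled by the moved task
}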


} // namespace detail

class ThreadPool {
@wesm (Member):

Should this class appear in the Doxygen docs? If so, it should use three forward slashes (///) and also add some tags like \brief to the function descriptions

@pitrou (Member, Author):

I don't know. Right now I would consider the ThreadPool class an internal API, and only the top-level function to set its capacity a public API.


Status st = SpawnReal(detail::packaged_task_wrapper<Result>(std::move(task)));
if (!st.ok()) {
throw std::runtime_error(st.ToString());
@wesm (Member):

How could this exception manifest in real code, or would it be an esoteric failure (a la DCHECK)?

@pitrou (Member, Author):

It manifests if you call Submit after Shutdown. It's a programmer error, though, not a random runtime condition.

@wesm (Member):

Makes sense. If you get a chance, adding a code comment there to this effect would be great

@wesm (Member) commented May 23, 2018

Feel free to incorporate any changes into the latest PR

@robertnishihara (Contributor) commented

This PR seems to have introduced https://issues.apache.org/jira/browse/ARROW-2657.

@wesm (Member) commented May 31, 2018

You have to import TensorFlow first. TensorFlow is not respecting the manylinux1 standard and is using newer compilers. We've seen bugs related to libstdc++ reported elsewhere

@wesm (Member) commented May 31, 2018

It does look like there's an issue related to that pthread symbol, though, that we should see if we can fix.
