Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[C++] Remove serial dependency on jemalloc #39559

Closed
wants to merge 1 commit into from

Conversation

pitrou
Copy link
Member

@pitrou pitrou commented Jan 11, 2024

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@pitrou
Copy link
Member Author

pitrou commented Jan 11, 2024

@github-actions crossbow submit -g cpp -g wheel

@pitrou
Copy link
Member Author

pitrou commented Jan 11, 2024

@kou @assignUser This works for me locally to remove the blocking build step on jemalloc.
Without this PR, jemalloc first waits configuring and building, then Arrow C++ sources start building.
With this PR, Arrow C++ sources start building immediately.

This comment was marked as outdated.

@pitrou
Copy link
Member Author

pitrou commented Jan 11, 2024

Ok, it doesn't work for other configurations. I don't see any simple way out of this, and our CMake setup is really messy here.

@pitrou
Copy link
Member Author

pitrou commented Jan 11, 2024

I think it comes down to this: setting either ARROW_STATIC_LINK_LIBS, ARROW_SHARED_LINK_LIBS or ARROW_SHARED_PRIVATE_LINK_LIBS introduces a dependency for all Arrow C++ source files on these libraries.

A global solution might be to define separate OBJECT libraries for every private dependency (such jemalloc but also AWS, etc.).

@kou
Copy link
Member

kou commented Jan 11, 2024

I tried the following:

diff --git a/cpp/CMakeLists.txt b/cpp/CMakeLists.txt
index d26e06a146..f15eb64f85 100644
--- a/cpp/CMakeLists.txt
+++ b/cpp/CMakeLists.txt
@@ -942,15 +942,8 @@ if(ARROW_BUILD_BENCHMARKS)
   endif()
 endif()
 
-if(ARROW_JEMALLOC)
-  list(APPEND ARROW_SHARED_LINK_LIBS jemalloc::jemalloc)
-  list(APPEND ARROW_STATIC_LINK_LIBS jemalloc::jemalloc)
-endif()
-
 if(ARROW_MIMALLOC)
   add_definitions(-DARROW_MIMALLOC)
-  list(APPEND ARROW_SHARED_LINK_LIBS mimalloc::mimalloc)
-  list(APPEND ARROW_STATIC_LINK_LIBS mimalloc::mimalloc)
 endif()
 
 # ----------------------------------------------------------------------
diff --git a/cpp/cmake_modules/ThirdpartyToolchain.cmake b/cpp/cmake_modules/ThirdpartyToolchain.cmake
index a2627c190f..4cbcdaa505 100644
--- a/cpp/cmake_modules/ThirdpartyToolchain.cmake
+++ b/cpp/cmake_modules/ThirdpartyToolchain.cmake
@@ -2143,7 +2143,6 @@ if(ARROW_MIMALLOC)
     target_link_libraries(mimalloc::mimalloc INTERFACE "bcrypt.lib" "psapi.lib")
   endif()
   add_dependencies(mimalloc::mimalloc mimalloc_ep)
-  add_dependencies(toolchain mimalloc_ep)
 
   list(APPEND ARROW_BUNDLED_STATIC_LIBS mimalloc::mimalloc)
 
diff --git a/cpp/src/arrow/CMakeLists.txt b/cpp/src/arrow/CMakeLists.txt
index c1fafeebc0..2263dc1a31 100644
--- a/cpp/src/arrow/CMakeLists.txt
+++ b/cpp/src/arrow/CMakeLists.txt
@@ -145,27 +145,6 @@ macro(append_runtime_avx512_src SRC)
 endmacro()
 
 set(ARROW_SRCS
-    array/array_base.cc
-    array/array_binary.cc
-    array/array_decimal.cc
-    array/array_dict.cc
-    array/array_nested.cc
-    array/array_primitive.cc
-    array/array_run_end.cc
-    array/builder_adaptive.cc
-    array/builder_base.cc
-    array/builder_binary.cc
-    array/builder_decimal.cc
-    array/builder_dict.cc
-    array/builder_run_end.cc
-    array/builder_nested.cc
-    array/builder_primitive.cc
-    array/builder_union.cc
-    array/concatenate.cc
-    array/data.cc
-    array/diff.cc
-    array/util.cc
-    array/validate.cc
     builder.cc
     buffer.cc
     chunked_array.cc
@@ -175,7 +154,6 @@ set(ARROW_SRCS
     datum.cc
     device.cc
     extension_type.cc
-    memory_pool.cc
     pretty_print.cc
     record_batch.cc
     result.cc
@@ -261,11 +239,42 @@ set(ARROW_SRCS
     vendored/double-conversion/string-to-double.cc
     vendored/double-conversion/strtod.cc)
 
+add_library(arrow_array OBJECT
+    array/array_base.cc
+    array/array_binary.cc
+    array/array_decimal.cc
+    array/array_dict.cc
+    array/array_nested.cc
+    array/array_primitive.cc
+    array/array_run_end.cc
+    array/builder_adaptive.cc
+    array/builder_base.cc
+    array/builder_binary.cc
+    array/builder_decimal.cc
+    array/builder_dict.cc
+    array/builder_run_end.cc
+    array/builder_nested.cc
+    array/builder_primitive.cc
+    array/builder_union.cc
+    array/concatenate.cc
+    array/data.cc
+    array/diff.cc
+    array/util.cc
+    array/validate.cc)
+
+set(ARROW_MEMORY_POOL_SRCS memory_pool.cc)
 if(ARROW_JEMALLOC)
-  list(APPEND ARROW_SRCS memory_pool_jemalloc.cc)
+  list(APPEND ARROW_MEMORY_POOL_SRCS memory_pool_jemalloc.cc)
   set_source_files_properties(memory_pool_jemalloc.cc
                               PROPERTIES SKIP_UNITY_BUILD_INCLUSION ON)
 endif()
+add_library(arrow_memory_pool OBJECT ${ARROW_MEMORY_POOL_SRCS})
+if(ARROW_JEMALLOC)
+  target_link_libraries(arrow_memory_pool jemalloc::jemalloc)
+endif()
+if(ARROW_MIMALLOC)
+  target_link_libraries(arrow_memory_pool mimalloc::mimalloc)
+endif()
 
 append_runtime_avx2_src(util/bpacking_avx2.cc)
 append_runtime_avx512_src(util/bpacking_avx512.cc)
@@ -350,32 +359,6 @@ set(ARROW_TESTING_SRCS
     testing/generator.cc
     testing/util.cc)
 
-# Add dependencies for third-party allocators.
-# If possible we only want memory_pool.cc to wait for allocators to finish building,
-# but that only works with Ninja
-# (see https://gitlab.kitware.com/cmake/cmake/issues/19677)
-
-set(_allocator_dependencies "") # Empty list
-if(jemalloc_VENDORED)
-  list(APPEND _allocator_dependencies jemalloc_ep)
-endif()
-if(mimalloc_VENDORED)
-  list(APPEND _allocator_dependencies mimalloc_ep)
-endif()
-
-if(_allocator_dependencies)
-  if("${CMAKE_GENERATOR}" STREQUAL "Ninja")
-    set_source_files_properties(memory_pool.cc PROPERTIES OBJECT_DEPENDS
-                                                          "${_allocator_dependencies}")
-  else()
-    add_dependencies(arrow_dependencies ${_allocator_dependencies})
-  endif()
-  set_source_files_properties(memory_pool.cc PROPERTIES SKIP_PRECOMPILE_HEADERS ON
-                                                        SKIP_UNITY_BUILD_INCLUSION ON)
-endif()
-
-unset(_allocator_dependencies)
-
 if(CMAKE_CXX_COMPILER_ID STREQUAL "AppleClang" OR CMAKE_CXX_COMPILER_ID STREQUAL "Clang")
   set_property(SOURCE util/io_util.cc
                APPEND_STRING
@@ -666,8 +649,12 @@ add_arrow_lib(arrow
               ${ARROW_SHARED_LINK_LIBS}
               SHARED_PRIVATE_LINK_LIBS
               ${ARROW_SHARED_PRIVATE_LINK_LIBS}
+              arrow_array
+              arrow_memory_pool
               STATIC_LINK_LIBS
               ${ARROW_STATIC_LINK_LIBS}
+              arrow_array
+              arrow_memory_pool
               STATIC_INSTALL_INTERFACE_LIBS
               ${ARROW_STATIC_INSTALL_INTERFACE_LIBS}
               SHARED_INSTALL_INTERFACE_LIBS

We can build cpp/src/arrow/array/*.cc before jemalloc with it. If we split more files to separated OBJECT libraries, we will be able to build them before jemalloc.

@pitrou
Copy link
Member Author

pitrou commented Jan 11, 2024

It's funny because I tried something similar (a arrow_memory_pool object library) but I bailed out after I failed making it work with the conda-cpp Docker build. Apparently reusing an objlib into a static library is not easy to achieve.

@zanmato1984
Copy link
Collaborator

It's funny because I tried something similar (a arrow_memory_pool object library) but I bailed out after I failed making it work with the conda-cpp Docker build. Apparently reusing an objlib into a static library is not easy to achieve.

Just out of curiosity, what went wrong with the objlib? I want to help on this.

@pitrou
Copy link
Member Author

pitrou commented Jan 26, 2024 via email

@zanmato1984
Copy link
Collaborator

The linker couldn't find some jemalloc symbols. I was not able to understand the reason and I don't remember the details, sorry. Le 26 janvier 2024 11:37:43 GMT+01:00, Rossi Sun @.***> a écrit :

It's funny because I tried something similar (a arrow_memory_pool object library) but I bailed out after I failed making it work with the conda-cpp Docker build. Apparently reusing an objlib into a static library is not easy to achieve. Just out of curiosity, what went wrong with the objlib? I want to help on this. -- Reply to this email directly or view it on GitHub: #39559 (comment) You are receiving this because you authored the thread. Message ID: @.***>

No problem.

@kou
Copy link
Member

kou commented Jan 29, 2024

#39824

@pitrou
Copy link
Member Author

pitrou commented Feb 7, 2024

Closing this in favor of Kou's PR #39824.

@pitrou pitrou closed this Feb 7, 2024
@pitrou pitrou deleted the jemalloc_serial_dep branch February 7, 2024 13:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants