Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-39517: [C++] Disable parallelism for jemalloc external project #39522

Merged
merged 3 commits into from
Jan 9, 2024

Conversation

assignUser
Copy link
Member

@assignUser assignUser commented Jan 9, 2024

Rationale for this change

With CMake > 3.28 the generated Makefile fails on the jemalloc_ep due to 'bad file descriptor'.

What changes are included in this PR?

For a sequential build for jemalloc by setting -j1.

Are these changes tested?

CI

Are there any user-facing changes?

No.

@assignUser assignUser added this to the 15.0.0 milestone Jan 9, 2024
Copy link

github-actions bot commented Jan 9, 2024

⚠️ GitHub issue #39517 has been automatically assigned in GitHub to PR creator.

@assignUser
Copy link
Member Author

@github-actions crossbow submit -g r

Copy link

github-actions bot commented Jan 9, 2024

Revision: 5f6051d

Submitted crossbow builds: ursacomputing/crossbow @ actions-e361533807

Task Status
r-binary-packages GitHub Actions
test-fedora-r-clang-sanitizer Azure
test-r-arrow-backwards-compatibility GitHub Actions
test-r-depsource-bundled Azure
test-r-depsource-system GitHub Actions
test-r-dev-duckdb GitHub Actions
test-r-devdocs GitHub Actions
test-r-gcc-11 GitHub Actions
test-r-gcc-12 GitHub Actions
test-r-install-local GitHub Actions
test-r-install-local-minsizerel GitHub Actions
test-r-library-r-base-latest Azure
test-r-linux-as-cran GitHub Actions
test-r-linux-rchk GitHub Actions
test-r-linux-valgrind Azure
test-r-minimal-build Azure
test-r-offline-maximal GitHub Actions
test-r-offline-minimal Azure
test-r-rhub-debian-gcc-devel-lto-latest Azure
test-r-rhub-debian-gcc-release-custom-ccache Azure
test-r-rhub-ubuntu-gcc-release-latest Azure
test-r-rstudio-r-base-4.1-opensuse153 Azure
test-r-rstudio-r-base-4.2-centos7-devtoolset-8 Azure
test-r-rstudio-r-base-4.2-focal Azure
test-r-ubuntu-22.04 GitHub Actions
test-r-versions GitHub Actions
test-ubuntu-r-sanitizer Azure

@assignUser assignUser requested a review from kou January 9, 2024 01:32
@assignUser
Copy link
Member Author

The r-binary-packages, devdocs, backwards comp and other builds where previously failing due to this issue and now pass.

set(JEMALLOC_BUILD_COMMAND ${MAKE} ${MAKE_BUILD_ARGS})
# Paralleism for Make fails with CMake > 3.28 see #39517
if(${CMAKE_GENERATOR} MATCHES "Makefiles")
list(APPEND JEMALLOC_BUILD_COMMAND "-j1")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we want to use -j1 only for jemalloc?
It seems that this problem may be happen with other software. So how about setting MAKE_BUILD_ARGS?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem has not occurred with any other dependency neither on CI nor in my own testing (or after the fix). For the smaller dependencies it might not make a difference but with the number of larger dependencies we have forcing the all to use -j1 if there is no actual issue seems like it would slow build times with no gain?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not a strong opinion but I think that we should choose safer option (disable make's parallel build for all dependencies) here.
Because we have a workaround (use -GNinja) for the parallel build problem.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only have 2 cores available on cran (policy), no ninja (afaik) and our build times are already being criticized so....

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have only 2 cores, we should not use parallel build for dependencies.
If we use parallel build, it will use 2 more cores in total and slow down total build time...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: We use make only for jemalloc, bzip2 and UCX.

Comment on lines 449 to 450
if(NOT ((_VERSION_ENTRY MATCHES "^[^ #][A-Za-z0-9-_]+_VERSION=")
OR (_VERSION_ENTRY MATCHES "^[^ #][A-Za-z0-9-_]+_CHECKSUM=")))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are they needed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These were made by the cmake linter.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow. What CMake linter did you use? And what the error message from the CMake linter?
It seems that this diff changes matched pattern. (I think that it doesn't have any problem in our use case.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

archery lint --fix --cmake-format ^^ but I am happy to remove the change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm weird, it doesn't produce output if I leave the --fix flag off. I will revert the change to be safe.

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting review Awaiting review labels Jan 9, 2024
@assignUser assignUser requested a review from kou January 9, 2024 01:56
Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Comment on lines 449 to 450
if(NOT ((_VERSION_ENTRY MATCHES "^[^ #][A-Za-z0-9-_]+_VERSION=")
OR (_VERSION_ENTRY MATCHES "^[^ #][A-Za-z0-9-_]+_CHECKSUM=")))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow. What CMake linter did you use? And what the error message from the CMake linter?
It seems that this diff changes matched pattern. (I think that it doesn't have any problem in our use case.)

set(JEMALLOC_BUILD_COMMAND ${MAKE} ${MAKE_BUILD_ARGS})
# Paralleism for Make fails with CMake > 3.28 see #39517
if(${CMAKE_GENERATOR} MATCHES "Makefiles")
list(APPEND JEMALLOC_BUILD_COMMAND "-j1")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not a strong opinion but I think that we should choose safer option (disable make's parallel build for all dependencies) here.
Because we have a workaround (use -GNinja) for the parallel build problem.

@github-actions github-actions bot added awaiting merge Awaiting merge and removed awaiting changes Awaiting changes labels Jan 9, 2024
This reverts commit 5f6051d.
@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting merge Awaiting merge labels Jan 9, 2024
@assignUser assignUser merged commit 5acf67c into apache:main Jan 9, 2024
30 of 31 checks passed
@assignUser assignUser removed the awaiting changes Awaiting changes label Jan 9, 2024
raulcd pushed a commit that referenced this pull request Jan 9, 2024
…9522)

### Rationale for this change

With CMake > 3.28 the generated Makefile fails on the jemalloc_ep due to 'bad file descriptor'.

### What changes are included in this PR?

For a sequential build for jemalloc by setting -j1.

### Are these changes tested?

CI

### Are there any user-facing changes?

No.
* Closes: #39517

Authored-by: Jacob Wujciak-Jens <jacob@wujciak.de>
Signed-off-by: Jacob Wujciak-Jens <jacob@wujciak.de>
@pitrou
Copy link
Member

pitrou commented Jan 9, 2024

Isn't this making builds slower? Can it be circumvented to only CMake 3.28?
Also, is an issue open with the CMake project?

assignUser added a commit that referenced this pull request Jan 9, 2024
…9522)

### Rationale for this change

With CMake > 3.28 the generated Makefile fails on the jemalloc_ep due to 'bad file descriptor'.

### What changes are included in this PR?

For a sequential build for jemalloc by setting -j1.

### Are these changes tested?

CI

### Are there any user-facing changes?

No.
* Closes: #39517

Authored-by: Jacob Wujciak-Jens <jacob@wujciak.de>
Signed-off-by: Jacob Wujciak-Jens <jacob@wujciak.de>
@pitrou
Copy link
Member

pitrou commented Jan 10, 2024

Ping @raulcd @assignUser ^^

@assignUser
Copy link
Member Author

Isn't this making builds slower?

Tiny bit maybe but unlikely as the jemalloc build in itself is pretty quick + we only set the one build to -j1 so the cmake can still build other objects/external prjects in parallel.

Can it be circumvented to only CMake 3.28?

That would be possible via https://cmake.org/cmake/help/latest/variable/CMAKE_VERSION.html but also it only applies when using the Make Generator and most dev/ci are likely using Ninja...

Also, is an issue open with the CMake project?

No but I have that as a todo. I haven't found a working reprex yet, I guess it needs the many parallel external projects we have...

@pitrou
Copy link
Member

pitrou commented Jan 10, 2024

Tiny bit maybe but unlikely as the jemalloc build in itself is pretty quick + we only set the one build to -j1 so the cmake can still build other objects/external prjects in parallel.

My experience is that the jemalloc build blocks all other steps when building Arrow. I am not sure why that is (perhaps a bug in our dependency tree?); I tried outputting a dot file of the build graph using Ninja but I failed graphing the file afterwards...

Copy link

After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 5acf67c.

There were 22 benchmark results indicating a performance regression:

The full Conbench report has more details. It also includes information about 1 possible false positive for unstable benchmarks that are known to sometimes produce them.

@assignUser
Copy link
Member Author

assignUser commented Jan 10, 2024

I build jemalloc with our release configure command and make vs make -j1:

  • ./test.sh 56.47s user 7.29s system 102% cpu 1:02.35 total
  • ./test.sh 57.22s user 7.05s system 103% cpu 1:02.16 total

So it seems that make defaults to a mostly single core build anyway, this just removes issues we had.

Explicitly passing -j16 to the jemalloc build interestingly also fixes the issue and does not show the warning mentioned in #2779. It might be enough to remove that change/conditional. I have tested this on ubuntu 20.04 with cmake 3.16 and it also works there. (I think cmake decoupled external projects from the main make by adding a cmake layer inbetween at some point prior to 3.16 thus getting rid of the sub-make issues).
I'll open a PR.

@kou
Copy link
Member

kou commented Jan 10, 2024

My experience is that the jemalloc build blocks all other steps when building Arrow.

It seems that we solved this by GH-5281. (But only with Ninja.)

@pitrou
Copy link
Member

pitrou commented Jan 10, 2024

Well, I use Ninja.

clayburn pushed a commit to clayburn/arrow that referenced this pull request Jan 23, 2024
…ct (apache#39522)

### Rationale for this change

With CMake > 3.28 the generated Makefile fails on the jemalloc_ep due to 'bad file descriptor'.

### What changes are included in this PR?

For a sequential build for jemalloc by setting -j1.

### Are these changes tested?

CI

### Are there any user-facing changes?

No.
* Closes: apache#39517

Authored-by: Jacob Wujciak-Jens <jacob@wujciak.de>
Signed-off-by: Jacob Wujciak-Jens <jacob@wujciak.de>
dgreiss pushed a commit to dgreiss/arrow that referenced this pull request Feb 19, 2024
…ct (apache#39522)

### Rationale for this change

With CMake > 3.28 the generated Makefile fails on the jemalloc_ep due to 'bad file descriptor'.

### What changes are included in this PR?

For a sequential build for jemalloc by setting -j1.

### Are these changes tested?

CI

### Are there any user-facing changes?

No.
* Closes: apache#39517

Authored-by: Jacob Wujciak-Jens <jacob@wujciak.de>
Signed-off-by: Jacob Wujciak-Jens <jacob@wujciak.de>
zanmato1984 pushed a commit to zanmato1984/arrow that referenced this pull request Feb 28, 2024
…ct (apache#39522)

### Rationale for this change

With CMake > 3.28 the generated Makefile fails on the jemalloc_ep due to 'bad file descriptor'.

### What changes are included in this PR?

For a sequential build for jemalloc by setting -j1.

### Are these changes tested?

CI

### Are there any user-facing changes?

No.
* Closes: apache#39517

Authored-by: Jacob Wujciak-Jens <jacob@wujciak.de>
Signed-off-by: Jacob Wujciak-Jens <jacob@wujciak.de>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[R][C++] CMake 3.28.1 -GMake causes jemalloc subbuild to fail
3 participants