Skip to content

Conversation

@WillAyd
Copy link
Contributor

@WillAyd WillAyd commented May 8, 2025

Work to make this more feature complete is still ongoing upstream in the Arrow repository, but I think the base dependency is usable enough to get started on a wrapdb entry

@bgilbert
Copy link
Collaborator

bgilbert commented May 8, 2025

Should this wrap be called arrow-cpp? There is also the c_glib one, and prospectively e.g. rust in the future.

@WillAyd
Copy link
Contributor Author

WillAyd commented May 8, 2025

That sounds like a reasonable suggestion - @kou what do you think?

@WillAyd WillAyd force-pushed the add-arrow branch 4 times, most recently from 156c0e3 to 57b9777 Compare May 8, 2025 14:05
@WillAyd
Copy link
Contributor Author

WillAyd commented May 8, 2025

Is there a way to control how this package will pull in other wrap files? For example, the Arrow test suite requires a very specific version of flatbuffers for compilation (this is a flatbuffers requirement, not Arrow)

From what I see in the failing Ubuntu builds, the version of flatbuffers that is specified in arrow/cpp/subprojects/flatbuffers.wrap may not be the version used in CI

@bgilbert
Copy link
Collaborator

bgilbert commented May 8, 2025

CI will use the flatbuffers wrap from this repo if ci_config.json doesn't install flatbuffers as a distro package. It looks like WrapDB has an old version of flatbuffers, which could certainly be updated to the current version. If Arrow expects a specific older version, that isn't a reasonable expectation and you'd need to disable tests here.

When the Arrow wrap is actually used in downstream projects, those projects may ship some random version of the flatbuffers wrap, not the version Arrow expects. In any case, downstream projects don't control what flatbuffers package is installed on the end user's system.

@WillAyd
Copy link
Contributor Author

WillAyd commented May 8, 2025

If Arrow expects a specific older version, that isn't a reasonable expectation and you'd need to disable tests here.

Ah OK - in that case I will disable the tests

The problem is that Arrow ships files that are generated by a specific version of flatbuffers. Flatbuffers requires and statically asserts that the exact same version that was used to generate those files is subsequently used to compile them; that invariant would be almost impossible to hold if relying on the system or other projects to provide that dependency

It may be possible to generate those files during compilation instead of including in the source tree upstream; that is something I can investigate further

Copy link

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this wrap be called arrow-cpp? There is also the c_glib one, and prospectively e.g. rust in the future.

That sounds like a reasonable suggestion - @kou what do you think?

+1

@WillAyd
Copy link
Contributor Author

WillAyd commented May 9, 2025

IIUC with the Windows failures, the project is marked as build_on: {'windows': false} but configures correctly, so the CI is failing as a result? Should I just raise an error in the patch configuration when Windows is used? It seems as if the symbol export is not quite right, so it cannot compile on Windows platforms.

For the macOS failure, it looks like x86 is fine but arm64 fails. Is there something extra I need to do in the current configuration to install brew on arm64, or should I just skip macOS altogether?

@bgilbert
Copy link
Collaborator

bgilbert commented May 9, 2025

The macOS problem is likely a search path issue in Arrow's Meson config. Homebrew installs into /usr/local on x86_64 but /opt/local on arm64, so on x86 you can get away with the default include paths but on ARM you need to correctly pass Boost's include path to the compiler.

If Windows builds are intended to work but are buggy right now, I'd recommend dropping build_on and just letting CI fail. We don't require all CI to pass here before merging, provided that the failure is understood and not a regression.

@WillAyd
Copy link
Contributor Author

WillAyd commented May 9, 2025

Thanks that info is helpful. At this point it might even be best to wait until the next release of Arrow, which in theory should fix the unnecessary boost dependency

@WillAyd WillAyd marked this pull request as ready for review July 21, 2025 20:30
@WillAyd
Copy link
Contributor Author

WillAyd commented Jul 21, 2025

Apache Arrow 21.0.0 was released a few days ago, so this is ready for real review now

@WillAyd
Copy link
Contributor Author

WillAyd commented Jul 22, 2025

The windows failures here are expected, as the upstream configuration is not tested against that yet

@WillAyd
Copy link
Contributor Author

WillAyd commented Jul 28, 2025

@bgilbert no rush but let me know if you have any feedback on this in its current state. Thanks!

@WillAyd
Copy link
Contributor Author

WillAyd commented Aug 4, 2025

@eli-schwartz do you have any thoughts on this one?

@WillAyd
Copy link
Contributor Author

WillAyd commented Aug 4, 2025

Given your comment in #2206 (comment) I think we need to change the dependency name here to just arrow and not arrow-cpp right?

@eli-schwartz
Copy link
Member

Yes. Projects need to do

arrow_cpp_dep = dependency('arrow')

in order to pick up their system installed distro copy of Apache Arrow. In order to also pick up the subproject wrap, the subproject *.wrap file can be named anything but its [provide] section needs to override arrow.pc, not arrow-cpp.pc (as the latter doesn't exist).

@eli-schwartz
Copy link
Member

The windows failures here are expected, as the upstream configuration is not tested against that yet

I would say, merge this with failing CI and assume that the upstream code will get to fixing this eventually -- I assume it's already the plan, right?

@WillAyd
Copy link
Contributor Author

WillAyd commented Aug 4, 2025

In order to also pick up the subproject wrap, the subproject *.wrap file can be named anything but its [provide] section needs to override arrow.pc, not arrow-cpp.pc (as the latter doesn't exist).

Got it - this should be updated now

I would say, merge this with failing CI and assume that the upstream code will get to fixing this eventually -- I assume it's already the plan, right?

Correct. The upstream issue to track this is apache/arrow#46797

@WillAyd WillAyd force-pushed the add-arrow branch 2 times, most recently from b585dd6 to 51f265d Compare August 4, 2025 16:50
@WillAyd
Copy link
Contributor Author

WillAyd commented Aug 11, 2025

Good to merge?

@WillAyd WillAyd force-pushed the add-arrow branch 2 times, most recently from d969935 to f674e94 Compare August 19, 2025 20:06
source_hash = 5d3f8db7e72fb9f65f4785b7a1634522e8d8e9657a445af53d4a34a3849857b5
patch_directory = arrow-cpp

[provide]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's actually a few more provides that we can add here, but I think they are broken in the 21.0.0 release of Arrow (mostly due to my poor understanding of pkg-config)

  • arrow-acero: Upstream pkg-config incorrectly excludes the acero library
  • arrow-csv: Only exposes header file(s). Need to add arrow_csv_dep upstream
  • arrow-filesystem: Only exposes header file(s). Need to add arrow_filesystem_dep upstream
  • arrow-json: Only exposes header file(s). Need to add arrow_json_dep upstream
  • arrow-tensorflow: Only exposes header file(s). Need to add arrow_tensorflow_dep upstream

FWIW most of these (and also the current arrow-compute dependency) are not built by default, so a user would have to provide the appropriate argument(s) to a dependency fallback. I think that is as intended with the wrap system though (?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW most of these (and also the current arrow-compute dependency) are not built by default, so a user would have to provide the appropriate argument(s) to a dependency fallback. I think that is as intended with the wrap system though (?)

Yes, if the subproject fails to provide the right overridden dependency then the dependency lookup will simply return not-found. The dependency fallback should be configured with the right options (can be automatic via default_options).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool thanks. I'll chip away at the individual issues upstream for the next release. For now hopefully not a blocker for this (?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, I agree.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK great - I've opened apache/arrow#47431 upstream to track that (and a few other items I've gotten feedback on)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does upstream intend never to put a meson.build in its repository root, so that WrapDB will be a permanent mirror for upstream's meson.build and meson.options with adjusted subdir paths? If so, we need to document that in a comment at the top of this file, preferably with a sketch of the necessary changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point it seems really unlikely, at least in the near term. The Arrow project is a monorepo with a few different languages it in, some of which are not buildable by Meson. In the past the team has been hesitant to put anything in the root that only serves some of the languages

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is extremely reasonable for a monorepo with a few different languages in it, to have each language use a different root buildfile. When you build apache-arrow, is it supposed to build arrow-cpp, or build a pyarrow wheel?

But one thing that such projects often do, is release individual release distfiles for each language directory (assuming they work standalone).

$ du -sh /var/cache/distfiles/apache-arrow-21.0.0.tar.gz 
17M     /var/cache/distfiles/apache-arrow-21.0.0.tar.gz
$ du -sh /var/tmp/portage/dev-libs/apache-arrow-21.0.0/work/apache-arrow-21.0.0/cpp
33M     /var/tmp/portage/dev-libs/apache-arrow-21.0.0/work/apache-arrow-21.0.0/cpp
$ du -sh /var/tmp/portage/dev-libs/apache-arrow-21.0.0/work/apache-arrow-21.0.0/
73M     /var/tmp/portage/dev-libs/apache-arrow-21.0.0/work/apache-arrow-21.0.0/

The download of apache arrow is not all that small, but in all likelihood I don't need most of it. If the cpp/ directory had its own asset file, that could be used as the wrap-file url, and would be extracted on its own to subprojects/apache-arrow-cpp-21.0.0/ with a meson.build in the root.

But since apache arrow doesn't provide per-language distfiles, we need this simple workaround.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its definitely something to work towards; there is a lot of legacy to the release process, a lot of which I think is tied to some dirty CMake hacks. I'd love to see Meson grow more in usage in the repo, starting with the C++ source, then moving into the Python bindings, and then whatever else from there. Along the way, there's going to be a lot that I think we find we can do better

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the current arrangement is okay as a starting point, we just need to make sure we document it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can add some comments to the meson.build config file here if that is what you have in mind. Or is there a better place?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Top of the meson.build is fine.

@WillAyd WillAyd force-pushed the add-arrow branch 3 times, most recently from 6c826bd to 91a1799 Compare August 27, 2025 16:36
Copy link
Collaborator

@bgilbert bgilbert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@bgilbert bgilbert merged commit 9757e32 into mesonbuild:master Aug 29, 2025
11 of 16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants