-
Notifications
You must be signed in to change notification settings - Fork 267
Add Apache Arrow wrap #2099
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Apache Arrow wrap #2099
Conversation
|
Should this wrap be called |
|
That sounds like a reasonable suggestion - @kou what do you think? |
156c0e3 to
57b9777
Compare
|
Is there a way to control how this package will pull in other wrap files? For example, the Arrow test suite requires a very specific version of flatbuffers for compilation (this is a flatbuffers requirement, not Arrow) From what I see in the failing Ubuntu builds, the version of flatbuffers that is specified in arrow/cpp/subprojects/flatbuffers.wrap may not be the version used in CI |
|
CI will use the flatbuffers wrap from this repo if When the Arrow wrap is actually used in downstream projects, those projects may ship some random version of the flatbuffers wrap, not the version Arrow expects. In any case, downstream projects don't control what flatbuffers package is installed on the end user's system. |
Ah OK - in that case I will disable the tests The problem is that Arrow ships files that are generated by a specific version of flatbuffers. Flatbuffers requires and statically asserts that the exact same version that was used to generate those files is subsequently used to compile them; that invariant would be almost impossible to hold if relying on the system or other projects to provide that dependency It may be possible to generate those files during compilation instead of including in the source tree upstream; that is something I can investigate further |
4b6c9df to
9204544
Compare
kou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this wrap be called
arrow-cpp? There is also thec_glibone, and prospectively e.g.rustin the future.That sounds like a reasonable suggestion - @kou what do you think?
+1
d02725f to
45d6be8
Compare
|
IIUC with the Windows failures, the project is marked as For the macOS failure, it looks like x86 is fine but arm64 fails. Is there something extra I need to do in the current configuration to install brew on arm64, or should I just skip macOS altogether? |
|
The macOS problem is likely a search path issue in Arrow's Meson config. Homebrew installs into If Windows builds are intended to work but are buggy right now, I'd recommend dropping |
|
Thanks that info is helpful. At this point it might even be best to wait until the next release of Arrow, which in theory should fix the unnecessary boost dependency |
|
Apache Arrow 21.0.0 was released a few days ago, so this is ready for real review now |
|
The windows failures here are expected, as the upstream configuration is not tested against that yet |
|
@bgilbert no rush but let me know if you have any feedback on this in its current state. Thanks! |
|
@eli-schwartz do you have any thoughts on this one? |
|
Given your comment in #2206 (comment) I think we need to change the dependency name here to just |
|
Yes. Projects need to do arrow_cpp_dep = dependency('arrow')in order to pick up their system installed distro copy of Apache Arrow. In order to also pick up the subproject wrap, the subproject |
I would say, merge this with failing CI and assume that the upstream code will get to fixing this eventually -- I assume it's already the plan, right? |
Got it - this should be updated now
Correct. The upstream issue to track this is apache/arrow#46797 |
b585dd6 to
51f265d
Compare
|
Good to merge? |
d969935 to
f674e94
Compare
| source_hash = 5d3f8db7e72fb9f65f4785b7a1634522e8d8e9657a445af53d4a34a3849857b5 | ||
| patch_directory = arrow-cpp | ||
|
|
||
| [provide] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's actually a few more provides that we can add here, but I think they are broken in the 21.0.0 release of Arrow (mostly due to my poor understanding of pkg-config)
- arrow-acero: Upstream pkg-config incorrectly excludes the acero library
- arrow-csv: Only exposes header file(s). Need to add arrow_csv_dep upstream
- arrow-filesystem: Only exposes header file(s). Need to add arrow_filesystem_dep upstream
- arrow-json: Only exposes header file(s). Need to add arrow_json_dep upstream
- arrow-tensorflow: Only exposes header file(s). Need to add arrow_tensorflow_dep upstream
FWIW most of these (and also the current arrow-compute dependency) are not built by default, so a user would have to provide the appropriate argument(s) to a dependency fallback. I think that is as intended with the wrap system though (?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW most of these (and also the current arrow-compute dependency) are not built by default, so a user would have to provide the appropriate argument(s) to a dependency fallback. I think that is as intended with the wrap system though (?)
Yes, if the subproject fails to provide the right overridden dependency then the dependency lookup will simply return not-found. The dependency fallback should be configured with the right options (can be automatic via default_options).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool thanks. I'll chip away at the individual issues upstream for the next release. For now hopefully not a blocker for this (?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, I agree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK great - I've opened apache/arrow#47431 upstream to track that (and a few other items I've gotten feedback on)
0d19dbc to
71312d2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does upstream intend never to put a meson.build in its repository root, so that WrapDB will be a permanent mirror for upstream's meson.build and meson.options with adjusted subdir paths? If so, we need to document that in a comment at the top of this file, preferably with a sketch of the necessary changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At this point it seems really unlikely, at least in the near term. The Arrow project is a monorepo with a few different languages it in, some of which are not buildable by Meson. In the past the team has been hesitant to put anything in the root that only serves some of the languages
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is extremely reasonable for a monorepo with a few different languages in it, to have each language use a different root buildfile. When you build apache-arrow, is it supposed to build arrow-cpp, or build a pyarrow wheel?
But one thing that such projects often do, is release individual release distfiles for each language directory (assuming they work standalone).
$ du -sh /var/cache/distfiles/apache-arrow-21.0.0.tar.gz
17M /var/cache/distfiles/apache-arrow-21.0.0.tar.gz
$ du -sh /var/tmp/portage/dev-libs/apache-arrow-21.0.0/work/apache-arrow-21.0.0/cpp
33M /var/tmp/portage/dev-libs/apache-arrow-21.0.0/work/apache-arrow-21.0.0/cpp
$ du -sh /var/tmp/portage/dev-libs/apache-arrow-21.0.0/work/apache-arrow-21.0.0/
73M /var/tmp/portage/dev-libs/apache-arrow-21.0.0/work/apache-arrow-21.0.0/
The download of apache arrow is not all that small, but in all likelihood I don't need most of it. If the cpp/ directory had its own asset file, that could be used as the wrap-file url, and would be extracted on its own to subprojects/apache-arrow-cpp-21.0.0/ with a meson.build in the root.
But since apache arrow doesn't provide per-language distfiles, we need this simple workaround.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its definitely something to work towards; there is a lot of legacy to the release process, a lot of which I think is tied to some dirty CMake hacks. I'd love to see Meson grow more in usage in the repo, starting with the C++ source, then moving into the Python bindings, and then whatever else from there. Along the way, there's going to be a lot that I think we find we can do better
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the current arrangement is okay as a starting point, we just need to make sure we document it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can add some comments to the meson.build config file here if that is what you have in mind. Or is there a better place?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Top of the meson.build is fine.
6c826bd to
91a1799
Compare
bgilbert
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Work to make this more feature complete is still ongoing upstream in the Arrow repository, but I think the base dependency is usable enough to get started on a wrapdb entry