Conversation

@github-actions
Contributor

@github-actions github-actions bot commented Jan 3, 2026

Automated changes by create-pull-request GitHub action

@github-actions github-actions bot enabled auto-merge January 3, 2026 05:37
@couteau
Contributor

couteau commented Jan 3, 2026

This seems to have been generated by merging my recent pull request. The one issue is that the pyarrow version has to match the arrow version in the main vcpkg repo, which is currently 21.0 (though there is a pending pull request to update it to 22.0), so the suggested change to pyarrow version 22.0 should be rejected. One maintenance issue with that package will be keeping it in sync with the arrow package when that pull request is eventually merged (and when future updates to the main arrow package are made).

The others are fine to merge -- I had actually updated to those versions of pandas, geopandas, and tzdata in a test branch that I forgot to merge into the main branch.

@m-kuhn
Contributor

m-kuhn commented Jan 4, 2026

In consumer projects (QGIS) we will have both a commit on the main vcpkg repository and a commit on this repository. This results in two arbitrary versions of arrow/pyarrow which are out of our control. If the two versions need to match, I would consider adding a check to the portfile of pyarrow so it fails at build time (with a reasonable error message). Ideally we would equip the arrow package in microsoft/vcpkg with a new python feature. As far as I can tell, numpy is an optional dependency, so it may be possible to get that to work.
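To make the idea of a build-time check concrete, here is a rough sketch of the comparison logic, written in Python purely for illustration (a real check would live in the pyarrow portfile.cmake). The function names and the rule of comparing only the major.minor components are assumptions, not anything taken from this repository.

```python
import json


def major_minor(manifest_text: str) -> tuple:
    """Extract (major, minor) from the version field of a vcpkg.json document."""
    manifest = json.loads(manifest_text)
    # vcpkg manifests declare their version under one of several keys.
    for key in ("version", "version-semver", "version-string"):
        if key in manifest:
            return tuple(manifest[key].split(".")[:2])
    raise KeyError("manifest has no recognised version field")


def versions_match(arrow_manifest: str, pyarrow_manifest: str) -> bool:
    """Return True if both manifests declare the same major.minor version."""
    return major_minor(arrow_manifest) == major_minor(pyarrow_manifest)


def require_matching_versions(arrow_manifest: str, pyarrow_manifest: str) -> None:
    """Abort with a readable message when the two ports have drifted apart."""
    if not versions_match(arrow_manifest, pyarrow_manifest):
        raise SystemExit(
            "pyarrow version {} does not match arrow version {}".format(
                ".".join(major_minor(pyarrow_manifest)),
                ".".join(major_minor(arrow_manifest)),
            )
        )
```

With arrow at 21.0.x and pyarrow at 22.0.x, `require_matching_versions` would stop the build with a message naming both versions instead of failing later in the compile or at import time.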

@m-kuhn
Contributor

m-kuhn commented Jan 4, 2026

Meanwhile if pyarrow is fetched from github (and not from pythonhosted), it will be ignored by the update script.
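As an illustration of that rule only: the sketch below assumes the update script decides by looking for a pythonhosted download in each portfile. The actual detection logic of the script is not shown in this thread, so the substring test and the function names here are hypothetical.

```python
def is_tracked_by_update_script(portfile_text: str) -> bool:
    """Hypothetical rule: only ports downloading from pythonhosted are auto-updated."""
    return "pythonhosted" in portfile_text


def ports_to_update(portfiles: dict) -> list:
    """Return the names of ports the (assumed) rule would pick up, sorted by name."""
    return sorted(
        name for name, text in portfiles.items() if is_tracked_by_update_script(text)
    )
```

Under this assumption, a pyarrow portfile that fetches its source with `vcpkg_from_github` would simply never appear in the result.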

@github-actions github-actions bot force-pushed the create-pull-request/patch branch from 27f0a7e to 01925d8 Compare January 5, 2026 05:50
@m-kuhn m-kuhn closed this Jan 5, 2026
auto-merge was automatically disabled January 5, 2026 08:02

Pull request was closed

@m-kuhn m-kuhn reopened this Jan 5, 2026
@m-kuhn
Contributor

m-kuhn commented Jan 5, 2026

@couteau reconsidering this: I think we should add an arrow port here with a python feature to override the arrow port of the main repo until we are able to upstream this. We do the same for gdal and it cleanly solves the version sync problem.

@couteau
Contributor

couteau commented Jan 5, 2026

Ok, I will work on that and submit a new pull request that deletes the separate pyarrow port.

@couteau
Contributor

couteau commented Jan 6, 2026

@m-kuhn - I looked into this and it may not be so easy to backport an arrow port that includes pyarrow as a feature. Building pyarrow requires cython, setuptools, and setuptools-scm, none of which is available upstream. In addition, even though the numpy dependency is optional, enabling it requires numpy at build time, not just at runtime.

One thought is that the arrow source tree includes the pyarrow source. If we could use that to build pyarrow instead of redownloading the pyarrow source from pythonhosted, it would ensure pyarrow is always built for the correct version of arrow, as well as solving the update script issue. But I don't see any documented way of accessing another package's source tree, and doing it that way may create other issues (e.g., it doesn't resolve the problem that the version numbers for arrow and pyarrow in their respective vcpkg.json manifests may not match, even if the versions actually installed do).

We can still pull arrow into the python registry and create a python feature for it, but we may not be able to upstream it unless the entire python registry is upstreamed.

@github-actions github-actions bot force-pushed the create-pull-request/patch branch from 01925d8 to 499f773 Compare January 6, 2026 05:42
@github-actions github-actions bot enabled auto-merge January 6, 2026 05:42
@m-kuhn
Contributor

m-kuhn commented Jan 6, 2026

Thanks for looking into this.

Build time dependencies can be fetched using host python, see e.g. https://github.com/microsoft/vcpkg/blob/e3db8f65d2414c301c29a8467c6aee94e3ba09fc/ports/libcamera/portfile.cmake#L11-L15
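For readers unfamiliar with the pattern, the general idea is an isolated environment created from the host python, into which the build tools are installed with pip. The sketch below only builds the command lines and does not run them. It is a loose Python illustration, not the implementation of the helper used in the linked portfile, and `venv_commands` is a hypothetical name.

```python
import os
import sys


def venv_commands(venv_dir, packages, host_python=sys.executable):
    """Build (without running) the commands for an isolated build environment."""
    # Virtual environments place the interpreter in a platform-specific folder.
    if os.name == "nt":
        venv_python = os.path.join(venv_dir, "Scripts", "python.exe")
    else:
        venv_python = os.path.join(venv_dir, "bin", "python")
    return [
        # Step 1: create the environment with the host interpreter.
        [host_python, "-m", "venv", venv_dir],
        # Step 2: install the build-time tools into that environment only.
        [venv_python, "-m", "pip", "install", *packages],
    ]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)`. For pyarrow the package list would be along the lines of cython, setuptools, setuptools-scm and numpy.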

This still leaves us with the question about numpy. As far as I can see, arrow links to the native modules of numpy and doesn't just use it as a build tool, so we will need a vcpkg-built version of it, which in turn would require microsoft/vcpkg#37409 to be continued. In the long run, this should be the goal.

Reusing the source of another package is not something that can be done, afaik. A port can fetch a cached version of its dependencies at build time, but this only contains the installed parts.

@github-actions github-actions bot force-pushed the create-pull-request/patch branch from 499f773 to 62b960c Compare January 7, 2026 05:42
@couteau
Contributor

couteau commented Jan 7, 2026

What do you think the short-term solution is, since it doesn't look like microsoft/vcpkg#37409 is going to be merged any time soon? Seems like we have two options:

  1. Keep the py-pyarrow port as is as a separate package and update as soon as possible whenever the main arrow package is updated?
  2. Add an arrow port here with a python feature that builds pyarrow with the hope of backporting it to the main vcpkg repo at some point.

If we want to go with option 2, I've already implemented it: using the x_vcpkg_get_python_packages function used in the example you linked (thanks!) to add the python dependencies (including numpy), it builds and installs the wheel without the functions in the vcpkg-python-scripts port. It therefore has no dependencies on any other python-registry ports, so it could be upstreamed. It does, of course, depend on python, but there is a python port in the main repo, and if python is installed from the python-registry repo, it will use that to build pyarrow.

> This still leaves us with the question about numpy

Yes -- it looks like numpy is a build-time requirement, even though it is optional at runtime.

@m-kuhn
Contributor

m-kuhn commented Jan 7, 2026

I think we should go with option 2, as this will relieve us of the pain of aligning two port versions.

> using the x_vcpkg_get_python_packages function used in the example you linked (thanks!) to add the python dependencies (including numpy)

Are you confident that it will run without numpy at runtime (or with a different version of numpy at runtime than at build time)?
If this is the case, we might as well open a pull request upstream and there's no need to wait for an official numpy port.
Meanwhile I'll be happy to also merge the same port into this repository.

@couteau
Contributor

couteau commented Jan 7, 2026

I just ran some basic tests -- pyarrow wheel built using vcpkg with numpy 2.4.0 (the version installed in the build venv by the x_vcpkg_get_python_packages function) imports and runs fine without any numpy installed, and also runs fine using numpy 1.26.4 (the latest available on the python-registry). I tested the basic functionality, and it works. E.g., the following snippet from the pyarrow website works as expected.

>>> import pyarrow as pa
>>> arr = pa.array([4, 5, 6], type=pa.int32())
>>> view = arr.to_numpy()
>>> view
array([4, 5, 6], dtype=int32)

In fact, that test is essentially run implicitly in the build process, because the venv used to build the package uses whatever the latest version of numpy on PyPI is (currently 2.4.0), while the import test is run using the vcpkg-installed python, which has whatever version of numpy is installed from vcpkg. Importing pyarrow when numpy is installed automatically imports numpy.
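For anyone who wants to repeat this kind of check in their own environment, a minimal smoke test could look like the sketch below. The function name and the status strings are made up for illustration, and it covers only the basic array round trip, not the rest of the pyarrow API.

```python
import importlib.util


def pyarrow_numpy_status() -> str:
    """Report how far a basic pyarrow smoke test gets in the current interpreter."""
    if importlib.util.find_spec("pyarrow") is None:
        return "pyarrow not installed"

    import pyarrow as pa

    arr = pa.array([4, 5, 6], type=pa.int32())
    # Converting to plain Python objects does not involve numpy.
    if arr.to_pylist() != [4, 5, 6]:
        return "pyarrow broken"

    if importlib.util.find_spec("numpy") is None:
        return "pyarrow works without numpy"

    # With numpy present, the conversion from the snippet above should work too.
    view = arr.to_numpy()
    if view.tolist() == [4, 5, 6]:
        return "pyarrow works with numpy"
    return "numpy conversion broken"
```

Running it once in an environment without numpy and once with the runtime numpy version covers the two cases described above.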

I'll open a pull request here to add the arrow port.

@github-actions github-actions bot force-pushed the create-pull-request/patch branch from 62b960c to 020a3d8 Compare January 8, 2026 05:42
@m-kuhn
Contributor

m-kuhn commented Jan 8, 2026

It's probably worth updating the numpy port in here too since there was even a major version bump. It was originally created by @Neumann-A in https://github.com/Neumann-A/my-vcpkg-ports/tree/master/numpy a long time ago and hasn't seen much work since.

@github-actions github-actions bot force-pushed the create-pull-request/patch branch 2 times, most recently from fd61066 to 2b3923f Compare January 10, 2026 05:36
@github-actions github-actions bot force-pushed the create-pull-request/patch branch from 2b3923f to 6b14039 Compare January 11, 2026 05:41
@m-kuhn m-kuhn closed this Jan 11, 2026
auto-merge was automatically disabled January 11, 2026 06:45

Pull request was closed

@m-kuhn m-kuhn reopened this Jan 11, 2026
@m-kuhn m-kuhn merged commit bf4d0b4 into main Jan 11, 2026
4 checks passed
@m-kuhn m-kuhn deleted the create-pull-request/patch branch January 11, 2026 08:04
3 participants