feat: cpython toolchains #618

Merged
merged 48 commits into from
Mar 9, 2022

Conversation

f0rmiga
Collaborator

@f0rmiga f0rmiga commented Feb 3, 2022

PR Checklist

Please check if your PR fulfills the following requirements:

  • Tests for the changes have been added (for bug fixes / features)
  • Docs have been added / updated (for bug fixes / features)

PR Type

What kind of change does this PR introduce?

  • Bugfix
  • Feature (please, look at the "Scope of the project" section in the README.md file)
  • Code style update (formatting, local variables)
  • Refactoring (no functional changes, no api changes)
  • Build related changes
  • CI related changes
  • Documentation content changes
  • Other... Please describe:

Does this PR introduce a breaking change?

  • Yes
  • No

Other information

Fixes #293.

@UebelAndre
Contributor

UebelAndre commented Feb 4, 2022

Would it be possible to add Python 3.7, since it still appears to be within its lifespan (https://www.python.org/dev/peps/pep-0537/)?

(also, thank you so much for working on this, this feature would be amazing!!)

@UebelAndre
Contributor

It also seems there are Windows binaries available. Could those be added as well?

@UebelAndre
Contributor

Also, it'd be awesome to have the binaries added to the google mirror

@f0rmiga
Collaborator Author

f0rmiga commented Feb 4, 2022

@UebelAndre

  • Python 3.7 is only published for 3.7.9 in https://github.com/indygreg/python-build-standalone/releases/tag/20200822. I can see if I can get a more up-to-date patch built.
  • I don't have a Windows machine to test. I guess I can rely on CI for this. Still, I'd prefer to do it in a follow up PR.
  • Agree. I wonder if we should even go one step further and start building the binaries in our infrastructure.

@sikko-mode-25

Thanks for doing this! One thing I wanted to note: I tried using these toolchains on my machine, and found they don't work with --build_python_zip, it would be really nice if they did.

@thundergolfer
Collaborator

and found they don't work with --build_python_zip, it would be really nice if they did.

@sikko-mode-25 would be good to get details of "don't work". What's the error?

Collaborator

@thundergolfer thundergolfer left a comment

Cool stuff.

I think we should be sure to support Windows if we're gonna land this. I'm in the process of downloading a Windows VM (when will GitHub Codespaces support the Windows platform, eugh).

python/private/versions.bzl (outdated)
integrity = TOOL_VERSIONS[version][rctx.attr.platform],
output = release_filename,
)
unzstd = rctx.which("unzstd")
Collaborator

This is the bit that doesn't feel production-ready. We either inspect the host system for the needed binary, or download the source and compile it on the host. (Do Windows machines have make available by default?)

So if we don't want to adopt this non-hermeticity, what options do we have?

  • Try and land non-zstd tarball distributions in the python-standalone repo. Bazel natively supports regular tarballs.
    • Do we know if Gregory is against using it?
  • Get zstd support landed in Bazel. bazel#11968 ("Support zstd decompression for external dependencies") is an old PR, but there's been activity from Googlers on it in the last 18 days (mid-Jan 2022) and it seems like it might be mergeable?

Collaborator Author

This is no different than using gzip with tar. I don't see anyone worried about which version of gzip the system is using to decompress a .tar.gz file (at least not at this level).

I.e. if unzstd exits with code 0, it successfully extracted the software we want, and that is deterministic. For Windows we can follow a similar approach to compiling it without Make if it's not supported on that system.
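
The "find the tool on the host, then trust its exit status" approach described here can be sketched in plain Python. This is only an illustration: the PR does the lookup in Starlark with rctx.which, and the helper name run_host_tool is hypothetical.

```python
import shutil
import subprocess


def run_host_tool(name, args):
    """Locate `name` on PATH and run it, treating exit code 0 as success.

    Hypothetical helper for illustration; not part of rules_python.
    """
    path = shutil.which(name)
    if path is None:
        # Mirrors the non-hermetic failure mode: the host lacks the binary.
        raise FileNotFoundError(f"{name} not found on PATH")
    result = subprocess.run([path, *args], capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"{name} exited with code {result.returncode}: {result.stderr}"
        )
    return result.stdout
```

The sketch also shows where the hermeticity concern comes from: the outcome depends on what happens to be installed on the host's PATH.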

Just FYI: internally we compile a zstd cc_binary that we use to extract the Python toolchain HTTP download.

So making zstd binary configurable is very much desirable.

Let me know if you are interested in the BUILD file for zstd

Collaborator Author

@sluongng Interesting, it would change a bit when the files get extracted but I may be able to make it work? If you could share it, I'd appreciate it!

Contributor

zstd decompression isn’t necessary since this issue was actioned, which added .tar.gz release artefacts.

+1 for using tarballs

Collaborator

Thanks for the heads up @jheaff1, that's awesome.

Collaborator

@thundergolfer thundergolfer Feb 16, 2022

So the latest release publishes the 'install only' tarballs that we want, for Python 3.9 and 3.10, and each of those for Windows, Apple, and Linux.

e.g. github.com/indygreg/python-build-standalone/releases/download/20211017/cpython-3.10.0-x86_64-apple-darwin-install_only-20211017T1616.tar.gz

@f0rmiga the 'install only' tarballs will remove a decent amount of complexity and non-hermeticity. Let's give them a try.
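
As a rough sketch of how such a download URL is put together, the example above can be split into its parts. The function and parameter names are hypothetical, the https:// scheme is assumed, and the pattern is inferred from that single example rather than from any documented naming scheme.

```python
def install_only_url(python_version, platform, release, build_stamp):
    """Build a download URL following the pattern of the example above.

    All names are illustrative; `build_stamp` is the trailing timestamp
    in the file name (e.g. "20211017T1616").
    """
    filename = (
        f"cpython-{python_version}-{platform}-install_only-{build_stamp}.tar.gz"
    )
    return (
        "https://github.com/indygreg/python-build-standalone/releases/download/"
        f"{release}/{filename}"
    )


url = install_only_url("3.10.0", "x86_64-apple-darwin", "20211017", "20211017T1616")
```

Because the result is an ordinary .tar.gz, it can be fetched and extracted without any extra decompression tool on the host.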

Collaborator

Just a heads up that I'm getting hands on with the 'install only' tarballs over in thundergolfer/example-bazel-monorepo#54 and documenting my troubles.

@schultetwin

Very cool. I use rules_python regularly and would love to see this.

As an FYI, I've (tried to) set up a similar system and wanted to list a few cases here that we ran into.

  1. These standalone versions of Python still have system dependencies. See here for a full list: https://python-build-standalone.readthedocs.io/en/latest/running.html#runtime-requirements. In particular, on Linux you need libdl.so.2 in order to use ctypes. I ended up with an auto-detection system used in the repository_rule that looks a bit like this:
        res = repository_ctx.execute([path_to_new_python_interpreter, "-c", "import ctypes; ctypes.CDLL('libdl.so.2')"])
        if res.return_code:
            fail("Failed to find libdl.so.2 installed on system.")

I'm sure other options exist too (including just documenting this case), but wanted to highlight it.
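
For reference, the probe that the one-liner above performs inside the downloaded interpreter can be written out as a small Python function. This is a sketch only; the function name is hypothetical, and it runs in whichever interpreter executes it rather than in the fetched one.

```python
import ctypes


def can_load_shared_library(name):
    """Return True if the dynamic loader can open `name`, else False."""
    try:
        ctypes.CDLL(name)
    except OSError:
        # ctypes raises OSError when the library cannot be found or loaded.
        return False
    return True
```

On a Linux host, can_load_shared_library("libdl.so.2") would be the equivalent check; a repository rule could turn a False result into an early, readable failure instead of a later import error.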

  2. On macOS, there is some dependency on the Xcode version that I haven't tracked down yet. When trying to compile C extensions in pip packages, if the wrong version of Xcode is installed, we get an error like:
 Compiling with an SDK that doesn't seem to exist: /Applications/Xcode_12.4.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX11.1.sdk

This one I haven't put in a solution for yet. But my next step is to look at the auto-configured Xcode rules and see if I can detect whether the proper version of Xcode is installed in the repository rule. Again, other options likely exist too, and maybe this is only worth documenting.

  3. Finally, it's worth linking to the behavior quirks page: https://python-build-standalone.readthedocs.io/en/latest/quirks.html. This lists the various gotchas that may occur when using this version of Python.

Not asking for any changes in particular in this PR, but hope the above context is useful for gotchas with a similar system!

@f0rmiga
Collaborator Author

f0rmiga commented Feb 9, 2022

@schultetwin thanks for the points!

  1. I think we can just document this behaviour, which is standard when loading dynamic objects at runtime.
  2. I'm not versed in Xcode, but this sounds like something fixable from the user side? Probably downloading an extra SDK?
  3. I think it's worth extracting the behaviour quirks that apply to the binaries we picked. Not all of those quirks apply.


filegroup(
    name = "files",
    srcs = glob(["bin/**", "lib/**", "include/**", "share/**"]),
Collaborator

Be sure to exclude __pycache__ and *.pyc files as that may introduce cache busts in production systems.
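
To make the suggested exclusion concrete, here is a minimal Python sketch of the filtering rule. In the BUILD file itself this would more likely be expressed through the exclude parameter of glob; the function names below are hypothetical.

```python
from pathlib import PurePosixPath


def is_bytecode_artifact(path):
    """True for files under a __pycache__ directory or ending in .pyc."""
    p = PurePosixPath(path)
    return "__pycache__" in p.parts or p.suffix == ".pyc"


def stable_files(paths):
    """Keep only paths that are not interpreter-generated bytecode."""
    return [p for p in paths if not is_bytecode_artifact(p)]


candidates = [
    "lib/python3.9/os.py",
    "lib/python3.9/__pycache__/os.cpython-39.pyc",
    "bin/python3",
    "lib/python3.9/site.pyc",
]
kept = stable_files(candidates)
```

Only the source file and the interpreter binary survive the filter, so bytecode written later by the interpreter would not change the set of declared inputs.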

Collaborator Author

Not sure I follow why the pre-compiled modules would introduce cache busts? Could you expand, please?

Collaborator

In older versions of Python the generated bytecode was non-deterministic, but given we are fetching prebuilt versions this may be a non-issue as they wouldn't change, unless Python will continually write new .pyc files here?
We should add a test against the SHA of these files perhaps.
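
One way such a check could look is a single digest over the extracted tree, compared before and after running anything with the interpreter. This is a sketch under assumptions: the function name is hypothetical, symlinks are followed when reading, and it says nothing about how the PR's tests are actually structured.

```python
import hashlib
import os


def tree_digest(root):
    """Return a SHA-256 hex digest over relative paths and file contents."""
    digest = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the traversal order deterministic
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            digest.update(rel.encode("utf-8"))
            digest.update(b"\0")
            with open(full, "rb") as f:
                digest.update(f.read())
            digest.update(b"\0")
    return digest.hexdigest()
```

If the digest taken right after extraction differs from the one taken after a build, some file was added or rewritten, which would point at the interpreter writing into its own tree.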

Collaborator

@aignas aignas Feb 14, 2022

When I was working on a Python repo with Bazel and debugging the action cache to understand why test targets were being rebuilt for the same git SHA, I noticed that on each GitHub Actions run of bazel test //... the tests were rerun because .pyc and __pycache__ files were included in the filegroup.

I used #551 (comment) as an inspiration.

This was with the most recent Python 3.9 build from indygreg.

Collaborator

Do you know what changed within those files? My guess is that the interpreter does indeed update these files again and they contain timestamps or absolute paths.

We should exclude RECORD files for the same reason then.

Collaborator

I cannot find the logs right now, but my guess was that:

  1. We download the python interpreter files and extract them.
  2. We run pip_parse, which uses bin/python from the repository rule via the python_interpreter_target attribute.
  3. This causes extra .pyc files to be created during pip_parse.
  4. Since the .pyc files are non-deterministic, we re-execute the Python test targets every time.

Having a minimal example should not be too difficult, but unfortunately I don't have time right now to help.
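
One known source of varying .pyc content is the source file's modification time, which timestamp-based .pyc files embed in their header. The sketch below demonstrates only that mechanism with the standard py_compile module; whether it is what caused the reruns described above is not established here, and the helper name is hypothetical.

```python
import os
import py_compile
import tempfile


def compile_twice(invalidation_mode):
    """Compile one source file twice, changing its mtime in between.

    Returns the two resulting .pyc payloads as bytes.
    """
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "mod.py")
        with open(src, "w") as f:
            f.write("x = 1\n")
        outputs = []
        for mtime in (1_000_000_000, 1_500_000_000):
            os.utime(src, (mtime, mtime))  # same content, different mtime
            pyc = os.path.join(tmp, f"mod-{mtime}.pyc")
            py_compile.compile(
                src, cfile=pyc, doraise=True, invalidation_mode=invalidation_mode
            )
            with open(pyc, "rb") as f:
                outputs.append(f.read())
        return outputs


ts_a, ts_b = compile_twice(py_compile.PycInvalidationMode.TIMESTAMP)
hash_a, hash_b = compile_twice(py_compile.PycInvalidationMode.UNCHECKED_HASH)
print("timestamp-based identical:", ts_a == ts_b)
print("hash-based identical:", hash_a == hash_b)
```

The timestamp-based outputs are expected to differ because only the mtime changed, while the hash-based ones depend on the source bytes alone. Either way, excluding generated bytecode from the declared inputs avoids relying on this detail.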

Collaborator Author

@aignas could you try the latest state of this PR? I made the lib directory read-only, so it should be immutable. I wonder if you will still get the same error.
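
A minimal sketch of what "making a directory tree read-only" amounts to, written in Python for illustration. How the PR itself applies the permission change is not shown in this thread, and the function name is hypothetical.

```python
import os
import stat

# Write permission bits for user, group and others.
WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def make_tree_read_only(root):
    """Clear all write bits on every file and directory under `root`."""
    # Walk bottom-up so children are handled before their parent directory.
    for dirpath, _dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):  # leave symlinks untouched
                mode = stat.S_IMODE(os.stat(path).st_mode)
                os.chmod(path, mode & ~WRITE_BITS)
        dir_mode = stat.S_IMODE(os.stat(dirpath).st_mode)
        os.chmod(dirpath, dir_mode & ~WRITE_BITS)
```

With the directories no longer writable, an interpreter run as a regular user cannot create new __pycache__ entries there; a process running as root would not be stopped by these bits.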

@josh-newman

Coincidentally, we recently started using indygreg/python-build-standalone's releases to define our Python toolchain (in @grailbio's internal repository). We hadn't yet had the time to make a PR here, so I'm excited to see this. Thanks @f0rmiga!

I've also experimented with using the artifacts from that standalone release to build self-contained, single-file Python executables (like .par, but also including the interpreter + stdlib). I've extracted a minimal example of this in grailbio/py_singlefile_binary; it defines a very slightly modified Python main, links with libpython, and uses itself as the import path because it is also a PEP 441 .zip.
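
For readers unfamiliar with PEP 441, the following sketch shows the plain-Python part of the idea: a zip archive with a __main__.py that the interpreter can execute directly and that serves as its own import path. It uses the standard zipapp module and does not reproduce the grailbio approach of linking libpython into the same file; the function name and file contents are made up for the example.

```python
import os
import subprocess
import sys
import tempfile
import zipapp


def build_and_run_pyz():
    """Create a small PEP 441 archive and run it with the current interpreter."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "app")
        os.mkdir(src)
        with open(os.path.join(src, "__main__.py"), "w") as f:
            f.write("import helper\nprint(helper.greet())\n")
        with open(os.path.join(src, "helper.py"), "w") as f:
            f.write("def greet():\n    return 'hello from inside the archive'\n")
        target = os.path.join(tmp, "app.pyz")
        zipapp.create_archive(src, target)
        # The archive is placed on sys.path, so `import helper` resolves inside it.
        result = subprocess.run(
            [sys.executable, target], capture_output=True, text=True, check=True
        )
        return result.stdout.strip()
```

Here the archive still needs an external interpreter to run it, which is the gap the single-file binary described above is meant to close.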

Would you be open to expanding file visibility in the standalone Python repo's BUILD, to //visibility:public? Or even defining a libpython cc_library (example from our prototype) and stdlib.zip package there, for use cases like ours?

I'm also curious if there's interest in rules_python supporting self-contained executables in the future. Our interpreter+.zip approach works so far for some internal tools, though there may be better ways to accomplish this. @sluongng mentioned github.com/indygreg/PyOxidizer to me, for example. (Of course that'd be separate PRs/issues, I just thought I'd mention it here where there's context.)

@f0rmiga
Collaborator Author

f0rmiga commented Feb 11, 2022

@josh-newman thanks for the pointers too!

For PEP 441, we've considered a few options so far. Like you said, it's out of context for this PR but I can definitely export more things from the downloaded interpreter. I don't think the cc_library targets would hurt. I'll make the files publicly visible. The cc_library introduces some limitations around the directory structure that would make supporting arbitrary URL releases more challenging. I'm happy to revisit it in the near future after this PR gets merged.

@ewhauser

@f0rmiga How are you dealing with the Python stub template depending on system Python? bazelbuild/bazel#8685 (comment)

This seems like an elegant solution for inside containers, but ideal for developers locally: https://github.com/buildbarn/bb-remote-execution/blob/master/cmd/fake_python/main.go

@thundergolfer thundergolfer removed the request for review from brandjon February 15, 2022 23:39
@thundergolfer
Collaborator

I'm also curious if there's interest in rules_python supporting self-contained executables in the future.

@josh-newman it's something that would be great to provide a good default for, but building a good solution can be done in a separate repo first I think. rules_python also already has the baggage of the zip output. It would be preferable to have the official rules only support one way of doing things.

Thanks for open sourcing grailbio/py_singlefile_binary.

Contributor

@UebelAndre UebelAndre left a comment

For greater visibility, I've got a branch that adds Windows support and uses tar.gz files at #628. Hopefully that helps 😄

f0rmiga and others added 4 commits March 4, 2022 08:54
Signed-off-by: Thulio Ferraz Assis <3149049+f0rmiga@users.noreply.github.com>
Co-authored-by: UebelAndre <github@uebelandre.com>
alexeagle referenced this pull request in digital-plumbers-union/rules_pyenv Mar 4, 2022
allow skipping py2 installation
UebelAndre and others added 8 commits March 7, 2022 10:33
* Fix windows acceptance tests

* test

* todo: remove

Co-authored-by: Thulio Ferraz Assis <3149049+f0rmiga@users.noreply.github.com>
Signed-off-by: Thulio Ferraz Assis <3149049+f0rmiga@users.noreply.github.com>
Collaborator

@alexeagle alexeagle left a comment

I put some suggestions in #642, in particular fixing the "target_compatible_with" that doesn't belong, matching bazel-contrib/rules-template#9.

otherwise looks great to me!

alexeagle and others added 2 commits March 8, 2022 09:40
* Minor code review suggestions

* Apply suggestions from code review

Co-authored-by: Thulio Ferraz Assis <3149049+f0rmiga@users.noreply.github.com>
Signed-off-by: Thulio Ferraz Assis <3149049+f0rmiga@users.noreply.github.com>
@f0rmiga f0rmiga merged commit bed8c1b into main Mar 9, 2022
@f0rmiga f0rmiga deleted the f0rmiga/cpython-toolchain branch March 9, 2022 18:33
@sluongng

sluongng commented Mar 9, 2022

🎉 amazing work!

@aaliddell
Contributor

Awesome 👍

@aignas
Collaborator

aignas commented Mar 14, 2022

Thank you for the work. I tried to use it on my Mac and everything worked except for chmoding the extracted Python folder. Since I am the only one in here facing this issue, it could be something to do with my system, but wanted to give a heads up in case anyone else encounters this: bed8c1b#r68477007

Development

Successfully merging this pull request may close these issues.

Ship a repo rule to fetch a hermetic Python interpreter and generate a py_runtime / toolchain for it