Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add implementation plan for new Python package manager PDM #4128

Merged
merged 7 commits into from
Apr 19, 2024
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
@@ -0,0 +1,207 @@
# 2024-04-12 Implementation Plan: Switching to a new Python package manager

**Author**: @dhruvkb

<!-- See the implementation plan guide for more information: https://github.com/WordPress/openverse/tree/19791f51c063d0979112f4b9f4eeace04c8cf5ff/docs/projects#implementation-plans-status-in-rfc -->
<!-- This template is exhaustive and may include sections which aren't relevant to your project. Feel free to remove any sections which would not be useful to have. -->

## Reviewers

<!-- Choose two people at your discretion who make sense to review this based on their existing expertise. Check in to make sure folks aren't currently reviewing more than one other proposal or RFC. -->

- [ ] @sarayourfriend
dhruvkb marked this conversation as resolved.
Show resolved Hide resolved
- [ ] @AetherUnbound
dhruvkb marked this conversation as resolved.
Show resolved Hide resolved

## Overview

<!-- An overview of the implementation plan, if necessary. Save any specific steps for the section(s) below. -->

This implementation plan compares the current best tools for managing Python
dependencies and packages and proposes a plan to switch to a new package manager
determined by the comparison.

## Expected Outcomes

At the end of the implementation of this plan, we should have the Openverse API
project using a new Python package manager. The new package manager will be
integrated with our existing integrations and will be behind the Docker builds
for the API being run in staging and production.

Other parts of the Openverse project can also be migrated to the new package
manager based on the success of this implementation, but they should not be
considered as success criteria for this project.

## Goals

These are criteria for choosing a package manager that would make an option
ineligible if not met. The package manager must

- **support apps and libraries**

We want to build Python libraries as publish to PyPI, as well as applications
like the API and ingestion server. Our package manager has to suitable for
both purposes.

- **support internal dependencies**

Following up on the last point, our tools package manager must allow our apps
to depend on libraries, either as part of their monorepo support or as path
dependencies. This should also work reliably inside Docker.

- **support targeted updates**

This is one of the main reasons we are switching away from Pipenv so the
alternative has to support this requirement. Updating one package should not
update any unrelated packages.

- **integrate with auto-updaters**

Following up on the last point, we use Renovate for automatically updating
packages periodically, and also Dependabot for immediate security updates.
These tools support popular package managers and those that adhere to PEP 621
(see [Renovate](https://docs.renovatebot.com/modules/manager/pep621/) and
[Dependabot](https://docs.github.com/en/code-security/dependabot/dependabot-version-updates/about-dependabot-version-updates#pip-and-pip-compile)).

### Additional good-to-haves

These are not deal-breaking criteria but would be nice to have. The package
manager should

- **be fast**

The package manager should quickly resolve dependencies and should also make
downloading and installing packages faster using caching similar to pnpm.

- **have ease of use**

This is somewhat subjective, but the package manager should be fairly easy to
use, should not have too much magic, and should make Python workflows like
building Docker images and publishing packages to PyPI easier.

- **comply with standards**

The package manager should not reinvent the wheel and should follow the
standards like PEP 621, PEP 517 and PEP 508. This is important for future
integrations and compatibility with other tools.

- **be around for the long term**

The package manager should be actively developed and maintained, and should
have a good community around it. This is important for long-term support and
active maintenance in the future.

## Decision making

To decide the new package manager, many options were considered. Here are the
options, what benefits they provided and why they were ultimately not chosen as
the path forward.

![](./star_history.png)

### [Rye](https://rye-up.com/)

#### Why?

- all-in-one tool (at least trying to be)
- provisions
[@indygreg's python-build-standalone](https://github.com/indygreg/python-build-standalone)
- fastest among all options considered (written in Rust, uses
[`uv`](https://github.com/astral-sh/uv/) under the hood)
dhruvkb marked this conversation as resolved.
Show resolved Hide resolved
- lockfile is `requirements.txt`
dhruvkb marked this conversation as resolved.
Show resolved Hide resolved
- supports monorepo workspaces
- developed by [Armin Ronacher](https://github.com/mitsuhiko)
- fastest growing in terms of GitHub stars
- supports multiple build backends via
[PyPA/build](https://build.pypa.io/en/stable/)

#### Why not?

- PEP 621 non-compliant
dhruvkb marked this conversation as resolved.
Show resolved Hide resolved
- looks like
[1 core maintainer](https://github.com/astral-sh/rye/graphs/contributors)
- [no stable v1 yet](https://github.com/astral-sh/rye/releases)
- [considered "experimental"](https://github.com/search?q=repo:astral-sh/rye%20experimental&type=code)
- [VC-backed](https://astral.sh/about)
- [not supported by Renovate](https://github.com/renovatebot/renovate/issues/25273)
[⚠️ deal-breaker]
- does not generate platform-independent lockfiles

### [PDM](https://pdm-project.org/)

#### Why?

- PEP 621 compliant
- supported by [Renovate](https://docs.renovatebot.com/modules/manager/pep621/)
and
[Dependabot](https://docs.github.com/en/code-security/dependabot/dependabot-version-updates/about-dependabot-version-updates#pip-and-pip-compile)
- supports multiple build backends
- supports monorepo workspaces
- provisions
[@indygreg's python-build-standalone](https://github.com/indygreg/python-build-standalone)
- [endorsed](https://github.com/astral-sh/rye/discussions/6#discussioncomment-5700997)
by creator of Rye!

#### Why not?

- looks like
[1 core maintainer](https://github.com/pdm-project/pdm/graphs/contributors)
- fewest GitHub stars and forks indicative of low popularity
- [flagship feature rejected](https://pdm-project.org/en/latest/usage/pep582/)
- [does not support editable main dependencies](https://pdm-project.org/en/latest/usage/dependency/#editable-dependencies)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are editable dependencies relevant outside a development context? You can't use them to build, you need to use a copied version.

The way this works is, you'd specify the editable package in dev dependencies and a non-editable version in the regular dependencies. That way, when you install in dev mode, the editable version is referenced, when you install in non-dev mode, the real copied package is used.

I've tried this locally and it works fine. Have I missed something we need from editable packages? I don't think we even want editable packages outside a dev context.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean this?

[project]
name = "api"

dependencies = [
  "ov-attribution @ file:///${PROJECT_ROOT}/../packages/ov-attribution",
]

[tool.pdm.dev-dependencies]
sub_pkgs = [
  "-e ov-attribution @ file:///${PROJECT_ROOT}/../packages/ov-attribution"
]

I tried locking a project with such a setup but pdm.lock only says editable=true.
Screenshot 2024-04-17 at 1 24 55 AM

Is that expected?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought so. My assumption was that the lock file would indicate it as editable, but then only install it as such when you selected the dev dependencies with pdm install --dev. However, when I unselect the dev groun locally with pdm install --prod, it still installs it as editable.

But, I can pass --no-editable, and it works fine. pdm install --prod --no-editable copies the code directly into the venv, rather than creating a .pth file. The important bit being that we can install it editable in dev so that we can e.g., map the directory in docker compose to the relative location referenced.

For a future prod-only dependencies version of the api target in the API Dockerfile, we'd use pdm install --prod --no-editable --frozen-lockfile.

I wonder if the best way to do this would actually be to install the prod dependencies only in the Dockerfile, then add an entrypoint to the compose service that runs pdm install --dev to bootstrap the development environment in the container for local use.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sarayourfriend I think @krysal has approached that before to resolve #1008. Maybe something to reconsider after this transition.

[⚠️ deal-breaker]

### [Poetry](https://python-poetry.org/)

#### Why?

- stable (post v1) and mature (6+ years old)
- ease of use due to being ideologically close to Pipenv
- supports libraries (build and publish to PyPI) as well as apps
- generates cross-platform lockfiles
- good CLI experience for developers
- highest GitHub stars and forks indicative of high popularity
- looks like
[2 core maintainers](https://github.com/astral-sh/rye/graphs/contributors)
dhruvkb marked this conversation as resolved.
Show resolved Hide resolved
(> 1)
- supported by
[Renovate](https://docs.renovatebot.com/modules/manager/poetry/)[^poetry_reno]
and
[Dependabot](https://docs.github.com/en/code-security/dependabot/dependabot-version-updates/configuration-options-for-the-dependabot.yml-file#package-ecosystem)
- [challenged non-endorsement](https://github.com/astral-sh/rye/discussions/6#discussioncomment-5702537)
by creator of Rye

[^poetry_reno]:
Renovate is not 100% compatible with Poetry and recommends pinning
dependencies in `pyproject.toml` instead of using ranges, although their
justification for this is very weak.
dhruvkb marked this conversation as resolved.
Show resolved Hide resolved

#### Why not?

- [PEP 621 non-compliant](https://github.com/python-poetry/roadmap/issues/3)
- only supports
[one build backend](https://github.com/python-poetry/poetry-core)
- does not support monorepo workspaces
- despite cache, not as fast as other alternatives

### Conclusions

Poetry is the only one that does not have a deal-breaker for us. Hence it was
selected as the next package manager of choice.

Regarding PEP non-compliance, the maintainers have confirmed that PEP 621 will
be supported in v2.x. As for only supporting one build backend, that backend is
capable enough for us and does what we need in terms of building libraries.

Also, both PDM and Rye are currently actively developed. If their respective
deal-breakers are resolved, they can be re-considered in the future. I have good
hopes for Rye to be a strong contender in the future.

## Step-by-step plan

1. Replace Pipenv with Poetry.
dhruvkb marked this conversation as resolved.
Show resolved Hide resolved
2. Update Renovate configuration.
3. Update `Dockerfile`s.
4. Update `just` recipes.

These are fairly small steps and we can cover all of them in one PR
[like this](https://github.com/WordPress/openverse/pull/4107).
Comment on lines +204 to +205
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR just updates the API, which I agree is fine for a single PR. But we have a lot of other uses of Pipenv, like in utilities and the ingestion server. Can we create separate issues for those and do them in individual PRs, to keep things atomic and easy to review?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'll make separate issues for them so that we can track progress of the migration and each PR will be small and atomic.

8 changes: 8 additions & 0 deletions documentation/projects/proposals/python_packaging/index.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
# Python packaging and code sharing

```{toctree}
:titlesonly:
:glob:

*
```
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading