Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add implementation plan for new Python package manager PDM #4128

Merged
merged 7 commits into from
Apr 19, 2024

Conversation

dhruvkb
Copy link
Member

@dhruvkb dhruvkb commented Apr 16, 2024

Fixes

Fixes #286 by @sarayourfriend

Description

This PR adds the implementation plan for migrating away from Pipenv. This plan

  • describes our must-have and good-to-have features in a new package manager
  • describes what options were considered, what features they had and lacked
  • describes why PDM was selected as our best option
  • describes why Rye did not meet our must-have criteria
  • describes why Poetry did not meet our good-to-have criteria
  • outlines the contents of the migration PR Switch API package management from Pipenv to PDM #4107

Testing Instructions

Read the added documentation page.

Decision-making process

This discussion is following the Openverse decision-making process. Information about this process can be found on the Openverse documentation site. Requested reviewers or participants will be following this process. If you are being asked to give input on a specific detail, you do not need to familiarise yourself with the process and follow it.

Current round

This discussion is currently in the Clarification round. The deadline for review of this round is 23 April 2024.

I've added Sara and Madison as reviewers because they were the folks who opened the issue and engaged the most actively in the discussion there. Other people who work on the Python side are welcome to voice their opinion as well.

Checklist

  • My pull request has a descriptive title (not a vague title likeUpdate index.md).
  • My pull request targets the default branch of the repository (main) or a parent feature branch.
  • My commit messages follow best practices.
  • My code follows the established code style of the repository.
  • I added or updated tests for the changes I made (if applicable).
  • I added or updated documentation (if applicable).
  • I tried running the project locally and verified that there are no visible errors.
  • I ran the DAG documentation generator (if applicable).

Developer Certificate of Origin

Developer Certificate of Origin
Developer Certificate of Origin
Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
1 Letterman Drive
Suite D4700
San Francisco, CA, 94129

Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.


Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
    have the right to submit it under the open source license
    indicated in the file; or

(b) The contribution is based upon previous work that, to the best
    of my knowledge, is covered under an appropriate open source
    license and I have the right under that license to submit that
    work with modifications, whether created in whole or in part
    by me, under the same open source license (unless I am
    permitted to submit under a different license), as indicated
    in the file; or

(c) The contribution was provided directly to me by some other
    person who certified (a), (b) or (c) and I have not modified
    it.

(d) I understand and agree that this project and the contribution
    are public and that a record of the contribution (including all
    personal information I submit with it, including my sign-off) is
    maintained indefinitely and may be redistributed consistent with
    this project or the open source license(s) involved.

@dhruvkb dhruvkb added 🟨 priority: medium Not blocking but should be addressed soon 🌟 goal: addition Addition of new feature 📄 aspect: text Concerns the textual material in the repository 🐍 tech: python Involves Python 🧱 stack: api Related to the Django API 🧱 stack: documentation Related to Sphinx documentation 🧭 project: implementation plan An implementation plan for a project labels Apr 16, 2024
@dhruvkb dhruvkb requested a review from a team as a code owner April 16, 2024 10:15
@openverse-bot openverse-bot added the 🤖 aspect: dx Concerns developers' experience with the codebase label Apr 16, 2024
Copy link

github-actions bot commented Apr 16, 2024

Full-stack documentation: https://docs.openverse.org/_preview/4128

Please note that GitHub pages takes a little time to deploy newly pushed code, if the links above don't work or you see old versions, wait 5 minutes and try again.

You can check the GitHub pages deployment action list to see the current status of the deployments.

New files ➕:

Copy link
Contributor

@sarayourfriend sarayourfriend left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

I'm confused by some of the statements about Rye and PDM that conflict documentation or with my understanding of how Python packaging must work. But, I don't think is something we even need to spend much time on. If Poetry works (sounds like you've got it working in a PR), then it's not worth worrying about to me, especially if it's biggest drawback (non-compliance with PEP-621) is going to be resolved soon.

I still think Python distribution management is neat, and I hope we can use it for the places it matters. It does not matter for the API or ingestion server/workers, so not worth making decisions based on that now. We aren't locked in by any means, as you said, and poetry gets us a step closer to the flexibility we'd need to switch in the future if for some reason we needed to (I sure hope not!).

I'm approving this now, despite being the clarification round, because none of my clarifications are blockers, and even if Rye/PDM met all our criteria, if the motivation on our team is behind Poetry, I don't have any reason to stop it. There simply isn't anything about Poetry that would prevent us from using it reliably compared to other options.

[1 core maintainer](https://github.com/pdm-project/pdm/graphs/contributors)
- fewest GitHub stars and forks indicative of low popularity
- [flagship feature rejected](https://pdm-project.org/en/latest/usage/pep582/)
- [does not support editable main dependencies](https://pdm-project.org/en/latest/usage/dependency/#editable-dependencies)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are editable dependencies relevant outside a development context? You can't use them to build, you need to use a copied version.

The way this works is, you'd specify the editable package in dev dependencies and a non-editable version in the regular dependencies. That way, when you install in dev mode, the editable version is referenced, when you install in non-dev mode, the real copied package is used.

I've tried this locally and it works fine. Have I missed something we need from editable packages? I don't think we even want editable packages outside a dev context.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean this?

[project]
name = "api"

dependencies = [
  "ov-attribution @ file:///${PROJECT_ROOT}/../packages/ov-attribution",
]

[tool.pdm.dev-dependencies]
sub_pkgs = [
  "-e ov-attribution @ file:///${PROJECT_ROOT}/../packages/ov-attribution"
]

I tried locking a project with such a setup but pdm.lock only says editable=true.
Screenshot 2024-04-17 at 1 24 55 AM

Is that expected?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought so. My assumption was that the lock file would indicate it as editable, but then only install it as such when you selected the dev dependencies with pdm install --dev. However, when I unselect the dev groun locally with pdm install --prod, it still installs it as editable.

But, I can pass --no-editable, and it works fine. pdm install --prod --no-editable copies the code directly into the venv, rather than creating a .pth file. The important bit being that we can install it editable in dev so that we can e.g., map the directory in docker compose to the relative location referenced.

For a future prod-only dependencies version of the api target in the API Dockerfile, we'd use pdm install --prod --no-editable --frozen-lockfile.

I wonder if the best way to do this would actually be to install the prod dependencies only in the Dockerfile, then add an entrypoint to the compose service that runs pdm install --dev to bootstrap the development environment in the container for local use.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sarayourfriend I think @krysal has approached that before to resolve #1008. Maybe something to reconsider after this transition.

Co-authored-by: sarayourfriend <24264157+sarayourfriend@users.noreply.github.com>
@dhruvkb
Copy link
Member Author

dhruvkb commented Apr 16, 2024

While you have approved it @sarayourfriend, I'm drafting it again to give PDM another shot based on your comment about having a dependency to appear twice, once as editable and another not editable.

@dhruvkb dhruvkb marked this pull request as draft April 16, 2024 20:58
@dhruvkb dhruvkb requested review from sarayourfriend and removed request for obulat April 16, 2024 21:02
@dhruvkb dhruvkb marked this pull request as ready for review April 17, 2024 15:35
@zackkrida
Copy link
Member

@dhruvkb could you update the PR description to reflect the switch from Poetry to PDM?

Comment on lines +204 to +205
These are fairly small steps and we can cover all of them in one PR
[like this](https://github.com/WordPress/openverse/pull/4107).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR just updates the API, which I agree is fine for a single PR. But we have a lot of other uses of Pipenv, like in utilities and the ingestion server. Can we create separate issues for those and do them in individual PRs, to keep things atomic and easy to review?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'll make separate issues for them so that we can track progress of the migration and each PR will be small and atomic.

Co-authored-by: sarayourfriend <git@sarayourfriend.pictures>
Copy link
Contributor

@AetherUnbound AetherUnbound left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is quite well-researched, and I'm glad that you were able to explore a number of different options @dhruvkb! Similar to @sarayourfriend's comments, I don't have much clarification and am fully onboard with pushing forward with PDM. Thank you for compiling all this information, I'm glad that the actual implementation will be pretty straightforward too 🙌🏼

Co-authored-by: Madison Swain-Bowden <bowdenm@spu.edu>
@dhruvkb dhruvkb changed the title Add implementation plan for new Python package manager Add implementation plan for new Python package manager PDM Apr 19, 2024
@dhruvkb dhruvkb merged commit b45b5ef into main Apr 19, 2024
38 checks passed
@dhruvkb dhruvkb deleted the new_pkg_man_ip branch April 19, 2024 05:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🤖 aspect: dx Concerns developers' experience with the codebase 📄 aspect: text Concerns the textual material in the repository 🌟 goal: addition Addition of new feature 🟨 priority: medium Not blocking but should be addressed soon 🧭 project: implementation plan An implementation plan for a project 🧱 stack: api Related to the Django API 🧱 stack: documentation Related to Sphinx documentation 🐍 tech: python Involves Python
Projects
Status: Accepted
Archived in project
Development

Successfully merging this pull request may close these issues.

Implementation Plan: Switch Python package management away from Pipenv
5 participants