Switch API package management from Pipenv to PDM #4107

Merged
merged 17 commits into from
Apr 23, 2024

Conversation

dhruvkb
Member

@dhruvkb dhruvkb commented Apr 14, 2024

Related

Related to #286. Additional background can be read in this comment.

Description

This PR replaces Pipenv with PDM. While PDM has monorepo support, we have opted not to use this feature in this PR. Instead we will use a path dependency to depend on subpackages located at packages/ in the monorepo. To see this in action, the license and attribution related functionality in the API has been separated into its own package at packages/ov-attribution.

The ov-attribution subpackage is listed as a dependency as well as an editable dev-dependency of the API. This means that in dev mode, the package will not actually be copied into the virtualenv but rather only point to ../packages/ov-attribution. When installed with --no-editable it will be copied into the virtualenv. Thanks @sarayourfriend for suggesting this and making usage of PDM possible.
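
A quick way to see which mode is in effect is to ask Python where it would import the package from. The following is a minimal sketch and not part of this PR; `module_origin` and `is_inside` are hypothetical helper names, and the directories you compare against are assumptions about the local layout.

```python
import importlib.util
from pathlib import Path


def module_origin(module_name: str) -> Path:
    """Return the file a top-level module would be imported from."""
    spec = importlib.util.find_spec(module_name)
    if spec is None or spec.origin is None:
        raise ModuleNotFoundError(f"cannot locate {module_name!r}")
    return Path(spec.origin).resolve()


def is_inside(path: Path, directory: Path) -> bool:
    """Report whether `path` lives under `directory`."""
    try:
        path.resolve().relative_to(directory.resolve())
    except ValueError:
        return False
    return True
```

With an editable install, `is_inside(module_origin("ov_attribution"), Path("packages"))` would be expected to be true when run from the repository root; after a `--no-editable` install the origin should instead sit under the virtualenv's `site-packages` directory.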

Testing Instructions

  1. Install PDM. I recommend using the pipx approach, I am too.
  2. Build the new Docker image with just dc build web.
  3. Explore the editable local package by changing something in py-packages/openverse-attribution to see the change immediately in the API.
  4. Run the tests. (Or don't, CI has you covered.)
    • Test openverse-attribution with:
    cd py-packages/openverse-attribution && pdm install && pdm run pytest
    
    • Test the API with just api/test.

Checklist

  • My pull request has a descriptive title (not a vague title like Update index.md).
  • My pull request targets the default branch of the repository (main) or a parent feature branch.
  • My commit messages follow best practices.
  • My code follows the established code style of the repository.
  • I added or updated tests for the changes I made (if applicable).
  • I added or updated documentation (if applicable).
  • I tried running the project locally and verified that there are no visible errors.
  • I ran the DAG documentation generator (if applicable).

Developer Certificate of Origin

Developer Certificate of Origin
Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
1 Letterman Drive
Suite D4700
San Francisco, CA, 94129

Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.


Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
    have the right to submit it under the open source license
    indicated in the file; or

(b) The contribution is based upon previous work that, to the best
    of my knowledge, is covered under an appropriate open source
    license and I have the right under that license to submit that
    work with modifications, whether created in whole or in part
    by me, under the same open source license (unless I am
    permitted to submit under a different license), as indicated
    in the file; or

(c) The contribution was provided directly to me by some other
    person who certified (a), (b) or (c) and I have not modified
    it.

(d) I understand and agree that this project and the contribution
    are public and that a record of the contribution (including all
    personal information I submit with it, including my sign-off) is
    maintained indefinitely and may be redistributed consistent with
    this project or the open source license(s) involved.

@github-actions github-actions bot added 🧱 stack: api Related to the Django API 🧱 stack: catalog Related to the catalog and Airflow DAGs 🧱 stack: documentation Related to Sphinx documentation 🧱 stack: frontend Related to the Nuxt frontend 🧱 stack: ingestion server Related to the ingestion/data refresh server 🧱 stack: mgmt Related to repo management and automations labels Apr 14, 2024
@openverse-bot openverse-bot added 🚦 status: awaiting triage Has not been triaged & therefore, not ready for work 🏷 status: label work required Needs proper labelling before it can be worked on labels Apr 14, 2024
@dhruvkb dhruvkb added 🟨 priority: medium Not blocking but should be addressed soon ✨ goal: improvement Improvement to an existing user-facing feature 💻 aspect: code Concerns the software code in the repository and removed 🏷 status: label work required Needs proper labelling before it can be worked on 🚦 status: awaiting triage Has not been triaged & therefore, not ready for work 🧱 stack: ingestion server Related to the ingestion/data refresh server 🧱 stack: frontend Related to the Nuxt frontend 🧱 stack: catalog Related to the catalog and Airflow DAGs 🧱 stack: mgmt Related to repo management and automations labels Apr 14, 2024

github-actions bot commented Apr 14, 2024

Full-stack documentation: https://docs.openverse.org/_preview/4107

Please note that GitHub pages takes a little time to deploy newly pushed code, if the links above don't work or you see old versions, wait 5 minutes and try again.

You can check the GitHub pages deployment action list to see the current status of the deployments.

@dhruvkb dhruvkb changed the title Switch to Poetry from PIpenv Switch to Poetry from Pipenv Apr 14, 2024
@AetherUnbound
Collaborator

I'm testing this out locally and tried the instructions - I made some changes to packages/ov-attribution/ov_attribution/attribution.py and the API did not reload 🤔 In fact, it took rebuilding the web image to get the changes to appear. Is that intended?

@dhruvkb
Member Author

dhruvkb commented Apr 16, 2024

It should have reflected immediately because the ov-attribution package is mounted into the API image during development and changes are synced with the container. I suspect it's because the Django API is not reloading when the package changes because it's not watching the packages/ directory. Should be fixed now.
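
For reference, one way to make Django's development autoreloader watch an extra directory is the `autoreload_started` signal. This is a minimal sketch under assumptions and not necessarily the fix applied in this PR: it only applies when the API is served by Django's `runserver`, `make_watch_handler` and `register_extra_watch` are hypothetical names, and the directory is whatever path the package is mounted at inside the container.

```python
from pathlib import Path


def make_watch_handler(extra_dir, glob="**/*.py"):
    """Build a signal receiver that asks the reloader to watch `extra_dir`."""

    def handler(sender, **kwargs):
        # `sender` is the reloader instance that Django passes with the signal.
        sender.watch_dir(Path(extra_dir), glob)

    return handler


def register_extra_watch(extra_dir):
    """Connect the receiver; call at startup, e.g. from an AppConfig.ready()."""
    # Imported lazily so this module can be imported without Django installed.
    from django.utils.autoreload import autoreload_started

    # weak=False keeps the closure alive after this function returns.
    autoreload_started.connect(make_watch_handler(extra_dir), weak=False)
```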

@dhruvkb dhruvkb changed the title Switch to Poetry from Pipenv Switch API package management from Pipenv to PDM Apr 17, 2024
@sarayourfriend
Collaborator

Maybe py_packages and js_packages

Works for me!

Collaborator

@sarayourfriend sarayourfriend left a comment

This is so exciting! Really looking forward to a world without pipenv, however we get there.

@AetherUnbound
Collaborator

AetherUnbound commented Apr 19, 2024

@dhruvkb should this be drafted while we review #4128? I'm a bit confused at which I should look at first.

@dhruvkb
Member Author

dhruvkb commented Apr 19, 2024

@AetherUnbound I undrafted it because I felt quite confident that PDM would be our pick, and so that I could receive feedback on the actual implementation of the migration, like that from @sarayourfriend regarding using an src/ layout and naming the dirs py-packages.

Now that #4128 is merged, feel free to review this PR as deeply as you like. Thanks!

Also opened a discussion-type issue #4165 so that we can finalise what to name and where to place our Python packages directory.

@dhruvkb dhruvkb mentioned this pull request Apr 19, 2024
8 tasks
Collaborator

@AetherUnbound AetherUnbound left a comment

This is great! I have a few comments/questions, and per the discussion in #4165, I think we should have packages/python/ instead of py-packages/ at the top level (to mirror what we have in automation/).

api/Dockerfile Outdated
Comment on lines 37 to 38
# Copy subpackages from additional build-context 'packages'
COPY --from=packages openverse-attribution ./py-packages/openverse-attribution
Collaborator

I wonder here if there's a way for us to "wildcard" this 🤔 As in, not have to be explicit about every package.

Member Author

It is possible to make this a wildcard, I only didn't do that because we have only one package right now (and that too, is more of a PoC than a real library worth distributing).

Collaborator

Best not to wildcard it either, if we can avoid it; it would be nice for it to be "automatic" if new packages are added (I'm sure they will be, eh?) but we can look into ways to derive it from the pyproject.toml rather than a catch-all wild card. There might eventually be dev dependencies in the packages, or packages only used by catalog and ingestion worker (for example) that we would not want copied into the context of the API container (and vice-versa re ingestion workers).
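
As a rough illustration of deriving the list instead of using a wildcard, the sketch below picks out local path dependencies from dependency strings that have already been read from a pyproject.toml. It assumes path dependencies are written in the `name @ file:///${PROJECT_ROOT}/...` form (optionally prefixed with `-e` for editable entries); the `local_packages` helper and the `example-pkg` entries in the usage note are hypothetical and not taken from this PR.

```python
import re

# Matches dependency strings that point at a local file URL, for example
# "pkg @ file:///${PROJECT_ROOT}/../packages/pkg" or
# "-e file:///${PROJECT_ROOT}/../packages/pkg#egg=pkg".
_LOCAL_DEP = re.compile(
    r"^\s*(?:-e\s+)?(?:(?P<name>[A-Za-z0-9._-]+)\s*@\s*)?file://(?P<path>[^#\s]+)"
)


def local_packages(dependencies):
    """Return sorted, de-duplicated local paths found in dependency strings."""
    paths = set()
    for dep in dependencies:
        match = _LOCAL_DEP.match(dep)
        if match:
            paths.add(match.group("path"))
    return sorted(paths)
```

Given the API's own dependency lists, the result would name only the packages the API actually depends on, which could then drive the `COPY` instructions without pulling in packages meant for other services.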

api/pyproject.toml
Comment on lines +50 to +55
dev = [
"ipython >=8.22.1, <9",
"pgcli >=3.5.0, <4",
"remote-pdb >=2.1.0, <3",
]
test = [
Collaborator

Is there a functional difference between dev and test here, or is the separation only semantic?

Member Author

This allows us to selectively build smaller and smaller images from dev (all deps) to CI (main + test) to prod (main only).
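
To make the layering concrete, here is a small sketch that maps each environment to the `pdm install` arguments that would select its dependencies. The flag combinations (`--prod`, `--dev`, `--group`, `--no-editable`) are assumed from PDM's documented CLI and are worth verifying with `pdm install --help`; the environment names and the `install_args` helper are hypothetical and not taken from this PR's Dockerfile.

```python
# Dependency selection per environment, mirroring the layering described above:
# dev = everything, ci = main + test group, prod = main only.
INSTALL_FLAGS = {
    "dev": [],                            # default install includes all dev groups
    "ci": ["--dev", "--group", "test"],   # only the `test` dev group
    "prod": ["--prod"],                   # no dev-dependencies at all
}


def install_args(environment: str, editable: bool = False) -> list[str]:
    """Build an argument list for `pdm install` for the given environment."""
    try:
        flags = INSTALL_FLAGS[environment]
    except KeyError:
        raise ValueError(f"unknown environment: {environment!r}") from None
    args = ["pdm", "install", *flags]
    if not editable:
        # Copy local packages into the virtualenv instead of linking to them.
        args.append("--no-editable")
    return args
```

For example, `install_args("dev", editable=True)` yields a plain `pdm install`, while `install_args("prod")` adds both `--prod` and `--no-editable`.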

Collaborator

Gotcha! I hadn't seen the test install happening anywhere, so I wasn't sure.

Contributor

@obulat obulat left a comment

Awesome! Can't wait to see the PRs that Renovate will open.

I tried changing the attribution package code, and it immediately showed up in the API response.

My request for change is to add the testing of the openverse-attribution to the CI tests.

@dhruvkb
Member Author

dhruvkb commented Apr 22, 2024

@obulat to prevent the PR from going even beyond the already changed 35 files, I've made a separate issue to cover CI changes #4167.

Contributor

@obulat obulat left a comment

Thank you for all the work on selecting the package manager and implementing the changes, @dhruvkb . Makes sense to split the changes into several PRs.
Can we create a milestone for all the pdm-related accompanying issues?

@dhruvkb
Member Author

dhruvkb commented Apr 22, 2024

@obulat good call, done.

@sarayourfriend
Collaborator

I think we should have packages/python/ instead of py-packages/ at the top level (to mirror what we have in automation/).

I love this, Madison! 💯

@dhruvkb
Member Author

dhruvkb commented Apr 22, 2024

@sarayourfriend I'll update #4165 to indicate the general support for this naming scheme.

Also I'm preparing for Python packages to go in packages/python by moving the existing JS package one level deeper in #4174.

Member

@krysal krysal left a comment

This is exciting! The transition looks smoother than I anticipated ✨ I haven't reviewed all the PR, but I noticed some things I want to comment on.

Also, I updated the testing instructions since they mentioned an abbreviated name for the new Python package and were missing an installation step.


This library is a part of the Openverse project. For more information, refer to
the
[documentation](https://docs.openverse.org/packages/ov_attribution/index.html).
Member

Suggested change
[documentation](https://docs.openverse.org/packages/ov_attribution/index.html).
[documentation](https://docs.openverse.org/packages/openverse_attribution/index.html).

Comment on lines +17 to +18
# These commands use standard GNU tools instead of `pdm` because we do not
# require PDM on the host machine for API development.
Member

This conflicts with the testing instructions for the openverse-attribution package. Maybe there are alternative instructions using docker containers?

Member Author

This statement still holds. If one is working on the API (and even if they are making changes to the attribution package), they do not need PDM on the host (because the package will be mounted into the API container directly and be usable via the API).

PDM is only needed on the host if one is purely working outside the context of the API, like the attribution package directly.

@@ -0,0 +1,65 @@
[project]
name = "api"
Member

Nit: more specific would be...

Suggested change
name = "api"
name = "openverse-api"

or

Suggested change
name = "api"
name = "openverse_api"

@sarayourfriend
Collaborator

sarayourfriend commented Apr 22, 2024

Install PDM. I recommend using the pipx approach, I am too.

FWIW, this isn't the official recommendation from PDM, and I think I've run into weird virtual environment issues sometimes when using the pipx approach that I never had happen when using the project's own recommendation: https://pdm-project.org/en/latest/#recommended-installation-method

If you're not having issues, no worries, but if anyone does run into things, I was able to get everything working without a hitch by avoiding pipx for this. PDM's installer isolates itself and pdm is self-updating, so pipx's specific benefits are elusive (other than being consistent for one's own personal preferences, which I appreciate can matter very much).

Collaborator

@sarayourfriend sarayourfriend left a comment

This LGTM, Dhruv, keeping in mind my opinion that implementation details of the openverse-attribution package are out of scope for this PR and should be pushed to a separate issue dedicated to that package.

I'm approving this personally now despite the py-packages directory, because I don't think it's strictly necessary to block this PR on the location of those files, and it'll be trivial to move them after the related JS package PR is merged.

One thing that could be nice is to have a fast-follow to implement a just recipe like the p recipe, that runs pdm for all Python packages, optionally following a pattern. If we don't use pdm workspaces, I suppose we can just use find to identify pyproject.toml files and use PDM's -P flag to point to them and run the command in each one. To that end, we should have consistent script names as well (cf. test:unit, types for the JS packages).

That, to me, is a nice to have in a follow up issue though, and not necessary for this first PR, which is already interesting enough as it is. I agree with many of the things others have brought up as potential avenues to explore in follow up issues, but do not believe any to be blockers for this specific PR, which we'll want to get out and able to build upon for the rest of the -> PDM migration. Feel free to correct me @krysal and @AetherUnbound, just leave a request changes, but in the spirit of expedient decision making, it'd be great to be able to move forward with this without accidentally expanding the scope, especially for anything that doesn't have lasting harmful implications for the project or its team members (cf. the definition of "blocker" in our decision making process).

Not trying to override anyone else's intuition, just want to encourage making sure we carefully evaluate and respect a defined scope for the PR.
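
The idea of locating pyproject.toml files and pointing PDM at each one can be sketched in Python, for illustration only. The sketch builds one `pdm run` invocation per project and uses the long `--project` option to select the directory; the `pdm_commands` name, the `pattern` parameter, and the `test` script name in the usage note are hypothetical.

```python
import fnmatch
from pathlib import Path


def pdm_commands(root, script, pattern="*"):
    """Build `pdm run` commands for every project under `root`.

    A project is any directory that contains a pyproject.toml and whose
    directory name matches `pattern` (a shell-style wildcard).
    """
    commands = []
    for pyproject in sorted(Path(root).rglob("pyproject.toml")):
        project_dir = pyproject.parent
        if fnmatch.fnmatch(project_dir.name, pattern):
            commands.append(["pdm", "run", "--project", str(project_dir), script])
    return commands
```

Each command from `pdm_commands("packages", "test")` could then be passed to `subprocess.run(cmd, check=True)`. A real recipe would likely need to skip directories such as virtualenvs, which this sketch does not do.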

api/Dockerfile Outdated
@@ -96,6 +98,7 @@ RUN useradd --create-home opener \
USER opener

# Copy code into the final image
COPY --chown=opener --from=packages openverse-attribution /py-packages/openverse-attribution/
Collaborator

This shouldn't be necessary, I thought, because it should be copied into the .venv for production (non-editable). Unless this is explicitly to support the development-mode of this target? I wish it were possible to conditionally copy, if that is the reason.

Bit of a nit, though, we can iterate on this without issue.

Edit (after reading the compose file again): Even in development mode, it shouldn't be necessary, because we map the python packages directory in the compose file, so the editable path reference should still be valid.

@@ -8,12 +8,12 @@
from django.utils.html import format_html

from elasticsearch import Elasticsearch, NotFoundError
from openverse_attribution.attribution import get_attribution_text
Collaborator

I know we've gone back and forth on the name, but it occurred to me this doesn't even need openverse specified in the package name. Nothing about it is actually Openverse specific. It is, however, CC-specific! cc_attribution would be a good name for it.

Just a nit though, don't bother spending time on this, we can bikeshed the name if we ever get around to publishing it.

@@ -0,0 +1,79 @@
# `openverse-attribution`
Collaborator

Actually, it's occurring to me that it'd be important for package READMEs to be in the package itself, maybe just copied to documentation during the build step; otherwise the README isn't available for inclusion in the build output of the package. npm and pypi require a README (twine check will fail on build outputs that don't have it, and you won't be allowed to upload it).

Not an issue for this PR, but something we will need to address in the future when we publish packages from the packages directories.

@dhruvkb
Member Author

dhruvkb commented Apr 23, 2024

Since these changes are working and the PR's core ideas have 2 approvals, I will merge this PR. Not dismissing any of the comments from the reviews, I'll make separate issues for each of them and add them to the milestone.

@dhruvkb dhruvkb merged commit 7dac3ad into main Apr 23, 2024
42 checks passed
@dhruvkb dhruvkb deleted the new_pkg_man branch April 23, 2024 04:52
Labels
💻 aspect: code Concerns the software code in the repository ✨ goal: improvement Improvement to an existing user-facing feature 🟨 priority: medium Not blocking but should be addressed soon 🧱 stack: api Related to the Django API 🧱 stack: documentation Related to Sphinx documentation
6 participants