Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dockerfy the Openverse development environment #4343

Merged
merged 20 commits into from
May 30, 2024

Conversation

sarayourfriend
Copy link
Collaborator

@sarayourfriend sarayourfriend commented May 16, 2024

Fixes

Fixes #2068
Fixes #4137
Fixes #4404

Description

Dockerfy the Openverse development environment. This PR makes bash, git, and docker the only dependencies (on the host) required to work on Openverse (though there are probably some gaps in the tools installed in the container that need to be fixed for all workflows). bash and git are installed by default on almost all systems that aren't geared exclusively towards experts (who won't have trouble installing bash or git).

The PR accomplishes this by passing the docker socket through to an alpine-based container with all the development dependencies installed (or ready to be installed). That's the main thing that makes this work, because then docker-compose et al are able to run inside the container.

I originally tried this with alpine, and got very close, but the assumptions that exist in alpine are tedious and difficult to work with in a development environment. Alpine does things rather differently than most other Linux distributions to keep things small. This, of course, comes with a trade-off of not matching the typical experience, neither for the most common Linux distributions nor for macOS. For example, coreutils is not included in alpine linux, so many utilities (including e.g., timeout which we use) are busybox implementations or at least not the GNU implementation, and therefore suffer from slight differences. coreutils can be installed, but other things like user management and permissions are still quite different.

In the end, it was much easier to get things working with an archlinux base. That produces an image that is well over a GB in size, however, but mostly because of the pacman -Syyu system update to get the latest versions of all packages in the base installation. You can see the real base image is only ~150 MB compressed (https://hub.docker.com/layers/library/archlinux/latest/images/sha256-85dc960fa1b01560091e6de62b09c4ad99c35cf818f6a7e2b2118a57f712bcb7?context=explore), so users of the image would only download that base image every so often, and just need to download the updated dependencies when running docker build (via ov build).

However, Arch's support for ARM comes from an entirely separate off-shoot project, rather than upstream Arch, and as such the archlinux docker image does not work on m-model macs. I switched to Fedora after Dhruv had issues, and Fedora does support ARM in it's upstream, so hopefully it will work! The only thing not available in Fedora's repositories are the docker dependencies, but it's very easy to install the repositories for Docker with dnf and so that was trivial to solve. Everything else (including just) is there and ready to go!

If we go this route, then Openverse development setup would go like this:

  1. Make sure you have bash, git and docker installed
  2. Clone the repository and cd into it
  3. Run ./ov init

Then run any development related commands using the ov script. Make it easier to use by adding it to somewhere on your path using ln -s /path/to/ov <somewhere>. If ~/.local/bin is on your path, for example, ln -s /path/to/ov ~/.local/bin/ov works well. Otherwise, just make sure you're in the Openverse directory when running ./ov.

ov just will list all available just recipes

ov just api/up will start the API

ov just p frontend test:unit will run the frontend unit tests

ov just p frontend dev will run the frontend app

And so on and so forth.

ov bash opens up a shell in the container for you to poke around (and especially to debug things that aren't working in this setup ^_^)

Testing Instructions

Try out the steps I've shared above. See what works and what doesn't. If there are dependencies missing, let's add them! If this is a good way to go, I'll update the documentation. I plan to use ov myself if this moves forward, which means it will be much more maintainable than a script none of us will probably ever run on a regular basis.

Also try adding new aliases in .ovprofile and see that they work. The example included by default is meant to illustrate how associative arrays should be used in bash.

We can use this in CI as well!

Checklist

  • My pull request has a descriptive title (not a vague title likeUpdate index.md).
  • My pull request targets the default branch of the repository (main) or a parent feature branch.
  • My commit messages follow best practices.
  • My code follows the established code style of the repository.
  • I added or updated tests for the changes I made (if applicable).
  • I added or updated documentation (if applicable).
  • I tried running the project locally and verified that there are no visible errors.
  • [N/A] I ran the DAG documentation generator (just catalog/generate-docs for catalog
    PRs) or the media properties generator (just catalog/generate-docs media-props
    for the catalog or just api/generate-docs for the API) where applicable.

Developer Certificate of Origin

Developer Certificate of Origin
Developer Certificate of Origin
Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
1 Letterman Drive
Suite D4700
San Francisco, CA, 94129

Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.


Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
    have the right to submit it under the open source license
    indicated in the file; or

(b) The contribution is based upon previous work that, to the best
    of my knowledge, is covered under an appropriate open source
    license and I have the right under that license to submit that
    work with modifications, whether created in whole or in part
    by me, under the same open source license (unless I am
    permitted to submit under a different license), as indicated
    in the file; or

(c) The contribution was provided directly to me by some other
    person who certified (a), (b) or (c) and I have not modified
    it.

(d) I understand and agree that this project and the contribution
    are public and that a record of the contribution (including all
    personal information I submit with it, including my sign-off) is
    maintained indefinitely and may be redistributed consistent with
    this project or the open source license(s) involved.

@sarayourfriend sarayourfriend added 🟨 priority: medium Not blocking but should be addressed soon 🌟 goal: addition Addition of new feature 🤖 aspect: dx Concerns developers' experience with the codebase 🧱 stack: mgmt Related to repo management and automations labels May 16, 2024
@sarayourfriend sarayourfriend requested a review from a team as a code owner May 16, 2024 04:56
@sarayourfriend
Copy link
Collaborator Author

I've taken a try at basic documentation for this in the docs site... but part of me feels like this would warrant a complete overhaul of the structure of that documentation. So much of it could basically be deleted if this dockerfied development environment really works.

I'd be keen to hear what @dhruvkb and @AetherUnbound think about that, and @krysal who's thought a lot about documentation as well.

Copy link
Contributor

@obulat obulat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started testing this PR locally, and used my Windows machine's WSL to both have a fresh install and to see what it's like. I'm going to continue trying, but I just wanted to leave a note on how long it took to run ./ov: 13370 seconds (2 hours 40 minutes). My internet download speed was measured at 77 Mbps, so it's not the download that took so long. I wonder if WSL or specifically my WSL setup are causing this to take so long.

ov Outdated Show resolved Hide resolved
@sarayourfriend
Copy link
Collaborator Author

sarayourfriend commented May 21, 2024

Let's ignore the bootstrapping part for now and consider it an optional additional feature to this. Instead, clone the repository, and run the ov script directly as the MVP. I'll change the instructions and implementation later to match this. It'll just amount to removing a bunch of code.

@sarayourfriend sarayourfriend force-pushed the add/openverse-dev-env-docker-image branch from e9378ee to 8a08f43 Compare May 21, 2024 07:54
Copy link

Full-stack documentation: https://docs.openverse.org/_preview/4343

Please note that GitHub pages takes a little time to deploy newly pushed code, if the links above don't work or you see old versions, wait 5 minutes and try again.

You can check the GitHub pages deployment action list to see the current status of the deployments.

Changed files 🔄:

Copy link
Collaborator

@AetherUnbound AetherUnbound left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fantastic! I just have a few small requests/requests for clarification, but this already works great on my machine. I haven't tried to hook PyCharm up to use it as a development environment, though I was able to start the stack and run some DAGs 🚀

I am getting the following whenever I run ov, it'd probably be best to suppress these:

$ ov just down
INFO: PDM 2.15.1 is installed, while 2.15.3 is available.
Please run `pipx upgrade pdm` to upgrade.
Run `pdm config check_update false` to disable the check.
just dc down 
...

ov Outdated Show resolved Hide resolved
docker/dev_env/run.sh Show resolved Hide resolved
docker/dev_env/Dockerfile Show resolved Hide resolved
docker/dev_env/run.sh Outdated Show resolved Hide resolved
@AetherUnbound
Copy link
Collaborator

@sarayourfriend I've added two commits to tackle each of the problems I mentioned in the comments. Please feel free to drop either one of them if the change I made breaks some other conventions or expectations!

@openverse-bot
Copy link
Collaborator

Based on the medium urgency of this PR, the following reviewers are being gently reminded to review this PR:

@dhruvkb
This reminder is being automatically generated due to the urgency configuration.

Excluding weekend1 days, this PR was ready for review 7 day(s) ago. PRs labelled with medium urgency are expected to be reviewed within 4 weekday(s)2.

@sarayourfriend, if this PR is not ready for a review, please draft it to prevent reviewers from getting further unnecessary pings.

Footnotes

  1. Specifically, Saturday and Sunday.

  2. For the purpose of these reminders we treat Monday - Friday as weekdays. Please note that the operation that generates these reminders runs at midnight UTC on Monday - Friday. This means that depending on your timezone, you may be pinged outside of the expected range.

Copy link
Member

@dhruvkb dhruvkb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why has the automations/python/Pipfile.lock file changed in the PR? Was it accidental?

I got hit by this when trying to start.

docker: no matching manifest for linux/arm64/v8 in the manifest list entries.

Maybe this approach will not work on Apple Silicon M-series Macs?

justfile Show resolved Hide resolved
# - docker*: used to interact with host docker socket
# - nodejs, npm: language runtime and dependency manager (system npm required by node-based pre-commit hooks)
# - just: command runner
# - n: nodejs distribution manager
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

n is something I have not used before. I am always used nvm for managing Node versions. I don't have any specific reasons to favour one over the other, but just noting it down as something that differs from my personal toolchain.

Copy link
Collaborator Author

@sarayourfriend sarayourfriend May 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know why, but nvm (and relatedly pyenv) have always driven me to tears of frustration with how they work... and n just does the thing I expect it to do every time. nvm for some reason always wants to compile Node from scratch on my computer! pyenv does the same thing for some reason? I think I tried asdf and had a similar thing happen with Python, Node, or something else. nvm's magic of wanting to auto-detect and change Node versions for you was endlessly frustrating... I guess I have had a bad experience with it 😅 Some of that might come from ignorance or may have been fixed in the many years since I've bothered with it.

I don't know, happy to use whatever here really, but in my recent experience n always downloads the binary distribution and "just works" (including n install auto, which I never figured out with nvm). If nvm works better than n for some reason, happy to switch to it. With this ov/docker based approach, it theoretically shouldn't matter, as long as someone gets it working right it should "just work" for everyone.

Fedora actually has the various Node versions available in its repositories, so I guess we could just install it directly, but I don't know how easy it is to resolve which one to use in what context.

documentation/general/quickstart.md Outdated Show resolved Hide resolved
@dhruvkb
Copy link
Member

dhruvkb commented May 28, 2024

I managed to build the image using the docker command directly instead of ov build...

docker build --platform=linux/amd64 -t openverse-dev-env:latest docker/dev_env

...but it fails in run.sh because getent is different from the Linux implementation.

❯ ./ov just install-hooks
/Users/dhruvkb/Developer/ov/openverse/docker/dev_env/run.sh: line 5: getent: command not found

If I remove the group from run.sh...

-  host_docker_gid="$(getent group docker | cut -d: -f3)"
   ...
-  --user "$UID:$host_docker_gid"
+  --user "$UID"

...the container manages to run but then fails because of permission issues.

❯ ./ov just install-hooks                                                       
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
Adding pnpm@9.1.0 to the cache...
Internal Error: EACCES: permission denied, mkdir '/.cache/node/corepack/v1'
    at mkdirSync (node:fs:1374:26)

ov Show resolved Hide resolved
docker/dev_env/entrypoint.sh Outdated Show resolved Hide resolved
@sarayourfriend
Copy link
Collaborator Author

...but it fails in run.sh because getent is different from the Linux implementation.

Can you figure out what the process is for macOS to find the group? It's absolutely necessary for permissions to pass the docker group. I don't have any way to test this on macOS so I'll need help from someone with a mac to debug these issues if this is going to work.

@sarayourfriend
Copy link
Collaborator Author

The upstream arch project doesn't support ARM, and the Arch for ARM project has separate repositories, so it won't "swap out" easily for ARM based laptops. Drat! I'll try using Fedora...

@sarayourfriend sarayourfriend force-pushed the add/openverse-dev-env-docker-image branch from e401e1e to bad1167 Compare May 29, 2024 01:44
@sarayourfriend
Copy link
Collaborator Author

I've got changes working to use Fedora locally. I'll push those in a moment, and then follow up with adding code from https://stackoverflow.com/a/10910180 for getting the docker group in macOS. @dhruvkb I'll need to you test that for me though, and potentially tinker with it if it doesn't work right away.

@dhruvkb
Copy link
Member

dhruvkb commented May 29, 2024

For sure, I'm excited to review it as soon as you push changes.

@sarayourfriend sarayourfriend force-pushed the add/openverse-dev-env-docker-image branch from 326a243 to 654bbbf Compare May 30, 2024 03:12

# Personal aliases and utilities should go in an .ovprofile file at the monorepo root

declare -A shared_aliases
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

declare -A is a feature that was introduced in Bash newer than the one on macOS (version 3.2.57(1)-release). This doesn't work for me.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just adding a note here, we tried switching to zsh, and that worked great, except that shellcheck doesn't support it.

We only use the associative bash arrays for aliases, and I think we could easily support it another way (e.g., using Python to expand aliases). I've removed the alias support for now @AetherUnbound and we can try to accommodate that in a separate PR.

@sarayourfriend
Copy link
Collaborator Author

Dhruv just pointed out to me that I haven't actually addressed #4256 yet in this PR. I'll unlink that from this PR and in a separate PR we can switch to running Prettier using the system pre-commit hook instead of mirrors-prettier

@sarayourfriend
Copy link
Collaborator Author

sarayourfriend commented May 30, 2024

So far the follow-up issues I'll open for this are:

  • Explore using ov in CI
  • Add support for alias expansion to dev_env/entrypoint.sh (without using bash associative arrays)
  • Replace perl with Python in fix-console-code-block pre-commit hook (perl pulls in a huge set of CPAM dependencies during the image build and its used in literally a single place for linting, replacing it would make the container setup much faster).

Anything else @dhruvkb @AetherUnbound

@openverse-bot openverse-bot added the 💻 aspect: code Concerns the software code in the repository label May 30, 2024
docker/dev_env/Dockerfile Outdated Show resolved Hide resolved
docker/dev_env/Dockerfile Show resolved Hide resolved
justfile Show resolved Hide resolved
ov Outdated Show resolved Hide resolved
docker/dev_env/run.sh Outdated Show resolved Hide resolved
sarayourfriend and others added 2 commits May 30, 2024 15:19
Co-authored-by: Dhruv Bhanushali <hi@dhruvkb.dev>
Copy link
Member

@dhruvkb dhruvkb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried a bunch of things in the container, things are working perfectly!

@sarayourfriend sarayourfriend merged commit bd7ad68 into main May 30, 2024
43 checks passed
@sarayourfriend sarayourfriend deleted the add/openverse-dev-env-docker-image branch May 30, 2024 05:29
@sarayourfriend
Copy link
Collaborator Author

@AetherUnbound I've merged this based on your previous approval of the overall approach, and based on it working for Dhruv and myself as example linux and macOS hosts.

Both of us expect there to be further improvements or changes that need to be made to make the solution truly universal, but I wanted to get this particular PR merged because it's already gotten pretty complex!

This doesn't change anything about the way the repository works and the documentation isn't there yet to suggest community contributors to use it, so I think it's safe to merge this as an "alpha" version, the "MVP" of this idea, and to continue trying it out individually and making improvements as we go.

Aliases aren't working for now, but I'll open an issue to try adding them back in as a special feature!

@AetherUnbound
Copy link
Collaborator

That sounds great, thanks for going ahead and merging it! I'm super excited about this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
💻 aspect: code Concerns the software code in the repository 🤖 aspect: dx Concerns developers' experience with the codebase 🌟 goal: addition Addition of new feature 🟨 priority: medium Not blocking but should be addressed soon 🧱 stack: mgmt Related to repo management and automations
Projects
Archived in project
5 participants