-
Notifications
You must be signed in to change notification settings - Fork 202
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dockerfy the Openverse development environment #4343
Conversation
I've taken a try at basic documentation for this in the docs site... but part of me feels like this would warrant a complete overhaul of the structure of that documentation. So much of it could basically be deleted if this dockerfied development environment really works. I'd be keen to hear what @dhruvkb and @AetherUnbound think about that, and @krysal who's thought a lot about documentation as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I started testing this PR locally, and used my Windows machine's WSL to both have a fresh install and to see what it's like. I'm going to continue trying, but I just wanted to leave a note on how long it took to run ./ov
: 13370 seconds (2 hours 40 minutes). My internet download speed was measured at 77 Mbps, so it's not the download that took so long. I wonder if WSL or specifically my WSL setup are causing this to take so long.
Let's ignore the bootstrapping part for now and consider it an optional additional feature to this. Instead, clone the repository, and run the ov script directly as the MVP. I'll change the instructions and implementation later to match this. It'll just amount to removing a bunch of code. |
e9378ee
to
8a08f43
Compare
Full-stack documentation: https://docs.openverse.org/_preview/4343 Please note that GitHub pages takes a little time to deploy newly pushed code, if the links above don't work or you see old versions, wait 5 minutes and try again. You can check the GitHub pages deployment action list to see the current status of the deployments. Changed files 🔄: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fantastic! I just have a few small requests/requests for clarification, but this already works great on my machine. I haven't tried to hook PyCharm up to use it as a development environment, though I was able to start the stack and run some DAGs 🚀
I am getting the following whenever I run ov
, it'd probably be best to suppress these:
$ ov just down
INFO: PDM 2.15.1 is installed, while 2.15.3 is available.
Please run `pipx upgrade pdm` to upgrade.
Run `pdm config check_update false` to disable the check.
just dc down
...
@sarayourfriend I've added two commits to tackle each of the problems I mentioned in the comments. Please feel free to drop either one of them if the change I made breaks some other conventions or expectations! |
Based on the medium urgency of this PR, the following reviewers are being gently reminded to review this PR: @dhruvkb Excluding weekend1 days, this PR was ready for review 7 day(s) ago. PRs labelled with medium urgency are expected to be reviewed within 4 weekday(s)2. @sarayourfriend, if this PR is not ready for a review, please draft it to prevent reviewers from getting further unnecessary pings. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why has the automations/python/Pipfile.lock
file changed in the PR? Was it accidental?
I got hit by this when trying to start.
docker: no matching manifest for linux/arm64/v8 in the manifest list entries.
Maybe this approach will not work on Apple Silicon M-series Macs?
docker/dev_env/Dockerfile
Outdated
# - docker*: used to interact with host docker socket | ||
# - nodejs, npm: language runtime and dependency manager (system npm required by node-based pre-commit hooks) | ||
# - just: command runner | ||
# - n: nodejs distribution manager |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
n
is something I have not used before. I am always used nvm
for managing Node versions. I don't have any specific reasons to favour one over the other, but just noting it down as something that differs from my personal toolchain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know why, but nvm
(and relatedly pyenv
) have always driven me to tears of frustration with how they work... and n
just does the thing I expect it to do every time. nvm
for some reason always wants to compile Node from scratch on my computer! pyenv
does the same thing for some reason? I think I tried asdf
and had a similar thing happen with Python, Node, or something else. nvm
's magic of wanting to auto-detect and change Node versions for you was endlessly frustrating... I guess I have had a bad experience with it 😅 Some of that might come from ignorance or may have been fixed in the many years since I've bothered with it.
I don't know, happy to use whatever here really, but in my recent experience n
always downloads the binary distribution and "just works" (including n install auto
, which I never figured out with nvm). If nvm
works better than n
for some reason, happy to switch to it. With this ov
/docker based approach, it theoretically shouldn't matter, as long as someone gets it working right it should "just work" for everyone.
Fedora actually has the various Node versions available in its repositories, so I guess we could just install it directly, but I don't know how easy it is to resolve which one to use in what context.
I managed to build the image using the docker build --platform=linux/amd64 -t openverse-dev-env:latest docker/dev_env ...but it fails in
If I remove the group from - host_docker_gid="$(getent group docker | cut -d: -f3)"
...
- --user "$UID:$host_docker_gid"
+ --user "$UID" ...the container manages to run but then fails because of permission issues.
|
Can you figure out what the process is for macOS to find the group? It's absolutely necessary for permissions to pass the docker group. I don't have any way to test this on macOS so I'll need help from someone with a mac to debug these issues if this is going to work. |
The upstream arch project doesn't support ARM, and the Arch for ARM project has separate repositories, so it won't "swap out" easily for ARM based laptops. Drat! I'll try using Fedora... |
e401e1e
to
bad1167
Compare
I've got changes working to use Fedora locally. I'll push those in a moment, and then follow up with adding code from https://stackoverflow.com/a/10910180 for getting the docker group in macOS. @dhruvkb I'll need to you test that for me though, and potentially tinker with it if it doesn't work right away. |
For sure, I'm excited to review it as soon as you push changes. |
326a243
to
654bbbf
Compare
docker/dev_env/shared_aliases.sh
Outdated
|
||
# Personal aliases and utilities should go in an .ovprofile file at the monorepo root | ||
|
||
declare -A shared_aliases |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
declare -A
is a feature that was introduced in Bash newer than the one on macOS (version 3.2.57(1)-release). This doesn't work for me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just adding a note here, we tried switching to zsh, and that worked great, except that shellcheck doesn't support it.
We only use the associative bash arrays for aliases, and I think we could easily support it another way (e.g., using Python to expand aliases). I've removed the alias support for now @AetherUnbound and we can try to accommodate that in a separate PR.
Dhruv just pointed out to me that I haven't actually addressed #4256 yet in this PR. I'll unlink that from this PR and in a separate PR we can switch to running Prettier using the system pre-commit hook instead of mirrors-prettier |
So far the follow-up issues I'll open for this are:
Anything else @dhruvkb @AetherUnbound |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tried a bunch of things in the container, things are working perfectly!
@AetherUnbound I've merged this based on your previous approval of the overall approach, and based on it working for Dhruv and myself as example linux and macOS hosts. Both of us expect there to be further improvements or changes that need to be made to make the solution truly universal, but I wanted to get this particular PR merged because it's already gotten pretty complex! This doesn't change anything about the way the repository works and the documentation isn't there yet to suggest community contributors to use it, so I think it's safe to merge this as an "alpha" version, the "MVP" of this idea, and to continue trying it out individually and making improvements as we go. Aliases aren't working for now, but I'll open an issue to try adding them back in as a special feature! |
That sounds great, thanks for going ahead and merging it! I'm super excited about this! |
Fixes
Fixes #2068
Fixes #4137
Fixes #4404
Description
Dockerfy the Openverse development environment. This PR makes
bash
,git
, anddocker
the only dependencies (on the host) required to work on Openverse (though there are probably some gaps in the tools installed in the container that need to be fixed for all workflows). bash and git are installed by default on almost all systems that aren't geared exclusively towards experts (who won't have trouble installing bash or git).The PR accomplishes this by passing the docker socket through to an alpine-based container with all the development dependencies installed (or ready to be installed). That's the main thing that makes this work, because then docker-compose et al are able to run inside the container.
I originally tried this with alpine, and got very close, but the assumptions that exist in alpine are tedious and difficult to work with in a development environment. Alpine does things rather differently than most other Linux distributions to keep things small. This, of course, comes with a trade-off of not matching the typical experience, neither for the most common Linux distributions nor for macOS. For example, coreutils is not included in alpine linux, so many utilities (including e.g., timeout which we use) are busybox implementations or at least not the GNU implementation, and therefore suffer from slight differences. coreutils can be installed, but other things like user management and permissions are still quite different.
In the end, it was much easier to get things working with an archlinux base. That produces an image that is well over a GB in size, however, but mostly because of the
pacman -Syyu
system update to get the latest versions of all packages in the base installation. You can see the real base image is only ~150 MB compressed (https://hub.docker.com/layers/library/archlinux/latest/images/sha256-85dc960fa1b01560091e6de62b09c4ad99c35cf818f6a7e2b2118a57f712bcb7?context=explore), so users of the image would only download that base image every so often, and just need to download the updated dependencies when runningdocker build
(viaov build
).However, Arch's support for ARM comes from an entirely separate off-shoot project, rather than upstream Arch, and as such the archlinux docker image does not work on m-model macs. I switched to Fedora after Dhruv had issues, and Fedora does support ARM in it's upstream, so hopefully it will work! The only thing not available in Fedora's repositories are the docker dependencies, but it's very easy to install the repositories for Docker with dnf and so that was trivial to solve. Everything else (including
just
) is there and ready to go!If we go this route, then Openverse development setup would go like this:
./ov init
Then run any development related commands using the
ov
script. Make it easier to use by adding it to somewhere on your path usingln -s /path/to/ov <somewhere>
. If~/.local/bin
is on your path, for example,ln -s /path/to/ov ~/.local/bin/ov
works well. Otherwise, just make sure you're in the Openverse directory when running./ov
.ov just
will list all available just recipesov just api/up
will start the APIov just p frontend test:unit
will run the frontend unit testsov just p frontend dev
will run the frontend appAnd so on and so forth.
ov bash
opens up a shell in the container for you to poke around (and especially to debug things that aren't working in this setup ^_^)Testing Instructions
Try out the steps I've shared above. See what works and what doesn't. If there are dependencies missing, let's add them! If this is a good way to go, I'll update the documentation. I plan to use
ov
myself if this moves forward, which means it will be much more maintainable than a script none of us will probably ever run on a regular basis.Also try adding new aliases in
.ovprofile
and see that they work. The example included by default is meant to illustrate how associative arrays should be used in bash.We can use this in CI as well!
Checklist
Update index.md
).main
) or a parent feature branch.just catalog/generate-docs
for catalogPRs) or the media properties generator (
just catalog/generate-docs media-props
for the catalog or
just api/generate-docs
for the API) where applicable.Developer Certificate of Origin
Developer Certificate of Origin