RFC: Monorepo #336

dhruvkb · 2022-11-24T08:37:52Z

Fixes

Fixes #205 by @zackkrida

Description

This PR writes the monorepo RFC.

Proof of Concept can be seen at ~~https://github.com/dhruvkb/monopenverse~~ https://github.com/dhruvkb/monoverse.

Testing Instructions

Read the RFC.
Give your Cs.

Checklist

My pull request has a descriptive title (not a vague title like Update index.md).
My pull request targets the default branch of the repository (main) or a parent feature branch.
My commit messages follow best practices.
My code follows the established code style of the repository.
I added or updated tests for the changes I made (if applicable).
I added or updated documentation (if applicable).
I tried running the project locally and verified that there are no visible errors.

Developer Certificate of Origin

Developer Certificate of Origin
Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
1 Letterman Drive
Suite D4700
San Francisco, CA, 94129

Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.


Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
    have the right to submit it under the open source license
    indicated in the file; or

(b) The contribution is based upon previous work that, to the best
    of my knowledge, is covered under an appropriate open source
    license and I have the right under that license to submit that
    work with modifications, whether created in whole or in part
    by me, under the same open source license (unless I am
    permitted to submit under a different license), as indicated
    in the file; or

(c) The contribution was provided directly to me by some other
    person who certified (a), (b) or (c) and I have not modified
    it.

(d) I understand and agree that this project and the contribution
    are public and that a record of the contribution (including all
    personal information I submit with it, including my sign-off) is
    maintained indefinitely and may be redistributed consistent with
    this project or the open source license(s) involved.

rfcs/20221124-monorepo.md

dhruvkb · 2022-11-30T10:59:03Z

After extensive testing in the PoC, everything is working. In the PoC, no existing functionality is broken and a lot of things are simplified, merged or otherwise improved.

In my testing, the monorepo migration was a net positive so I suggest we move forward with it.

sarayourfriend · 2022-11-30T23:38:38Z

rfcs/20221124-monorepo.md

+
+1. Central place for all technical documentation, enabling documentation for different parts of the stack to cross-reference other pieces and stay current with changes in other places.
+
+1. Enables the infra to deploy the code to coexist with the code itself. Apart from the private secrets that will still need to be encrypted, the IaC files could be organised identical to the code.


the IaC files could be organised identical to the code.

Just a nit but this isn't entirely true. We could maybe spread some Terraform modules out this way but it would be a pain to have so many places to search for modules. If the Terraform code will be merged into the monorepo as well then it'd be better off assuming that it will live, in totality, in a separate directory.

Ah, my lack of knowledge in the IaC space is really showing here. I would appreciate if you could add any infra issues (especially any dealbreakers) and any infra benefits that you see popping up due to this merge. Thanks!

From an infra perspective I don't think there are any benefits really. The only thing I could see is if we did this issue: https://github.com/WordPress/openverse-infrastructure/issues/220. The benefit of that issue could be a more easily reusable workflow with better variables in the deployment workflows we have now. Basically everything that is used to define the deployment workflows are variables that we could theoretically move into GitHub, thought it's just more manual management of those details. We could try to sync them other ways like with a GitHub secret of a JSON blob that gets passed through so the workflow call is something like:

- uses: ./actions/deploy.yml with: config: ${{ secrets.NUXT_DEPLOYMENT_CONFIG }}

And then the workflow parses the JSON to get the configuration.

Note: For folks without access to the infrastructure repository, the issue above describes switching to a callable workflow instead of a composite action. I've found composite actions a little more tedious so I avoided trying to make the current deployment workflows super modular. I think a callable workflow would be easier to work with for a lot of reasons and could open up the possibility of making a more module/generic single-deployment workflow file a little more realistic.

We could do all of this in our current setup, but a monorepo would end up with less places to sync the secrets to and less places to put the deployment action.

Actually, that reminds me, if the infrastructure is in the same repository, the deployment workflow outputs could be automatically merged whenever changes are made to them. Right now we have to manually sync them from the infrastructure repository to wherever they're needed. So that would be a nice simplification of that process as well (one less thing to remember to do!) 🙂

sarayourfriend · 2022-11-30T23:40:10Z

rfcs/20221124-monorepo.md

+
+   - In fact, we employ a number of hacks to install and configure pre-commit for the frontend. Merging it with the API eliminates the need for such hacks.
+
+1. The entire system can be integration tested during releases. The real API, populated with test data, can even replace the Talkback server.


Interesting idea! For this to be feasible, we would need to come up with a definitive solution to preventing the API from making network requests when it executes a search. Right now Playwright tests do not depend at all on any external network and that needs to remain the case.

I think it can be done (maybe something like an environment variable flag to not make requests) and I can envision it being glorious when it works.

sarayourfriend · 2022-11-30T23:42:51Z

rfcs/20221124-monorepo.md

+
+1. The entire system can be integration tested during releases. The real API, populated with test data, can even replace the Talkback server.
+
+The `WordPress/openverse-api` repo will absorb the `WordPress/openverse-frontend` repo. The `WordPress/openverse-catalog` will also be merged, _later_.


Will there be a period of time during which frontend work is happening only in openverse-api? Is the long-term plan to merge everything into openverse-api and then eventually rename that to just openverse?

Yes. The timeline will be

pause work on the frontend

merge WordPress/openverse-frontend into WordPress/openverse-api

transfer issues and other relevant GitHub models

get fronted functional, enable deploys again

resume frontend work in WordPress/opnenverse-api

archive WordPress/opnenverse-frontend

The one week to one fortnight period mentioned in the RFC is the time between step 1 and step 5.

sarayourfriend · 2022-11-30T23:43:39Z

rfcs/20221124-monorepo.md

+
+#### Get the timing right
+
+The first step will be to release the frontend, call a code freeze and pause work on it. This is to prevent the frontend repo from continuing to drift as we merge a snapshot of it with the API.


Should we schedule a bug bash before this so that we can identify and resolve any outstanding issues before we enter the period of time where we won't be able to without great disruption?

Yes, that would be ideal (maybe even necessary). Before embarking on this potential two-week freeze, it would be best to have the frontend be in a state that could serve two weeks without maintenance.

If we have no high or critical priority tickets, that would indicate optimal time to work on this move. Getting the timing right is one of the main things here.

If we are planning to do a bug discovery session during the Istanbul meetup in January, maybe targeting this for late February (or later/earlier depending on what bugs we find) could be good, provided everyone is on board. It'd be nice to know that ahead of time as well so that we could block off the time and expect frontend feature work to undergo a stoppage at a specific period of time rather than just "sometime in the future". Did you have any particular alternative scheduling ideas in mind?

The bug bash we had in Venice was very productive so repeating it seems logical. But there are two potential issues with that:

We only identified and filed issues in the last meetup, fixing them afterwards.

I can't speak for everyone but my productivity immediately after the meetup was very low.

IMO the best time for this migration would be when all the frontend devs' productivity has gone down (less activity = easier migration) so if we could iron out the bugs before the meetup, the meetup would be the best time to migrate this because we'd all be together to collaborate in the migration process.

We do need to set down a date for this but we should only do this after the iframe removal project is deployed, and has been stable for a week or two. Other than that, we can make any time frame work.

Maybe we should try to do a bug bash before the meetup then. We could even start the monorepo migration during the meetup given the core contributors will not likely be spending much time making direct frontend contributions. That way we could easily pair on any unexpected issues that crop up.

Though that would be hard if not impossible to do before the iframe project, given the current blockers and ambiguous time frame for resolving them 😢

The period of the code freeze is a big concern for me.

Having a bug bash before it is a really great idea. It's really difficult to plan it because we probably need a sync meeting to identify bugs, [very rough estimates] a day or 2 to document all of them, and then 1-3 weeks to fix everything, especially considering the slow pace of the PR review process. We would probably only want to fix the high and critical priority bugs.

I wonder how this process be affected by the features that we plan to work on now: navigation menu improvements and the homepage. Also, if we plan to work on the addition of analytics next, would that need to be finished before the monorepo move? Or fully postponed till after the move?

Starting the monorepo migration during meetup is great for the reasons Sara mentions. However, I'm not sure that the process itself is a great use of our time at the meetup. Someone will have to be working on the migration steps during the meetup and in the week after that.

sarayourfriend · 2022-11-30T23:46:06Z

rfcs/20221124-monorepo.md

+
+This is a quick process.
+
+1. Move the entire content of frontend inside a `frontend/` directory, except the following top-level files and folders. Please comment if you can add to this list.


What is the plan for making pnpm work in the monorepo? Do we want to go ahead and making pnpm live at the root of the repository and treat the frontend as a pnpm monorepo package? https://pnpm.io/workspaces

I think this would be necessary for us to be able to (reasonably) merge the browser extension and the JS that is currently in WordPress/openverse into the monorepo... otherwise we'll be fighting weird fights with pnpm in different directories 🤔

That was an issue. Currently I had everything working with pnpm with package.json moved into the frontend/ directly (no package manager at the root), but I definitely should look into the workspaces feature.

sarayourfriend · 2022-11-30T23:46:27Z

rfcs/20221124-monorepo.md

+   - `.prettierignore` (symlink into the `frontend/` directory)
+   - `.eslintrc.js` (symlink into the `frontend/` directory)
+   - `.eslintignore` (symlink into the `frontend/` directory)


Why do these need to be symlinked?

That's an error on my part. I was performing lots of experiments in https://github.com/dhruvkb/monopenverse and with linting to work so I tried almost everything. When I got it to work finally, I forgot to update the RFC with the right approach.

Fixed in f2122d8.

rfcs/20221124-monorepo.md

sarayourfriend · 2022-11-30T23:49:55Z

rfcs/20221124-monorepo.md

+I will need more information about this because IANAL.
+
+- LICENSE (both repos)


Given Openverse still uses MIT we have probably very little to worry about here, FWIW.

Nice, but I had a vague feeling that merging repositories could have legal implications for those that use them somehow.

sarayourfriend · 2022-11-30T23:50:25Z

rfcs/20221124-monorepo.md

+- CONTRIBUTING.md (both repos)
+- CONTRIBUTORS.md (API only; also why?)
+- DOCUMENTATION_GUIDELINES.md (API only)
+- TESTING_GUIDELINES.md (frontend only)


We would probably want to move this into the frontend directory, I would think. It is extremely frontend specific. Or would it go into the Sphinx docs?

My long-term goal is for Sphinx docs to be the single source of reference for all developers, not just API devs.

Some Markdown files that people generally look for in the repo root could be left there but ideally, even they would just point to the relevant Sphinx pages for the real stuff.

I love that plan!

Co-authored-by: sarayourfriend <24264157+sarayourfriend@users.noreply.github.com>

stacimc · 2022-12-01T18:53:24Z

rfcs/20221124-monorepo.md

+
+## Migration path
+
+First we will merge the API and the frontend. This decision was made for the following reasons.


This reasoning makes a lot of sense to me, and I'm sold on merging API/frontend first and separately from the Catalog. How much more difficult do you anticipate eventually merging the Catalog to be? Do you think there's value in planning that work first, or can that be tabled until after the API/frontend are merged?

I am fairly confident that merging the catalog would not cause major pain later on, mostly because it's another Python project. The API and ingestion server are both independent Python projects that have coexisted in harmony since the beginning so adding a third does not seem problematic.

We could definitely plan for the eventual catalog-API merge, but I abstained from that in the RFC because

I am not the most experienced with the catalog so I felt a bit underqualified to write about that in the RFC

I didn't want to delay the RFC till I could understand the nuances of the catalog enough to include the portion about merging it

rfcs/20221124-monorepo.md

zackkrida · 2022-12-01T22:48:03Z

One note regarding translations I'd like to make sure is mentioned in this RFC:

We will have to contact a member of the WP Meta team (who works on wp.org) to change the path at which GlotPress reads our translation files, to both update to the new repository and filepath. I believe the current path used is https://github.com/WordPress/openverse-frontend/blob/translations/openverse.pot

Co-authored-by: Zack Krida <zackkrida@pm.me>

rfcs/20221124-monorepo.md

zackkrida · 2022-12-02T18:20:32Z

rfcs/20221124-monorepo.md

+   - `.eslintignore`
+   - <s>`.gitignore`</s> (better to move it into the `frontend/` directory and update some absolute paths)
+
+1. Create the final commit on the `WordPress/openverse-frontend` repo. After the merge we might want to add a notice about the migration to the `README.md` file but GitHub's built-in archival process could suffice here.


I think we definitely want to add a README notice and use the GitHub archiving functionality together. The README information will crucially link to the new repository. We should also update the repo description with the new url as well, here:

zackkrida · 2022-12-02T18:23:07Z

rfcs/20221124-monorepo.md

+
+The first step will be to release the frontend, call a code freeze and pause work on it. This is to prevent the frontend repo from continuing to drift as we merge a snapshot of it with the API.
+
+This can prove difficult given how productive our team is, so we will need to channel this productivity towards the catalog in the meantime. I can foresee the end-to-end migration taking one week (ideal scenario) to one fortnight (worst case scenario).


I can foresee the end-to-end migration taking one week (ideal scenario) to one fortnight (worst case scenario).

@dhruvkb Could you unpack this time estimate a bit more for me? I am curious because I could see these tasks taking 1-2hrs or so and being treated like a deploy, where a few folks work on it together synchronously. I also think that would have benefit of minimizing the disruption to contributors. Are there long-running tasks I'm not considering?

I would also like to see some concrete implementation details about, for example, how we plan to make a PR for some of the meta/internal changes to the repo (the unified CODEOWNERS file, a new README, the modified actions, etc.). Could we have those changes prepared in a draft PR that we only merge after the frontend main branch has been combined with the API? And regarding that frontend step, do we plan to do it via PR in the GitHub UI? Details like that will give me much more confidence in our final approach.

The missing implementation details (such as writing new documentation, updating references) are what will take the maximum time in the 1-2 week time range I proposed for this plan.

The technical merge is a and getting the functionality working is a day or two. The remaining 3 days are for identifying these issues and fixing them. I kept 5 more days as a backup in case something unforeseen comes up but given we do sufficient planning beforehand, we will not need the second week (underpromise, overdeliver 😉).

It might be hard to make a draft PR with the changes (because of the merge conflicts) but one thing we could do is start a new branch, do the merge on it and eventually replace main with it. That seems like a great way to see everything in action but a lot of (branch-name dependent actions will be affected when we promote the other branch).

In any case, the frontend cannot be merged via the GitHub UI, it will have be CLI. In all my usage of GitHub I don't think it's possible to make a PR from any repo to another. The GitHub UI only provides merge functionality from a fork to the upstream which is not this case.

I'd still really like to see a breakdown of the tasks involved in the merge, their execution order, and a time estimate for each step. We could do that when we write an implementation plan for this RFC. Two days maximum for a code freeze isn't too disruptive but definitely a concern in the case of high priority or critical bugs we might identify, even with a bug bash and small code freeze beforehand.

zackkrida · 2022-12-02T18:43:10Z

This looks great, @dhruvkb. I'm really excited to move forward with this and hope it's what the team wants. Some questions:

This RFC only covers the frontend + api combination. Would there be a follow-up RFC for merging the Catalog and plain WordPress/openverse repositories later on?
Do we need to write a follow up implementation plan? I see a lot of implementation details here which are great, but I also see some details you may not have considered, or perhaps considered out of scope for this RFC if the intention is to write a separate implementation plan:
- The root README.md. Will we write a new one? We should draft that earlier so we have it prepared, and perhaps move the current API README into the api/ directory. For example, we'll need to explain the new just install will install frontend and API dependencies, things like that.
- What will the unified repository be called? openverse-api would no longer make sense.
- We will need to revise any and all existing documentation that references the frontend and api repositories and update them with reference to the new, combined repo.
- Just reiterating here that we'll need to update the path at which GlotPress reads our translation files, to both update to the new repository and filepath. I believe the current path used is https://github.com/WordPress/openverse-frontend/blob/translations/openverse.pot
- We will need to write migration instructions for contributors, who will have local frontend repositories that now point to an archived repo, or api contributors who will now have significant file conflicts with the api repo.
Have you observed the duration of lint/required status actions to know if it will be a hindrance to development? For example, say we're attempting to push a small frontend hotfix, will actionlint now take dramatically longer?

When I clone monopenverse it's still quite performant and responsive in VS Code, which is great. I was concerned about the size being too memory-intensive to search and things, but it's working fine.

I observed one problem with prettier in VS Code, the root-level prettier.config.js depends on the prettier-plugin-tailwindcss which is in the node_modules directory of the frontend. VS Code doesn't seem to like this and looks for prettier in the repo root. To prevent future issues like this I do think it would be worth trying @sarayourfriend's workspaces approach to lift up node dependencies into the root of the repo, which will also somplify things when we combine the scripts (WordPress/openverse) and browser extension (should we renew our efforts to support it) repos into this one.

zackkrida · 2022-12-02T18:43:47Z

Also, 👍 the combined codeowners file is excellent!

obulat · 2022-12-15T09:01:44Z

rfcs/20221124-monorepo.md


-1. The entire system can be integration tested during releases. The real API, populated with test data, can even replace the Talkback server as long as we can turn off all network calls and enable 100% reliable output.
+1. The entire system can be integration tested during releases. The real API, populated with test data, can replace the Talkback server as long as we disable network calls and make output deterministic.


I love this! 🚀

We would have to make sure that the API response types never differ from the sample data, though. For example, if we say that the license URL is always present, and it's present in the test data, but absent in the real API responses, that would break the app even though the tests pass.

Next step would be integrating the catalog so the supplier of the data would also be a part of the tests 👍.

rfcs/20221124-monorepo.md

zackkrida · 2022-12-15T20:10:12Z

rfcs/20221124-monorepo.md

+1. Create "stack: \*" labels to help with issue and PR management.
+   Spoiler/foreshadowing: these labels will be used for more things later.


Perhaps this should happen before the migration? If we created the labels in advance, we could then apply each label to all issues in the respective repositories using the cli (I think this would be a very simple command to write), before transferring. The labels would be preserved during the transfer.

Yes, that makes more sense than transferring the issues, keeping a list of them and then applying labels.

rfcs/20221124-monorepo.md

zackkrida · 2022-12-15T20:15:21Z

rfcs/20221124-monorepo.md

+working with split repos quite productively for over a year. My proposition is
+the the monorepo solutions are better than workarounds.
+
+The [integration](#step-4-integration) section in the latter part of the


Suggested change

The [integration](#step-4-integration) section in the latter part of the

The [integration](#step-4-integration) section in the latter part of the

This appears out of date, as Step 4 now refers to a documentation merge.

rfcs/20221124-monorepo.md

zackkrida

The rationale and migration path look great to me. I am really sold on the idea of a monorepo and excited for the benefits I forsee.

I still think, however, that our implementation plan needs more refinement. What is here is a great start but there is a lot of specific cleanup and steps I would like to see documented.

For this reason, I would approve this PR as-is (with implementation details removed) if we create a separate implementation plan. Otherwise there are things I would like to see expanded on:

1. Infra repo considerations

In the infra repository, we have code which syncs many things across the repositories:

The required status checks and branch protections
The collaborators and their levels of access
Mapping secrets across multiple repositories (which you do mention)

What should the timing of changes be there? My thought would be that we should create a PR in advance which corrects the collaborators, repos, and secrets mapping by removing the to-be-archived repos. Then we would apply the terraform changes and merge the PR right after the actual repos are migrated.

2. Small details

Some small thoughts that are important but I'm not sure where to put them:

What preparation of the frontend and API repos do we need to do before the merge? No open PRs? For how long?
We should make the PR minimum review count for each repo > 2 on the day of the monorepo merge, to block any accidental PR merges.

3. The timing and implementation order

I suspect we could do much of the implementation work in advance, to dramatically simplify the day-of changes necessary to implement the monorepo. The idea would be do as much work non-destructively and in advance as possible, so that our code freeze can be as short as possible. The general idea would be:

Create all stack labels far in advance, and sync them to all repositories. On the day of launch use a cli command to bulk apply frontend and api stack labels to all open issues in each respective repository. When issues are transferred later, the labels will be preserved. We can also make the 'this repo is archived' notice PRs in advance of the actual monorepo merge too.
In WordPress/openverse, create a monorepo shared files PR containing all of the merged/unified files from monopenverse that you listed:

Expected conflicts	Resolution
.github/CODEOWNERS	Folderwise separation of owners, use from @dhruvkb/monoverse
.prettierignore	Merge entries from all repos, use from @dhruvkb/monoverse
.pre-commit-config.yaml	Python + Node.js linting, use from @dhruvkb/monoverse
justfile	Split into many smaller `justfile`s and one root^a
.gitignore	Split into many smaller `.gitignore`s and one root^b
CONTRIBUTING.md	Write new, referencing doc site
README.md	Write new, referencing doc site

This PR would be merged on the day of the monorepo migration as a first step. Crucially here, we address and fix all 'Step 4. Documentation merge' issues prior to the monorepo merge, rather than after it has happened.

In frontend, create a branch which moves all code into the frontend/ directory and removes the root-level docs which would conflict with the monorepo.

On the day of the merge we make sure this branch is up to date with main changes and ready to go. When we combine it with the openverse repo we use the command you mentioned (git pull frontend main --allow-unrelated-histories) but with the name of this created branch, instead of main.

The same approach is followed for the API, creating a branch with any conflicted files removed.
After the root level doc and action PR is merged both histories are immediately merged in and the issue transfer commands are run. Then we apply the terraform changes from the infra PR, to the new list of combined required status checks, secrets to be synced, and so on. Finally, we archive the repos.

This general approach allows us to have all of the combined root-level files prepared in advance in one PR, and to make the repo-specific file changes made in advance on a branch before the actual monorepo merge. I believe it would reduce our "down time" to a few hours on one day.

I would also suggest we have 1-3 people working on it synchronously like a launch to iron out issues on the day.

Does that make sense? Is there something major I'm overlooking? I really appreciate your acknowledgement that there will definitely be things we miss, and things that might break after combining the repos.

In general, I want to be really really careful and detailed about the actual implementation plan before we execute this. the best approach might be to keep this open indefinitely, after holiday afk, for further implementation discussion, rather than making two separate document. Alternatively, like I said I would support approving this PR if the plan was to make a separate implementation plan.

I'll also be awaiting my co-reviewer, @sarayourfriend's thoughts which I will take into consideration too.

sarayourfriend · 2022-12-15T23:12:34Z

rfcs/20221124-monorepo.md

-With this done, the development on the API and frontend can continue inside their subdirectories. The development of both parts will be independent. At least until we reach [long-term consolidation](#step-5-long-term-consolidation).
+Workflows with major changes:
+
+- `ci_cd.yml` from the API will absorb `ci.yml` from the frontend


What approach will we take to avoid running API tests when only frontend code is changed and vice-versa? Combining the CI workflows will preclude us from using the workflow paths filter. Did you have a different solution in mind?

We can use the path-filter action to do that.

sarayourfriend · 2022-12-15T23:15:19Z

rfcs/20221124-monorepo.md

+
+The soon-to-be-ECS-based API and ECS-based frontend will continue to deploy to staging via the CI + CD pipeline, with deployment starting as soon as all CI checks have passed. They will use similar code as the frontend auto-deploy for staging used currently.
+
+These will be separate jobs with specific [path-based filters](https://github.com/dorny/paths-filter).


What is the benefit of using this paths-filter action rather than the built-in GitHub workflow paths settings for workflow triggers? Similar to my other question: https://docs.github.com/en/actions/using-workflows/triggering-a-workflow#using-filters-to-target-specific-paths-for-pull-request-or-push-events

This actions allows us to filter jobs instead of applying the filter to the entire workflow. This allows us to have a complete workflow where the jobs can be made to run based on the outcome of other jobs with needs. For example we can run the jobs to deploy the docs to GH Pages and Docker images to GHCR only when the CI jobs have passed.

If these jobs were in separate workflows, I don't think it's possible to run a workflow's jobs if jobs in a different workflow have passed.

It is possible to have cross-workflow dependencies, actually: https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows#workflow_run

I only have two concerns with using the actions:

Extremely repetitive and annoying code to maintain in the workflows (need to guard each frontend/api job with the same repeated step at the top of the job). Note that we could use YAML anchors to fix this slightly (still more repetition here compared to paths) but actionlint does not currently support them. We'd either need to contribute an upstream PR or stop using actionlint, the latter of which I think is unwise given how helpful it is at catching syntax errors locally.

It still actually uses a runner and worker time to complete whereas needs bypasses it entirely, therefore not eating up any of our quota. Considering the sheer number of jobs we're going to end up having in a single repository by the end of this migration, cutting down on the number of jobs that are run is prudent in this respect. This is one reason why we added all the concurrency configuration to our workflows, for example.

If you think either of these are not a concern, then that is fine. The first should be documented as an expectation and accepted trade off. I think the second would bite us later on, but I can accept that I may be being too cautious.

One more thing that crossed my mind as I was writing this is whether our concurrency settings will need to change for jobs. My hunch is no, but it's worth looking into to confirm as they can be quite nasty to think through.

workflow_run workflows depend on the "OR" combination of multiple workflows. This is different from needs which depends the "AND" combination of multiple jobs.

The annoying to maintain code might be true, but it can be overcome by abstracting the paths-filter job into a separate action with some inputs (similar to the setup-env action), thus limiting the annoyance to a single file.

Point 2 could be an issue as it will spin up a job and cancel it if the criterion fails, an alternative to that could be to have a job that only matches the condition of the paths. Let's say jobs called "is_api", "is_frontend" etc. The other jobs can needs on them so that they're skipped if their code paths are untouched. I think it will be a bit of a hassle to manage the pass/fail/skip cascades though so it's something that needs a bit of a dry-run first.

The other jobs can needs on them so that they're skipped if their code paths are untouched. I think it will be a bit of a hassle to manage the pass/fail/skip cascades though so it's something that needs a bit of a dry-run first.

I hadn't considered that, using a single (or handful of) job(s) that determine(s) which part of the project is being changed. It could actually be more flexible as the jobs could be is-frontend as you suggested, but also, if we know of specific cross-app dependencies that are fragile, we could have jobs for those as well (is-<whatever>-endpoint, etc). Then needs on other jobs actually sets up excellent self-documenting workflows and does not cause all the actions to run even if they're not needed. We'll have "failed" workflows, I guess, for when one of the project detection jobs doesn't match. I wonder how we would enforce "required" actions for different parts of the application? Can a job be required but skipped due to not matching the "needs"? I don't think so, in which case the secondary callable workflow you mentioned that takes a list of the relevant project parts for the job, while still tedious, at least allows some level of abstraction. Then we can still have required jobs that just automatically pass if they aren't relevant to the part of the code being changed.

sarayourfriend · 2022-12-15T23:17:02Z

rfcs/20221124-monorepo.md

+
+##### Production
+
+For production, we will not be able to use GitHub Releases and will have to use a manually-triggered workflow to build the assets and tag them. The tag can be set via a workflow input (simple) or can be calculated based on the update scope of major, minor or patch (not as simple).


Another way to simplify this is for us to switch to SHA based tags for release deployment. Rather than creating a new build, just use the build tagged with the HEAD SHA.

The slight issue with SHA tags would be that it's harder to identify versioning with them. Given two hashes, it's not possible to say which is the newer version.

Ah, good point!

Co-authored-by: Zack Krida <zackkrida@pm.me>

dhruvkb · 2022-12-16T08:46:56Z

I would prefer that the implementation plan be here and thoroughly discussed in one place rather than a partially complete RFC that's approved and merged without a concrete implementation plan. Keeping this PR open seems like a fair tradeoff for centralising all conversation about the entire undertaking.

I've already started work on eliminating some possible merge conflicts (such as the LICENSE and some almost identical workflows that only differ in repo name for example). Those PRs can be merged independently, ahead of the monorepo to make the migration simpler.

openverse-bot · 2023-01-26T00:00:07Z

Based on the medium urgency of this PR, the following reviewers are being gently reminded to review this PR:

@AetherUnbound
This reminder is being automatically generated due to the urgency configuration.

Excluding weekend¹ days, this PR was updated 4 day(s) ago. PRs labelled with medium urgency are expected to be reviewed within 4 weekday(s)².

@dhruvkb, if this PR is not ready for a review, please draft it to prevent reviewers from getting further unnecessary pings.

Specifically, Saturday and Sunday. ↩
For the purpose of these reminders we treat Monday - Friday as weekdays. Please note that the that generates these reminders runs at midnight UTC on Monday - Friday. This means that depending on your timezone, you may be pinged outside of the expected range. ↩

zackkrida

Originally it was very important for me to have an implementation plan outlined in this PR, but as I understand @dhruvkb has completed multiple dry-runs of this migration in testing environments and is confident in the procedure to run it. Since this is also a one-off migration which will be executed by @dhruvkb and the timing of it occuring after our openverse.org launch tomorrow is ideal, I am comfortable approving this PR as-is.

Once our project process is more refined, I'll feel stricter about adhering to it.

AetherUnbound

Looking forward to it!

Co-authored-by: sarayourfriend <24264157+sarayourfriend@users.noreply.github.com> Co-authored-by: Zack Krida <zackkrida@pm.me>

dhruvkb added 🟨 priority: medium Not blocking but should be addressed soon ✨ goal: improvement Improvement to an existing user-facing feature 💻 aspect: code Concerns the software code in the repository labels Nov 24, 2022

dhruvkb force-pushed the rfc_monorepo branch 3 times, most recently from 62b1cd8 to 40776f9 Compare November 24, 2022 16:22

zackkrida reviewed Nov 29, 2022

View reviewed changes

rfcs/20221124-monorepo.md Outdated Show resolved Hide resolved

Write the monorepo RFC

b49d7fc

dhruvkb force-pushed the rfc_monorepo branch from 40776f9 to b49d7fc Compare November 30, 2022 10:55

dhruvkb marked this pull request as ready for review November 30, 2022 10:59

dhruvkb requested a review from a team as a code owner November 30, 2022 10:59

dhruvkb requested review from AetherUnbound and stacimc November 30, 2022 10:59

sarayourfriend reviewed Nov 30, 2022

View reviewed changes

dhruvkb and others added 2 commits December 1, 2022 09:47

Fix typo in repository name

3b8caba

Co-authored-by: sarayourfriend <24264157+sarayourfriend@users.noreply.github.com>

Update incorrect info about reorganised files

f2122d8

stacimc reviewed Dec 1, 2022

View reviewed changes

zackkrida reviewed Dec 1, 2022

View reviewed changes

rfcs/20221124-monorepo.md Outdated Show resolved Hide resolved

zackkrida changed the title ~~Write the monorepo RFC~~ RFC: Monorepo Dec 1, 2022

Add required reviewers as suggested by @zackkrida

d619394

Co-authored-by: Zack Krida <zackkrida@pm.me>

zackkrida reviewed Dec 2, 2022

View reviewed changes

rfcs/20221124-monorepo.md Outdated Show resolved Hide resolved

zackkrida reviewed Dec 2, 2022

View reviewed changes

obulat reviewed Dec 15, 2022

View reviewed changes

Format file as per Prettier

53fe443

zackkrida reviewed Dec 15, 2022

View reviewed changes

rfcs/20221124-monorepo.md Outdated Show resolved Hide resolved

zackkrida reviewed Dec 15, 2022

View reviewed changes

rfcs/20221124-monorepo.md Show resolved Hide resolved

zackkrida reviewed Dec 15, 2022

View reviewed changes

rfcs/20221124-monorepo.md Outdated Show resolved Hide resolved

zackkrida reviewed Dec 15, 2022

View reviewed changes

rfcs/20221124-monorepo.md Outdated Show resolved Hide resolved

zackkrida requested changes Dec 15, 2022

View reviewed changes

sarayourfriend reviewed Dec 15, 2022

View reviewed changes

dhruvkb and others added 4 commits December 16, 2022 12:30

Update the instruction about copying

5763d16

Add the migration notice for the repos

f88693a

Co-authored-by: Zack Krida <zackkrida@pm.me>

Fix bad grammar

91e056b

Co-authored-by: Zack Krida <zackkrida@pm.me>

Fix linter violations

3b429c3

AetherUnbound removed their request for review January 30, 2023 19:24

zackkrida self-requested a review February 7, 2023 15:54

zackkrida approved these changes Feb 7, 2023

View reviewed changes

obulat mentioned this pull request Feb 9, 2023

Switch to Monorepo #384

Closed

16 tasks

AetherUnbound approved these changes Feb 15, 2023

View reviewed changes

dhruvkb added the 🔍 ov: meta Issue spans multiple repos and has sub-issues label Feb 20, 2023

dhruvkb merged commit e8e8abe into main Feb 20, 2023

dhruvkb deleted the rfc_monorepo branch February 20, 2023 15:40

dhruvkb added a commit that referenced this pull request Feb 20, 2023

RFC: Monorepo (#336)

d40a456

Co-authored-by: sarayourfriend <24264157+sarayourfriend@users.noreply.github.com> Co-authored-by: Zack Krida <zackkrida@pm.me>


		1. Central place for all technical documentation, enabling documentation for different parts of the stack to cross-reference other pieces and stay current with changes in other places.

		1. Enables the infra to deploy the code to coexist with the code itself. Apart from the private secrets that will still need to be encrypted, the IaC files could be organised identical to the code.


		- In fact, we employ a number of hacks to install and configure pre-commit for the frontend. Merging it with the API eliminates the need for such hacks.

		1. The entire system can be integration tested during releases. The real API, populated with test data, can even replace the Talkback server.


		1. The entire system can be integration tested during releases. The real API, populated with test data, can even replace the Talkback server.

		The `WordPress/openverse-api` repo will absorb the `WordPress/openverse-frontend` repo. The `WordPress/openverse-catalog` will also be merged, _later_.


		#### Get the timing right

		The first step will be to release the frontend, call a code freeze and pause work on it. This is to prevent the frontend repo from continuing to drift as we merge a snapshot of it with the API.


		This is a quick process.

		1. Move the entire content of frontend inside a `frontend/` directory, except the following top-level files and folders. Please comment if you can add to this list.

		I will need more information about this because IANAL.

		- LICENSE (both repos)


		## Migration path

		First we will merge the API and the frontend. This decision was made for the following reasons.


		The first step will be to release the frontend, call a code freeze and pause work on it. This is to prevent the frontend repo from continuing to drift as we merge a snapshot of it with the API.

		This can prove difficult given how productive our team is, so we will need to channel this productivity towards the catalog in the meantime. I can foresee the end-to-end migration taking one week (ideal scenario) to one fortnight (worst case scenario).

		1. Create "stack: \*" labels to help with issue and PR management.
		Spoiler/foreshadowing: these labels will be used for more things later.

	The [integration](#step-4-integration) section in the latter part of the
	The [integration](#step-4-integration) section in the latter part of the


		The soon-to-be-ECS-based API and ECS-based frontend will continue to deploy to staging via the CI + CD pipeline, with deployment starting as soon as all CI checks have passed. They will use similar code as the frontend auto-deploy for staging used currently.

		These will be separate jobs with specific [path-based filters](https://github.com/dorny/paths-filter).


		##### Production

		For production, we will not be able to use GitHub Releases and will have to use a manually-triggered workflow to build the assets and tag them. The tag can be set via a workflow input (simple) or can be calculated based on the update scope of major, minor or patch (not as simple).

RFC: Monorepo #336

RFC: Monorepo #336

Conversation

dhruvkb commented Nov 24, 2022 • edited Loading

Fixes

Description

Testing Instructions

Checklist

Developer Certificate of Origin

dhruvkb commented Nov 30, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dhruvkb Dec 1, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dhruvkb Dec 1, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sarayourfriend Dec 1, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dhruvkb Dec 1, 2022 • edited Loading

Choose a reason for hiding this comment

zackkrida commented Dec 1, 2022

Choose a reason for hiding this comment

zackkrida Dec 2, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

zackkrida commented Dec 2, 2022 • edited Loading

zackkrida commented Dec 2, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

zackkrida left a comment • edited Loading

Choose a reason for hiding this comment

1. Infra repo considerations

2. Small details

3. The timing and implementation order

sarayourfriend Dec 15, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dhruvkb Dec 16, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dhruvkb commented Dec 16, 2022 • edited Loading

openverse-bot commented Jan 26, 2023

Footnotes

zackkrida left a comment

Choose a reason for hiding this comment

AetherUnbound left a comment

Choose a reason for hiding this comment

dhruvkb commented Nov 24, 2022 •

edited

Loading

dhruvkb Dec 1, 2022 •

edited

Loading

dhruvkb Dec 1, 2022 •

edited

Loading

sarayourfriend Dec 1, 2022 •

edited

Loading

dhruvkb Dec 1, 2022 •

edited

Loading

zackkrida Dec 2, 2022 •

edited

Loading

zackkrida commented Dec 2, 2022 •

edited

Loading

zackkrida left a comment •

edited

Loading

sarayourfriend Dec 15, 2022 •

edited

Loading

dhruvkb Dec 16, 2022 •

edited

Loading

dhruvkb commented Dec 16, 2022 •

edited

Loading