Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Monorepo #336

Merged
merged 13 commits into from
Feb 20, 2023
Merged

RFC: Monorepo #336

merged 13 commits into from
Feb 20, 2023

Conversation

dhruvkb
Copy link
Member

@dhruvkb dhruvkb commented Nov 24, 2022

Fixes

Fixes #205 by @zackkrida

Description

This PR writes the monorepo RFC.

Proof of Concept can be seen at https://github.com/dhruvkb/monopenverse https://github.com/dhruvkb/monoverse.

Testing Instructions

  1. Read the RFC.
  2. Give your Cs.

Checklist

  • My pull request has a descriptive title (not a vague title like Update index.md).
  • My pull request targets the default branch of the repository (main) or a parent feature branch.
  • My commit messages follow best practices.
  • My code follows the established code style of the repository.
  • I added or updated tests for the changes I made (if applicable).
  • I added or updated documentation (if applicable).
  • I tried running the project locally and verified that there are no visible errors.

Developer Certificate of Origin

Developer Certificate of Origin
Developer Certificate of Origin
Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
1 Letterman Drive
Suite D4700
San Francisco, CA, 94129

Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.


Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
    have the right to submit it under the open source license
    indicated in the file; or

(b) The contribution is based upon previous work that, to the best
    of my knowledge, is covered under an appropriate open source
    license and I have the right under that license to submit that
    work with modifications, whether created in whole or in part
    by me, under the same open source license (unless I am
    permitted to submit under a different license), as indicated
    in the file; or

(c) The contribution was provided directly to me by some other
    person who certified (a), (b) or (c) and I have not modified
    it.

(d) I understand and agree that this project and the contribution
    are public and that a record of the contribution (including all
    personal information I submit with it, including my sign-off) is
    maintained indefinitely and may be redistributed consistent with
    this project or the open source license(s) involved.

@dhruvkb dhruvkb added 🟨 priority: medium Not blocking but should be addressed soon ✨ goal: improvement Improvement to an existing user-facing feature 💻 aspect: code Concerns the software code in the repository labels Nov 24, 2022
@dhruvkb dhruvkb force-pushed the rfc_monorepo branch 3 times, most recently from 62b1cd8 to 40776f9 Compare November 24, 2022 16:22
@dhruvkb
Copy link
Member Author

dhruvkb commented Nov 30, 2022

After extensive testing in the PoC, everything is working. In the PoC, no existing functionality is broken and a lot of things are simplified, merged or otherwise improved.

In my testing, the monorepo migration was a net positive so I suggest we move forward with it.

@dhruvkb dhruvkb marked this pull request as ready for review November 30, 2022 10:59
@dhruvkb dhruvkb requested a review from a team as a code owner November 30, 2022 10:59

1. Central place for all technical documentation, enabling documentation for different parts of the stack to cross-reference other pieces and stay current with changes in other places.

1. Enables the infra to deploy the code to coexist with the code itself. Apart from the private secrets that will still need to be encrypted, the IaC files could be organised identical to the code.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the IaC files could be organised identical to the code.

Just a nit but this isn't entirely true. We could maybe spread some Terraform modules out this way but it would be a pain to have so many places to search for modules. If the Terraform code will be merged into the monorepo as well then it'd be better off assuming that it will live, in totality, in a separate directory.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, my lack of knowledge in the IaC space is really showing here. I would appreciate if you could add any infra issues (especially any dealbreakers) and any infra benefits that you see popping up due to this merge. Thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From an infra perspective I don't think there are any benefits really. The only thing I could see is if we did this issue: https://github.com/WordPress/openverse-infrastructure/issues/220. The benefit of that issue could be a more easily reusable workflow with better variables in the deployment workflows we have now. Basically everything that is used to define the deployment workflows are variables that we could theoretically move into GitHub, thought it's just more manual management of those details. We could try to sync them other ways like with a GitHub secret of a JSON blob that gets passed through so the workflow call is something like:

- uses: ./actions/deploy.yml
  with:
    config: ${{ secrets.NUXT_DEPLOYMENT_CONFIG }}

And then the workflow parses the JSON to get the configuration.

Note: For folks without access to the infrastructure repository, the issue above describes switching to a callable workflow instead of a composite action. I've found composite actions a little more tedious so I avoided trying to make the current deployment workflows super modular. I think a callable workflow would be easier to work with for a lot of reasons and could open up the possibility of making a more module/generic single-deployment workflow file a little more realistic.

We could do all of this in our current setup, but a monorepo would end up with less places to sync the secrets to and less places to put the deployment action.

Actually, that reminds me, if the infrastructure is in the same repository, the deployment workflow outputs could be automatically merged whenever changes are made to them. Right now we have to manually sync them from the infrastructure repository to wherever they're needed. So that would be a nice simplification of that process as well (one less thing to remember to do!) 🙂


- In fact, we employ a number of hacks to install and configure pre-commit for the frontend. Merging it with the API eliminates the need for such hacks.

1. The entire system can be integration tested during releases. The real API, populated with test data, can even replace the Talkback server.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting idea! For this to be feasible, we would need to come up with a definitive solution to preventing the API from making network requests when it executes a search. Right now Playwright tests do not depend at all on any external network and that needs to remain the case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it can be done (maybe something like an environment variable flag to not make requests) and I can envision it being glorious when it works.


1. The entire system can be integration tested during releases. The real API, populated with test data, can even replace the Talkback server.

The `WordPress/openverse-api` repo will absorb the `WordPress/openverse-frontend` repo. The `WordPress/openverse-catalog` will also be merged, _later_.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will there be a period of time during which frontend work is happening only in openverse-api? Is the long-term plan to merge everything into openverse-api and then eventually rename that to just openverse?

Copy link
Member Author

@dhruvkb dhruvkb Dec 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. The timeline will be

  1. pause work on the frontend
  2. merge WordPress/openverse-frontend into WordPress/openverse-api
  3. transfer issues and other relevant GitHub models
  4. get fronted functional, enable deploys again
  5. resume frontend work in WordPress/opnenverse-api
  6. archive WordPress/opnenverse-frontend

The one week to one fortnight period mentioned in the RFC is the time between step 1 and step 5.


#### Get the timing right

The first step will be to release the frontend, call a code freeze and pause work on it. This is to prevent the frontend repo from continuing to drift as we merge a snapshot of it with the API.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we schedule a bug bash before this so that we can identify and resolve any outstanding issues before we enter the period of time where we won't be able to without great disruption?

Copy link
Member Author

@dhruvkb dhruvkb Dec 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that would be ideal (maybe even necessary). Before embarking on this potential two-week freeze, it would be best to have the frontend be in a state that could serve two weeks without maintenance.

If we have no high or critical priority tickets, that would indicate optimal time to work on this move. Getting the timing right is one of the main things here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are planning to do a bug discovery session during the Istanbul meetup in January, maybe targeting this for late February (or later/earlier depending on what bugs we find) could be good, provided everyone is on board. It'd be nice to know that ahead of time as well so that we could block off the time and expect frontend feature work to undergo a stoppage at a specific period of time rather than just "sometime in the future". Did you have any particular alternative scheduling ideas in mind?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bug bash we had in Venice was very productive so repeating it seems logical. But there are two potential issues with that:

  • We only identified and filed issues in the last meetup, fixing them afterwards.
  • I can't speak for everyone but my productivity immediately after the meetup was very low.

IMO the best time for this migration would be when all the frontend devs' productivity has gone down (less activity = easier migration) so if we could iron out the bugs before the meetup, the meetup would be the best time to migrate this because we'd all be together to collaborate in the migration process.

We do need to set down a date for this but we should only do this after the iframe removal project is deployed, and has been stable for a week or two. Other than that, we can make any time frame work.

Copy link
Contributor

@sarayourfriend sarayourfriend Dec 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should try to do a bug bash before the meetup then. We could even start the monorepo migration during the meetup given the core contributors will not likely be spending much time making direct frontend contributions. That way we could easily pair on any unexpected issues that crop up.

Though that would be hard if not impossible to do before the iframe project, given the current blockers and ambiguous time frame for resolving them 😢

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The period of the code freeze is a big concern for me.

Having a bug bash before it is a really great idea. It's really difficult to plan it because we probably need a sync meeting to identify bugs, [very rough estimates] a day or 2 to document all of them, and then 1-3 weeks to fix everything, especially considering the slow pace of the PR review process. We would probably only want to fix the high and critical priority bugs.

I wonder how this process be affected by the features that we plan to work on now: navigation menu improvements and the homepage. Also, if we plan to work on the addition of analytics next, would that need to be finished before the monorepo move? Or fully postponed till after the move?

Starting the monorepo migration during meetup is great for the reasons Sara mentions. However, I'm not sure that the process itself is a great use of our time at the meetup. Someone will have to be working on the migration steps during the meetup and in the week after that.


This is a quick process.

1. Move the entire content of frontend inside a `frontend/` directory, except the following top-level files and folders. Please comment if you can add to this list.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the plan for making pnpm work in the monorepo? Do we want to go ahead and making pnpm live at the root of the repository and treat the frontend as a pnpm monorepo package? https://pnpm.io/workspaces

I think this would be necessary for us to be able to (reasonably) merge the browser extension and the JS that is currently in WordPress/openverse into the monorepo... otherwise we'll be fighting weird fights with pnpm in different directories 🤔

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was an issue. Currently I had everything working with pnpm with package.json moved into the frontend/ directly (no package manager at the root), but I definitely should look into the workspaces feature.

Comment on lines 82 to 84
- `.prettierignore` (symlink into the `frontend/` directory)
- `.eslintrc.js` (symlink into the `frontend/` directory)
- `.eslintignore` (symlink into the `frontend/` directory)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do these need to be symlinked?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's an error on my part. I was performing lots of experiments in https://github.com/dhruvkb/monopenverse and with linting to work so I tried almost everything. When I got it to work finally, I forgot to update the RFC with the right approach.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in f2122d8.

rfcs/20221124-monorepo.md Outdated Show resolved Hide resolved
Comment on lines +176 to +178
I will need more information about this because IANAL.

- LICENSE (both repos)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given Openverse still uses MIT we have probably very little to worry about here, FWIW.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, but I had a vague feeling that merging repositories could have legal implications for those that use them somehow.

- CONTRIBUTING.md (both repos)
- CONTRIBUTORS.md (API only; also why?)
- DOCUMENTATION_GUIDELINES.md (API only)
- TESTING_GUIDELINES.md (frontend only)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would probably want to move this into the frontend directory, I would think. It is extremely frontend specific. Or would it go into the Sphinx docs?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My long-term goal is for Sphinx docs to be the single source of reference for all developers, not just API devs.

Some Markdown files that people generally look for in the repo root could be left there but ideally, even they would just point to the relevant Sphinx pages for the real stuff.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love that plan!

dhruvkb and others added 2 commits December 1, 2022 09:47
Co-authored-by: sarayourfriend <24264157+sarayourfriend@users.noreply.github.com>

## Migration path

First we will merge the API and the frontend. This decision was made for the following reasons.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reasoning makes a lot of sense to me, and I'm sold on merging API/frontend first and separately from the Catalog. How much more difficult do you anticipate eventually merging the Catalog to be? Do you think there's value in planning that work first, or can that be tabled until after the API/frontend are merged?

Copy link
Member Author

@dhruvkb dhruvkb Dec 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fairly confident that merging the catalog would not cause major pain later on, mostly because it's another Python project. The API and ingestion server are both independent Python projects that have coexisted in harmony since the beginning so adding a third does not seem problematic.

We could definitely plan for the eventual catalog-API merge, but I abstained from that in the RFC because

  • I am not the most experienced with the catalog so I felt a bit underqualified to write about that in the RFC
  • I didn't want to delay the RFC till I could understand the nuances of the catalog enough to include the portion about merging it

rfcs/20221124-monorepo.md Outdated Show resolved Hide resolved
@zackkrida zackkrida changed the title Write the monorepo RFC RFC: Monorepo Dec 1, 2022
@zackkrida
Copy link
Member

One note regarding translations I'd like to make sure is mentioned in this RFC:

We will have to contact a member of the WP Meta team (who works on wp.org) to change the path at which GlotPress reads our translation files, to both update to the new repository and filepath. I believe the current path used is https://github.com/WordPress/openverse-frontend/blob/translations/openverse.pot

Co-authored-by: Zack Krida <zackkrida@pm.me>
rfcs/20221124-monorepo.md Outdated Show resolved Hide resolved
- `.eslintignore`
- <s>`.gitignore`</s> (better to move it into the `frontend/` directory and update some absolute paths)

1. Create the final commit on the `WordPress/openverse-frontend` repo. After the merge we might want to add a notice about the migration to the `README.md` file but GitHub's built-in archival process could suffice here.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we definitely want to add a README notice and use the GitHub archiving functionality together. The README information will crucially link to the new repository. We should also update the repo description with the new url as well, here:

CleanShot 2022-12-02 at 13 20 13


The first step will be to release the frontend, call a code freeze and pause work on it. This is to prevent the frontend repo from continuing to drift as we merge a snapshot of it with the API.

This can prove difficult given how productive our team is, so we will need to channel this productivity towards the catalog in the meantime. I can foresee the end-to-end migration taking one week (ideal scenario) to one fortnight (worst case scenario).
Copy link
Member

@zackkrida zackkrida Dec 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can foresee the end-to-end migration taking one week (ideal scenario) to one fortnight (worst case scenario).

@dhruvkb Could you unpack this time estimate a bit more for me? I am curious because I could see these tasks taking 1-2hrs or so and being treated like a deploy, where a few folks work on it together synchronously. I also think that would have benefit of minimizing the disruption to contributors. Are there long-running tasks I'm not considering?

I would also like to see some concrete implementation details about, for example, how we plan to make a PR for some of the meta/internal changes to the repo (the unified CODEOWNERS file, a new README, the modified actions, etc.). Could we have those changes prepared in a draft PR that we only merge after the frontend main branch has been combined with the API? And regarding that frontend step, do we plan to do it via PR in the GitHub UI? Details like that will give me much more confidence in our final approach.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The missing implementation details (such as writing new documentation, updating references) are what will take the maximum time in the 1-2 week time range I proposed for this plan.

The technical merge is a and getting the functionality working is a day or two. The remaining 3 days are for identifying these issues and fixing them. I kept 5 more days as a backup in case something unforeseen comes up but given we do sufficient planning beforehand, we will not need the second week (underpromise, overdeliver 😉).

It might be hard to make a draft PR with the changes (because of the merge conflicts) but one thing we could do is start a new branch, do the merge on it and eventually replace main with it. That seems like a great way to see everything in action but a lot of (branch-name dependent actions will be affected when we promote the other branch).

In any case, the frontend cannot be merged via the GitHub UI, it will have be CLI. In all my usage of GitHub I don't think it's possible to make a PR from any repo to another. The GitHub UI only provides merge functionality from a fork to the upstream which is not this case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd still really like to see a breakdown of the tasks involved in the merge, their execution order, and a time estimate for each step. We could do that when we write an implementation plan for this RFC. Two days maximum for a code freeze isn't too disruptive but definitely a concern in the case of high priority or critical bugs we might identify, even with a bug bash and small code freeze beforehand.

@zackkrida
Copy link
Member

zackkrida commented Dec 2, 2022

This looks great, @dhruvkb. I'm really excited to move forward with this and hope it's what the team wants. Some questions:

  1. This RFC only covers the frontend + api combination. Would there be a follow-up RFC for merging the Catalog and plain WordPress/openverse repositories later on?
  2. Do we need to write a follow up implementation plan? I see a lot of implementation details here which are great, but I also see some details you may not have considered, or perhaps considered out of scope for this RFC if the intention is to write a separate implementation plan:
    • The root README.md. Will we write a new one? We should draft that earlier so we have it prepared, and perhaps move the current API README into the api/ directory. For example, we'll need to explain the new just install will install frontend and API dependencies, things like that.
    • What will the unified repository be called? openverse-api would no longer make sense.
    • We will need to revise any and all existing documentation that references the frontend and api repositories and update them with reference to the new, combined repo.
    • Just reiterating here that we'll need to update the path at which GlotPress reads our translation files, to both update to the new repository and filepath. I believe the current path used is https://github.com/WordPress/openverse-frontend/blob/translations/openverse.pot
    • We will need to write migration instructions for contributors, who will have local frontend repositories that now point to an archived repo, or api contributors who will now have significant file conflicts with the api repo.
  3. Have you observed the duration of lint/required status actions to know if it will be a hindrance to development? For example, say we're attempting to push a small frontend hotfix, will actionlint now take dramatically longer?

When I clone monopenverse it's still quite performant and responsive in VS Code, which is great. I was concerned about the size being too memory-intensive to search and things, but it's working fine.

I observed one problem with prettier in VS Code, the root-level prettier.config.js depends on the prettier-plugin-tailwindcss which is in the node_modules directory of the frontend. VS Code doesn't seem to like this and looks for prettier in the repo root. To prevent future issues like this I do think it would be worth trying @sarayourfriend's workspaces approach to lift up node dependencies into the root of the repo, which will also somplify things when we combine the scripts (WordPress/openverse) and browser extension (should we renew our efforts to support it) repos into this one.

@zackkrida
Copy link
Member

Also, 👍 the combined codeowners file is excellent!


1. The entire system can be integration tested during releases. The real API, populated with test data, can even replace the Talkback server as long as we can turn off all network calls and enable 100% reliable output.
1. The entire system can be integration tested during releases. The real API, populated with test data, can replace the Talkback server as long as we disable network calls and make output deterministic.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love this! 🚀

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would have to make sure that the API response types never differ from the sample data, though. For example, if we say that the license URL is always present, and it's present in the test data, but absent in the real API responses, that would break the app even though the tests pass.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Next step would be integrating the catalog so the supplier of the data would also be a part of the tests 👍.

Comment on lines +301 to +302
1. Create "stack: \*" labels to help with issue and PR management.
Spoiler/foreshadowing: these labels will be used for more things later.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps this should happen before the migration? If we created the labels in advance, we could then apply each label to all issues in the respective repositories using the cli (I think this would be a very simple command to write), before transferring. The labels would be preserved during the transfer.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that makes more sense than transferring the issues, keeping a list of them and then applying labels.

working with split repos quite productively for over a year. My proposition is
the the monorepo solutions are better than workarounds.

The [integration](#step-4-integration) section in the latter part of the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The [integration](#step-4-integration) section in the latter part of the
The [integration](#step-4-integration) section in the latter part of the

This appears out of date, as Step 4 now refers to a documentation merge.

Copy link
Member

@zackkrida zackkrida left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rationale and migration path look great to me. I am really sold on the idea of a monorepo and excited for the benefits I forsee.

I still think, however, that our implementation plan needs more refinement. What is here is a great start but there is a lot of specific cleanup and steps I would like to see documented.

For this reason, I would approve this PR as-is (with implementation details removed) if we create a separate implementation plan. Otherwise there are things I would like to see expanded on:

1. Infra repo considerations

In the infra repository, we have code which syncs many things across the repositories:

  • The required status checks and branch protections
  • The collaborators and their levels of access
  • Mapping secrets across multiple repositories (which you do mention)

What should the timing of changes be there? My thought would be that we should create a PR in advance which corrects the collaborators, repos, and secrets mapping by removing the to-be-archived repos. Then we would apply the terraform changes and merge the PR right after the actual repos are migrated.

2. Small details

Some small thoughts that are important but I'm not sure where to put them:

  • What preparation of the frontend and API repos do we need to do before the merge? No open PRs? For how long?
  • We should make the PR minimum review count for each repo > 2 on the day of the monorepo merge, to block any accidental PR merges.

3. The timing and implementation order

I suspect we could do much of the implementation work in advance, to dramatically simplify the day-of changes necessary to implement the monorepo. The idea would be do as much work non-destructively and in advance as possible, so that our code freeze can be as short as possible. The general idea would be:

  1. Create all stack labels far in advance, and sync them to all repositories. On the day of launch use a cli command to bulk apply frontend and api stack labels to all open issues in each respective repository. When issues are transferred later, the labels will be preserved. We can also make the 'this repo is archived' notice PRs in advance of the actual monorepo merge too.

  2. In WordPress/openverse, create a monorepo shared files PR containing all of the merged/unified files from monopenverse that you listed:

Expected conflicts Resolution
.github/CODEOWNERS Folderwise separation of owners, use from @dhruvkb/monoverse
.prettierignore Merge entries from all repos, use from @dhruvkb/monoverse
.pre-commit-config.yaml Python + Node.js linting, use from @dhruvkb/monoverse
justfile Split into many smaller justfiles and one roota
.gitignore Split into many smaller .gitignores and one rootb
CONTRIBUTING.md Write new, referencing doc site
README.md Write new, referencing doc site

This PR would be merged on the day of the monorepo migration as a first step. Crucially here, we address and fix all 'Step 4. Documentation merge' issues prior to the monorepo merge, rather than after it has happened.

  1. In frontend, create a branch which moves all code into the frontend/ directory and removes the root-level docs which would conflict with the monorepo.

On the day of the merge we make sure this branch is up to date with main changes and ready to go. When we combine it with the openverse repo we use the command you mentioned (git pull frontend main --allow-unrelated-histories) but with the name of this created branch, instead of main.

  1. The same approach is followed for the API, creating a branch with any conflicted files removed.
  2. After the root level doc and action PR is merged both histories are immediately merged in and the issue transfer commands are run. Then we apply the terraform changes from the infra PR, to the new list of combined required status checks, secrets to be synced, and so on. Finally, we archive the repos.

This general approach allows us to have all of the combined root-level files prepared in advance in one PR, and to make the repo-specific file changes made in advance on a branch before the actual monorepo merge. I believe it would reduce our "down time" to a few hours on one day.

I would also suggest we have 1-3 people working on it synchronously like a launch to iron out issues on the day.

Does that make sense? Is there something major I'm overlooking? I really appreciate your acknowledgement that there will definitely be things we miss, and things that might break after combining the repos.


In general, I want to be really really careful and detailed about the actual implementation plan before we execute this. the best approach might be to keep this open indefinitely, after holiday afk, for further implementation discussion, rather than making two separate document. Alternatively, like I said I would support approving this PR if the plan was to make a separate implementation plan.

I'll also be awaiting my co-reviewer, @sarayourfriend's thoughts which I will take into consideration too.

With this done, the development on the API and frontend can continue inside their subdirectories. The development of both parts will be independent. At least until we reach [long-term consolidation](#step-5-long-term-consolidation).
Workflows with major changes:

- `ci_cd.yml` from the API will absorb `ci.yml` from the frontend
Copy link
Contributor

@sarayourfriend sarayourfriend Dec 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What approach will we take to avoid running API tests when only frontend code is changed and vice-versa? Combining the CI workflows will preclude us from using the workflow paths filter. Did you have a different solution in mind?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use the path-filter action to do that.


The soon-to-be-ECS-based API and ECS-based frontend will continue to deploy to staging via the CI + CD pipeline, with deployment starting as soon as all CI checks have passed. They will use similar code as the frontend auto-deploy for staging used currently.

These will be separate jobs with specific [path-based filters](https://github.com/dorny/paths-filter).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the benefit of using this paths-filter action rather than the built-in GitHub workflow paths settings for workflow triggers? Similar to my other question: https://docs.github.com/en/actions/using-workflows/triggering-a-workflow#using-filters-to-target-specific-paths-for-pull-request-or-push-events

Copy link
Member Author

@dhruvkb dhruvkb Dec 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This actions allows us to filter jobs instead of applying the filter to the entire workflow. This allows us to have a complete workflow where the jobs can be made to run based on the outcome of other jobs with needs. For example we can run the jobs to deploy the docs to GH Pages and Docker images to GHCR only when the CI jobs have passed.

If these jobs were in separate workflows, I don't think it's possible to run a workflow's jobs if jobs in a different workflow have passed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is possible to have cross-workflow dependencies, actually: https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows#workflow_run

I only have two concerns with using the actions:

  1. Extremely repetitive and annoying code to maintain in the workflows (need to guard each frontend/api job with the same repeated step at the top of the job). Note that we could use YAML anchors to fix this slightly (still more repetition here compared to paths) but actionlint does not currently support them. We'd either need to contribute an upstream PR or stop using actionlint, the latter of which I think is unwise given how helpful it is at catching syntax errors locally.
  2. It still actually uses a runner and worker time to complete whereas needs bypasses it entirely, therefore not eating up any of our quota. Considering the sheer number of jobs we're going to end up having in a single repository by the end of this migration, cutting down on the number of jobs that are run is prudent in this respect. This is one reason why we added all the concurrency configuration to our workflows, for example.

If you think either of these are not a concern, then that is fine. The first should be documented as an expectation and accepted trade off. I think the second would bite us later on, but I can accept that I may be being too cautious.

One more thing that crossed my mind as I was writing this is whether our concurrency settings will need to change for jobs. My hunch is no, but it's worth looking into to confirm as they can be quite nasty to think through.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

workflow_run workflows depend on the "OR" combination of multiple workflows. This is different from needs which depends the "AND" combination of multiple jobs.

The annoying to maintain code might be true, but it can be overcome by abstracting the paths-filter job into a separate action with some inputs (similar to the setup-env action), thus limiting the annoyance to a single file.

Point 2 could be an issue as it will spin up a job and cancel it if the criterion fails, an alternative to that could be to have a job that only matches the condition of the paths. Let's say jobs called "is_api", "is_frontend" etc. The other jobs can needs on them so that they're skipped if their code paths are untouched. I think it will be a bit of a hassle to manage the pass/fail/skip cascades though so it's something that needs a bit of a dry-run first.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other jobs can needs on them so that they're skipped if their code paths are untouched. I think it will be a bit of a hassle to manage the pass/fail/skip cascades though so it's something that needs a bit of a dry-run first.

I hadn't considered that, using a single (or handful of) job(s) that determine(s) which part of the project is being changed. It could actually be more flexible as the jobs could be is-frontend as you suggested, but also, if we know of specific cross-app dependencies that are fragile, we could have jobs for those as well (is-<whatever>-endpoint, etc). Then needs on other jobs actually sets up excellent self-documenting workflows and does not cause all the actions to run even if they're not needed. We'll have "failed" workflows, I guess, for when one of the project detection jobs doesn't match. I wonder how we would enforce "required" actions for different parts of the application? Can a job be required but skipped due to not matching the "needs"? I don't think so, in which case the secondary callable workflow you mentioned that takes a list of the relevant project parts for the job, while still tedious, at least allows some level of abstraction. Then we can still have required jobs that just automatically pass if they aren't relevant to the part of the code being changed.


##### Production

For production, we will not be able to use GitHub Releases and will have to use a manually-triggered workflow to build the assets and tag them. The tag can be set via a workflow input (simple) or can be calculated based on the update scope of major, minor or patch (not as simple).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another way to simplify this is for us to switch to SHA based tags for release deployment. Rather than creating a new build, just use the build tagged with the HEAD SHA.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The slight issue with SHA tags would be that it's harder to identify versioning with them. Given two hashes, it's not possible to say which is the newer version.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good point!

dhruvkb and others added 4 commits December 16, 2022 12:30
Co-authored-by: Zack Krida <zackkrida@pm.me>
Co-authored-by: Zack Krida <zackkrida@pm.me>
@dhruvkb
Copy link
Member Author

dhruvkb commented Dec 16, 2022

I would prefer that the implementation plan be here and thoroughly discussed in one place rather than a partially complete RFC that's approved and merged without a concrete implementation plan. Keeping this PR open seems like a fair tradeoff for centralising all conversation about the entire undertaking.

I've already started work on eliminating some possible merge conflicts (such as the LICENSE and some almost identical workflows that only differ in repo name for example). Those PRs can be merged independently, ahead of the monorepo to make the migration simpler.

@openverse-bot
Copy link
Collaborator

Based on the medium urgency of this PR, the following reviewers are being gently reminded to review this PR:

@AetherUnbound
This reminder is being automatically generated due to the urgency configuration.

Excluding weekend1 days, this PR was updated 4 day(s) ago. PRs labelled with medium urgency are expected to be reviewed within 4 weekday(s)2.

@dhruvkb, if this PR is not ready for a review, please draft it to prevent reviewers from getting further unnecessary pings.

Footnotes

  1. Specifically, Saturday and Sunday.

  2. For the purpose of these reminders we treat Monday - Friday as weekdays. Please note that the that generates these reminders runs at midnight UTC on Monday - Friday. This means that depending on your timezone, you may be pinged outside of the expected range.

@AetherUnbound AetherUnbound removed their request for review January 30, 2023 19:24
@zackkrida zackkrida self-requested a review February 7, 2023 15:54
Copy link
Member

@zackkrida zackkrida left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Originally it was very important for me to have an implementation plan outlined in this PR, but as I understand @dhruvkb has completed multiple dry-runs of this migration in testing environments and is confident in the procedure to run it. Since this is also a one-off migration which will be executed by @dhruvkb and the timing of it occuring after our openverse.org launch tomorrow is ideal, I am comfortable approving this PR as-is.

Once our project process is more refined, I'll feel stricter about adhering to it.

@obulat obulat mentioned this pull request Feb 9, 2023
16 tasks
Copy link
Contributor

@AetherUnbound AetherUnbound left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking forward to it!

@dhruvkb dhruvkb added the 🔍 ov: meta Issue spans multiple repos and has sub-issues label Feb 20, 2023
@dhruvkb dhruvkb merged commit e8e8abe into main Feb 20, 2023
@dhruvkb dhruvkb deleted the rfc_monorepo branch February 20, 2023 15:40
dhruvkb added a commit that referenced this pull request Feb 20, 2023
Co-authored-by: sarayourfriend <24264157+sarayourfriend@users.noreply.github.com>
Co-authored-by: Zack Krida <zackkrida@pm.me>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
💻 aspect: code Concerns the software code in the repository ✨ goal: improvement Improvement to an existing user-facing feature 🔍 ov: meta Issue spans multiple repos and has sub-issues 🟨 priority: medium Not blocking but should be addressed soon
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Write an RFC for monorepo support
8 participants