Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Submodules step 1/2: convert archivers to a plugin architecture #5597

Merged
merged 3 commits into from
Jan 20, 2024

Conversation

Groxx
Copy link
Contributor

@Groxx Groxx commented Jan 17, 2024

Breaking the gcloud plugin into a separate submodule (so the main module can remove these optional libraries) requires that the main module does not import it.

Since this was a hard-coded switch statement, that wasn't really possible. So I've converted it to an at-init-time registration system, and I'll invert that import-flow in a later PR. This one is intentionally trying to be a (relatively) straightforward refactor.

This is almost certainly better not using init, and the scheme/key separation seems strange... but that's a less-obviously-identical-behavior change, and we kinda need to split this apart ASAP.
Step 2 will introduce two submodules (gcloud plugin, and cmd/server which will import both the plugin and the "main" module), along with some makefile/automation changes to make things continue working like before.

@coveralls
Copy link

coveralls commented Jan 17, 2024

Pull Request Test Coverage Report for Build 018d2410-822e-4dc1-8ff6-080f38b1efe8

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.7%) to 62.601%

Totals Coverage Status
Change from base Build 018d2407-018e-4309-b375-a1e9f289623c: -0.7%
Covered Lines: 91719
Relevant Lines: 146513

💛 - Coveralls

@Groxx Groxx marked this pull request as ready for review January 19, 2024 20:34
Previously this was a switch statement, which cannot have control inverted.
&config.VisibilityArchiverProvider{
Filestore: cfg,
},
node, err := config.ToYamlNode(cfg)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice

Copy link
Contributor Author

@Groxx Groxx Jan 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not my proudest bit of shenaniganry, I'll admit, but it is at least pretty simple :)

tbh I'm not quite sure what a type-safer setup would look like. at worst it could be an interface that just sets the out var to an in-memory instance, but that didn't feel worth it yet...

mut sync.Mutex
data map[K]V
}
SyncMap[K comparable, V any] interface {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't look too hard in our libraries for similar stuff... but this is simple enough that tbh I just kinda don't care.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a unit test for this?

Copy link
Contributor Author

@Groxx Groxx Jan 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Groxx Groxx merged commit 9041a4f into uber:master Jan 20, 2024
16 checks passed
@Groxx Groxx deleted the submod_final_step1 branch January 20, 2024 00:09
mut sync.Mutex
data map[K]V
}
SyncMap[K comparable, V any] interface {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a unit test for this?

)

var _ yaml.Unmarshaler = (*YamlNode)(nil)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add unit test for new functions here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

currently being tested via the config Load-using tests, which I think does cover this in this package... but lemme check

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah:
Screenshot 2024-01-22 at 16 06 40

so it'd just be for invalid yaml (probably not possible here) / broken custom marshalers and unmarshalers (we have none afaik).

Groxx added a commit that referenced this pull request Jan 23, 2024
# What

Step 1 was in PR #5597

This second commit essentially:
1. splits the gcloud-only dependencies into a submodule which imports the main `github.com/uber/cadence` module, and not the reverse
2. also splits the "main" server build (`cmd/server`) into a submodule which imports the gcloud plugin and the main module (so that binary is unchanged)
3. removes `cloud.google.com/*` dependencies from the main `go.mod` (requires both 1 and 2)
4. adjusts things to work like they did before

Ultimately reducing our "main" (truly required) dependencies, while not making major changes to using or developing Cadence.

# Why

Part on principle, and part due to internal version conflicts.

In principle, "plugins" are "optional" and we should not be forcing _all_ optional dependencies on _all_ users of _any_ of Cadence.  
Splitting dependencies into choose-your-own-adventure submodules is simply good library design for the ecosystem, and it's something we should be doing more of.

In practice, this is essentially being forced because the gcloud archiver plugin pulls in some `cloud.google.com/*` stuff, which we upgraded, which forces breaking `google.golang.org/genproto` upgrades, which conflicts with other things in our internal monorepo.  
Removing this requirement (we do not use the gcloud archiver internally) allows us to continue using these incompatible libraries elsewhere, unblocking us while we wait for them to upgrade.  
This may also be true for [waves in the direction of The Internet] all of You, so rejoice!  Our pain is your gain.  We now no longer force you to use these libraries.

We will likely be doing this with more submodules in the future, this is just the first proof-of-concept and establishes a pattern to follow for the others.  
The client library will very likely also be doing this (or possibly in other repos) to separate out some of its more problematic dependencies.

# How

I intend to write a more detailed doc soon, but as a tl;dr this "create a new submodule to separate optional dependencies" can be recreated for other future submodules by following this sequence:

## 1: make sure all imports "into" the to-be-a-submodule folder are reversed, and instead go "out" from the submodule to the "main" module.

E.g. PR #5597 converts a hardcoded switch statement into a registration system, and then this commit uses that to reverse the direction of the code's imports.

This is generally best done _up front_, because very nearly all of it can be done without any dependency changes, which leaves you with only a small window where your IDE does not quite work (while tidying the modules the first time).  
This will also let you make a separate commit with most or all of your code changes in isolation, which is easier to review and verify as non-changing.

Technically you can put this off until later, but you may have to modify your code while you have no working refactoring / autocomplete / navigation tools, and that's just unnecessary pain.

## 2: create a `go.mod` in the to-be-a-submodule folder

That must look something like:
```
module github.com/uber/cadence/some/folder

// don't depend on a released library, depend on the current code:
replace github.com/uber/cadence => ../../

// and the thrift replace, any other replaces that may be necessary in other `go.mod`s, etc
```

## 3: `go mod tidy` in the repo root / the main `go.mod`

And confirm it removes the dependencies that only the submodule uses.

In particular, this **MUST NOT** add the new submodule to the dependencies, or you've got a main -> submodule import somewhere.  Grep for it if so, it should be very easy to find, and reverse that dependency somehow.

## 4: `go mod tidy` in the new submodule.

If this pulls in incompatible versions, I've had good luck "seeding" the new `go.mod` file with the contents of the main `go.mod` file.  It seems to keep versions more stable than replaces + `go mod tidy` alone.  
(these incompatible versions are _probably problematic somehow_ and may cause issues in the future, but we can at least use this trick or `replace` as necessary to ignore them for now)

## 5: add your new module to the root `go.work` so gopls and some other CLI tools continue to work.

Plus any makefile / CI / etc changes necessary.  Hopefully you can just mimic existing multi-module stuff.

## And that's essentially it!

In summary:
- adjust your code
- make a new `go.mod`
- tidy everything
- adjust tooling as needed

Bundle as much of ^ this as you care into a single commit, and it should all work _as long as anyone using the submodule also pulls the same version of all other modules_ (because that's what you developed against, with `replace ... => ../`).  Only same-version will be checked in CI, so only same-version should be expected to work, though other combinations _may_ work in practice.

# How others can use these new submodules

Because we _require_ `replace` directives, a "simple" `go get github.com/uber/cadence[/or/submodules]` does not work and essentially never has and never will.  
That's fine, they just have to add our replaces in a custom `go.mod`:

```
module their/project

// replace thrift (and any other replacements at version X, check our go.mod)
replace github.com/apache/thrift => github.com/apache/thrift v0.0.0-20161221203622-b2a4d4ae21c7
```

And use a main.go to include the submodules:
```go
// optionally make it not buildable by default, makes some tools happier
//go:build just_for_gomod_purposes
package main
import (
  _ "github.com/uber/cadence/some/submodule"
  _ "github.com/uber/cadence/other/submodules/etc"
  _ "github.com/uber/cadence/cmd/server" // the main module, if you want it
)
func main() {} // does not need to be used / will not be run
```

And then `go get github.com/uber/cadence/{some/submodule,other/submodules/etc,cmd/server,}@version` to get all involved modules at the same version (as one command is most likely to work) which will merge all dependencies, update your `go.mod`, and you can now `go run github.com/uber/cadence/cmd/server` and it should work.  Or start using the various modules to make a new custom `main.go` or whatever is needed.

If necessary, you can also `replace` all `github.com/uber/cadence`-hosted modules with the same version / SHA, which should definitely force it to work.

Updating works similarly - `get` them all at the same version, make sure your replaces are up to date (as `go get` will not adjust those), and it should Just Work™ like it did with a single module.

---

This was originally planned to be landed over multiple commits to allow submodules to refer to the previously-merged commit(s) to set up dependencies, but relative `replace` directives simplify this a LOT, and bring a MUCH better development experience.

The main pros/cons of using a multi-commit + "refer to other-module@HEAD^ rather than relative paths" approach is:
- pro: people can `go get the/submodule@version` and they _could_ be guaranteed to get a working set of dependencies
    - ... at _some_ SHAs anyway.  in principle it could be e.g. at every released version or `@latest`, but it cannot be guaranteed at _all_ SHAs.
    - our thrift replace means this currently does not work anyway, and has not for a long time
- con: submodules can only refer to the _previous_ commit of other modules, as it must be a pushed SHA
    - this applies during development too, essentially requiring you to push a fork and `replace cadence => fork version` to do normal development
    - this "requires" multiple commits on master/somewhere permanent to truly release something stable.  or they can `go get` the same version of all modules.
- con: more complex dev and CI as things always refer to the _previous_ state at best, not the current
    - this can be worked around by manually `go get`ting / replacing in CI as needed to ensure the same versions, but that can become quite complex.

Filesystem-relative requires allow ^ all this in a single commit, immediately, without pushing, with the primary tradeoff being "_all_ submodule-users must manually ensure all repo-hosted modules are at the same version" rather than just _some_ (most of the time). 
 
That's pretty easy to automate, and `go get` doesn't work out-of-the-box anyway due to thrift, so we're not really losing anything by requiring that.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants