Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add configs for readonly and writeonly proxies. #76

Closed
wants to merge 2 commits into from

Conversation

uri-canva
Copy link

@uri-canva uri-canva commented Jan 30, 2019

Implements #69.

@uri-canva
Copy link
Author

This is configured via separate disable_reads and disable_writes fields in the yaml so that the default values match the current behavior of having both enabled. I could have used an enum as well but I didn't want to complicate the yaml parsing code, it might be worth it?

In the code it's represented as a Mode enum because that way the invalid configuration of disabling both cannot be represented at the type level.

Happy to change the config to be a mode enum in yaml as well if that is preferable.

@@ -26,6 +34,7 @@ type remoteHTTPProxyCache struct {
remote *http.Client
baseURL *url.URL
local cache.Cache
mode Mode
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe 'accessMode'?

@@ -92,6 +102,10 @@ func (r *remoteHTTPProxyCache) Put(kind cache.EntryKind, hash string, size int64
}
r.local.Put(kind, hash, size, data)

if r.mode&Write == 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adding spaces: r.mode & Write?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gofmt formats this without spaces.

@@ -14,10 +14,14 @@ type GoogleCloudStorageConfig struct {
Bucket string `yaml:"bucket"`
UseDefaultCredentials bool `yaml:"use_default_credentials"`
JSONCredentialsFile string `yaml:"json_credentials_file"`
DisableReads bool `yaml:"disable_reads"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just curious, what is the use case for disable_reads?

Copy link
Author

@uri-canva uri-canva Jan 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Write through from a cache that is less likely to get poisoned to one that is more likely to, and being able to shared cached artefacts without blocking any builds.

For example if your CI cache is in a data center and your developer cache is in the office, the link between them might not be as fast as the link between developers and the developer cache. In that case waiting to download the artefact from the data center cache might be generally slower than building locally. But if you set up write through the data center cache can slowly send artefacts to the developer cache without blocking any builds.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! If the network is slow and is blocking the Dev box's build, understood we want to disable read. However, we may also want to disable write too as we may not want upload cache from Dev box? In this case 'disable_reads' and 'disable_writes' should be used at the same time, no?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Developer machine <===> developer cache <---> CI cache
                    ^                     ^
            fast connection        slow connection

Developer machines can quickly read and write to developer cache, developer cache can slowly write to CI cache without blocking developers because uploads are asynchronous, but it cannot slowly read from CI cache without blocking developers because downloads are synchronous.

@ob
Copy link
Contributor

ob commented Feb 4, 2019

One thing that trips me is the negated logic of disable_reads: true, I tend to prefer reads_enabled: true. But I'll let @buchgr's opinion there trump mine. The code looks fine to me.

@uri-canva
Copy link
Author

ping @buchgr

@ob
Copy link
Contributor

ob commented Feb 21, 2019

@buchgr is OOO until Feb 25

@uri-canva
Copy link
Author

ping @buchgr

1 similar comment
@uri-canva
Copy link
Author

ping @buchgr

@uri-canva uri-canva closed this Jun 24, 2019
@uri-canva uri-canva deleted the read-only-proxy branch June 24, 2019 08:46
@uri-canva uri-canva mentioned this pull request Feb 1, 2020
@brentleyjones
Copy link

We could really use this feature.

@uri-canva would you reopen this if @buchgr would accept it?

How about it @buchgr 🙏?

@uri-canva
Copy link
Author

We're not using this patch anymore ourselves, as it didn't fit our purpose as well as I thought, you're welcome to base a new PR on it though.

@mostynb
Copy link
Collaborator

mostynb commented Feb 2, 2020

Since there are a few interrelated issues/PRs on this topic, I want to make sure that I understand the use case...

Is this the scenario:

  • There's a central CI system with its own cache. This is trustworthy because the CI setup is locked down.
  • There is a developer-local cache, which is more easily poisoned.
  • The link between the central and developer-local cache is slow enough that you don't want synchronous lookups between them.
  • You want the central cache to asynchronously populate the developer-local cache, but not to read from it.
  • (Not sure if this was requested, but do you potentially want multiple developer-local caches to work like this, with a single central cache?)

@brentleyjones
Copy link

brentleyjones commented Feb 2, 2020

I’ll let @uri-canva answer for their situation, but for us it is much simpler, we only really need the writes_disabled setting:

  • Central CI cache. Written to by CI, since it’s trusted to not poison it.
  • Local developer cache on each machine. Used since disk_cache doesn’t clean up, and implementing naive cleanup in a non-AC aware way breaks Builds without the Bytes.
  • Local developer cache points it’s proxy to CI cache with writes_disabled to get remote cache benefits without possibility of poisoning it.

Essentially, if disk_cache could set a size limit and clean up in an LRU way, we wouldn’t need this solution at all. This allows bazel-remote to fill that purpose instead.

@uri-canva
Copy link
Author

Disregard the note around disable_reads above, we have it set up differently now because it wasn't working that well anyway. We were using disable_writes exactly the same way as @brentleyjones: to use bazel-remote as a local disk cache, given the discussion in bazelbuild/bazel#5139.

Essentially, if disk_cache could set a size limit and clean up in an LRU way, we wouldn’t need this solution at all. This allows bazel-remote to fill that purpose instead.

We use it to fall back to reading from a remote cache too, which Bazel doesn't do.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants