Skip to content

RuleStore: implementation using thanos objstore bucket client #3805

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Feb 22, 2021

Conversation

jtlisi
Copy link
Contributor

@jtlisi jtlisi commented Feb 9, 2021

What this PR does:

  • Implement a rulestore client using the Thanos objstore client

Which issue(s) this PR fixes:
Related #2262

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@jtlisi jtlisi force-pushed the 20210202_objstore_rulestore branch from 5433e9b to 98998e0 Compare February 9, 2021 20:41
@pull-request-size pull-request-size bot added size/XL and removed size/L labels Feb 9, 2021
@jtlisi jtlisi force-pushed the 20210202_objstore_rulestore branch from 76ec1c6 to 0b73060 Compare February 10, 2021 18:55
@jtlisi jtlisi marked this pull request as ready for review February 11, 2021 01:59
@jtlisi jtlisi marked this pull request as draft February 11, 2021 02:00
Copy link
Contributor

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @jtlisi for working on this! I left few high level comments, before jumping on a deeper 2nd pass review. 🙏

return ErrUnsupportedStorageBackend
}

if cfg.Backend == bucket.S3 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't duplicate the bucket config Validate() but use it. I understanding you did it because the Backend validation wouldn't pass if it's configured configdb.

An option:

  1. Create bucket.Config.ValidateWithExtraBackends(extras []string)
  2. Pass []string{configdb} to ValidateWithExtraBackends()
  3. This function just calls cfg.Bucket.Validate()

func (cfg *Config) RegisterFlags(f *flag.FlagSet) {
cfg.ConfigDB.RegisterFlagsWithPrefix(prefix, f)

cfg.S3.RegisterFlagsWithPrefix(prefix, f)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should call cfg.Bucket.RegisterFlagsWithPrefix() instead of duplicating here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only reason I didn't call this function is because it wouldn't print configdb as an option in the help text.

Between this issue and #3805 (comment) I think I found a solution that will accommodate the use case case.

  1. Add ExtraBackends []string field to the bucket.Config struct.
  2. In rulestore.Config.RegisterFlags we set the bucket.Config.ExtraBackends field to []string{ConfigDB}
  3. The bucket.Config.RegisterFlagsWithPrefix and bucket.Config.Validate functions are adjusted to merge the supportedBackends with the embedded ExtraBackends before validating/printing-help-text

Let me know what you think? I think this approach will avoid any additional functions at the cost of an additional hidden field in the config that must be set before the bucket.Config.Validate or bucket.Config.RegisterFlags functions are called.

rulePrefix = "rules" + delim
)

type BucketClient struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a bucket client but an implementation of the rule store. I think we should name it accordingly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does BucketRuleStore work?

}
}

func (b *BucketClient) listNamespacesForUser(ctx context.Context, user string) ([]string, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry. In all functions you will have to use UserBucketClient for every operation involving a specific user. We're leveraging on it to add extra functionalities and it's now required that we use it everywhere we leverage on the Thanos objstore client.

We can discuss offline if you need more info about it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I transitioned all user operations to use a UserBucketClient. However, I had to make some additional changes to the UserBucketClient to allow for a global prefix, since all rules are currently stores on the path rules/<user-id>.

@jtlisi jtlisi force-pushed the 20210202_objstore_rulestore branch from 6216ec7 to f975f93 Compare February 12, 2021 02:59
@jtlisi jtlisi marked this pull request as ready for review February 12, 2021 03:55
Copy link
Contributor

@pstibrany pstibrany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some initial comments from first quick look.

@jtlisi jtlisi force-pushed the 20210202_objstore_rulestore branch from 3b0e489 to aa58b75 Compare February 12, 2021 21:49
@jtlisi
Copy link
Contributor Author

jtlisi commented Feb 12, 2021

@pstibrany Thanks for the feedback, this should be ready for another look.

@jtlisi jtlisi requested a review from pstibrany February 16, 2021 04:10
Copy link
Contributor

@pstibrany pstibrany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job! I've left some comments to take a look at.


// NewRuleStore creates a new bucket client based on the configured backend
func NewRuleStore(ctx context.Context, cfg Config, logger log.Logger, reg prometheus.Registerer) (rules.RuleStore, error) {
if cfg.Backend == ConfigDB {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need another way of configuring obsolete ConfigDB rule store? It's already possible to use it without having this option.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on this discussion we would eventually deprecate the config options currently configured using the -ruler.storage.* flag prefix in favor of the the newly added config field. However, the config service is still in use and needs to be carried over. I'm in favor of deprecating the config service but I think it would be best to scope that separately.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't mean to deprecate configdb (although perhaps we should), just that we don't need a separate way of configuring it.

The way I was thinking about the transition was to add config options to existing RuleStoreConfig, and adding new types like "bucket-s3". But I see that it could create more confusion.

Ideally we will do this without breaking changes for ConfigDB users.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way I was thinking about the transition was to add config options to existing RuleStoreConfig, and adding new types like "bucket-s3". But I see that it could create more confusion.

I also felt it would be more confusing to try and fit this change into the existing config struct instead of creating a new top level struct. As for the configdb, the upgrade path would be identical to the bucket upgrade path, except only the prefix of the flags would change.

However, another option would be to:

  1. Not add the configdb as an option for the new RuleStore config struct
  2. When deprecating the existing objectclient flags/fields, maintain the config options for the configdb. However, give a clear warning that this client is not recommended for new deployments.
  3. Figure out the future of the configdb service separately

@jtlisi jtlisi force-pushed the 20210202_objstore_rulestore branch 2 times, most recently from 1d082c2 to 392d959 Compare February 17, 2021 16:39
@jtlisi jtlisi requested a review from pstibrany February 17, 2021 22:36
Copy link
Contributor

@pstibrany pstibrany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, good job!


// Used to inject additional backends into the config. Allows for this config to
// be embedded in multiple contexts and support non-object storage based backends.
ExtraBackends []string `yaml:"-"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is quite smart way of extending bucket.Config with more options.

rulePrefix = "rules"
)

type BucketRuleStore struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: comment this?

if err := ctx.Err(); err != nil {
return err
}
userBucket := bucket.NewUserBucketClient(userID, b.bucket)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should only create userBucket once, not for every rule group.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job! 👏 ! 👏 Please mention it in the list of experimental features at docs/configuration/v1-guarantees.md.

CHANGELOG.md Outdated
@@ -7,6 +7,7 @@
* [CHANGE] Query-frontend: removed `-querier.split-queries-by-day` (deprecated in Cortex 0.4.0). You should use `-querier.split-queries-by-interval` instead. #3813
* [CHANGE] Store-gateway: the chunks pool controlled by `-blocks-storage.bucket-store.max-chunk-pool-bytes` is now shared across all tenants. #3830
* [CHANGE] Ingester: return error code 400 instead of 429 when per-user/per-tenant series/metadata limits are reached. #3833
* [FEATURE] Experimental Ruler Storage: Add a separate set of configuration options to configure the ruler storage backend under the `-ruler-storage.` flag prefix. All TSDB bucket clients and the config service are currently supported. Clients using this implementation will only be enabled if the existing `-ruler.storage` flags are left unset. #3805
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* [FEATURE] Experimental Ruler Storage: Add a separate set of configuration options to configure the ruler storage backend under the `-ruler-storage.` flag prefix. All TSDB bucket clients and the config service are currently supported. Clients using this implementation will only be enabled if the existing `-ruler.storage` flags are left unset. #3805
* [FEATURE] Experimental Ruler Storage: Add a separate set of configuration options to configure the ruler storage backend under the `-ruler-storage.` flag prefix. All blocks storage bucket clients and the config service are currently supported. Clients using this implementation will only be enabled if the existing `-ruler.storage` flags are left unset. #3805

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -115,6 +116,7 @@ type Config struct {
TenantFederation tenantfederation.Config `yaml:"tenant_federation"`

Ruler ruler.Config `yaml:"ruler"`
RulerStorage rulestore.Config `yaml:"ruler_storage" doc:"hidden"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we wanna keep it hidden for now? I don't have strong opinions but, without having in it in the doc, will be difficult for our users to try it out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, should be good to expose since we explicitly declare it experimental in the guarantees doc

const (
ConfigDB = "configdb"

name = "rule-store"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
name = "rule-store"
name = "ruler-storage"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

return nil, err
}

store := NewBucketRuleStore(bucketClient, 10, logger)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of putting a 10 here for loadConcurrency, why not having a const defined in pkg/ruler/rulestore/bucket_client.go and not letting to configure it at all?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. I think we should avoid exposing more config options for this client unless there is a compelling need to do so.


gr, err := b.getRuleGroup(gCtx, user, namespace, group, gr) // reuse group pointer from the map.
if err != nil {
level.Error(b.logger).Log("msg", "failed to get rule group", "namespace", namespace, "name", group, "user", user, "err", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We log in this case, but not other return error cases. What if we don't log any error here, but wrap the error with info about the namespace/group/user and let the caller log it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>

fixup

Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>

fix flag

Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>

fix integration tests

Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>

pr-fix: rename rule-store flag/field to ruler-storage

Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>

pr-refactor: reuse bucket client Config validate and pass extra backend

Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>

pr-refactor: reuse bucket.Config function with extra backends and rename rulestore.BucketClient

Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>

update changelog

Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>

remove unused variables

Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>

make prefixed userBucket and utilize it in ruler storage

Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>

refactor: create PrefixedBucketClient and use it instead of altering UserBucketClient

Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>

refactor: mv rulestore to pkg/ruler

Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>

refactor: fix per PR suggestions, except embedding buckets

Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>

use prefixed bucket by default

Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>

add SupportModifications to implementation

Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>

add tests for DeleteNamespace

Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>

remove utility function and use embedded prefixed bucket

Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>

fix lint

Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>

add todo

Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>

add experimental comment to guarantees document

Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
@jtlisi jtlisi force-pushed the 20210202_objstore_rulestore branch from 4426200 to 5f1dd3f Compare February 22, 2021 16:43
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
@pracucci
Copy link
Contributor

Fantastic job @jtlisi. LGTM! 🚀

@jtlisi jtlisi merged commit 621222c into cortexproject:master Feb 22, 2021
@jtlisi jtlisi deleted the 20210202_objstore_rulestore branch February 22, 2021 19:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants