Convert Cortex components to services #2166


Merged
merged 89 commits into from
Mar 4, 2020

Conversation

pstibrany
Contributor

@pstibrany pstibrany commented Feb 20, 2020

This PR is a supplement to the proposal document to revamp the service model used by Cortex.

The document and this PR are based on the concept of a "Service", as used in the Google Guava library: a Service is an object with operational state, with methods to start and stop it, a well-defined and observable state, and methods for waiting for state transitions. Google Guava is a Java library; here we use a reimplementation of the Services concept in Go. Update: now part of Cortex, as of PR #2188.
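To make the Guava-style model concrete, here is a minimal sketch of the state machine and interface described above. The method and state names follow Guava's conventions and are illustrative; they may differ from the actual Go reimplementation.

```go
package main

import (
	"context"
	"fmt"
)

// State mirrors Guava's Service.State: a service only ever moves forward
// through New -> Starting -> Running -> Stopping, ending in Terminated
// (clean stop) or Failed (stopped because of an error).
type State int

const (
	New State = iota
	Starting
	Running
	Stopping
	Terminated
	Failed
)

func (s State) String() string {
	return [...]string{"New", "Starting", "Running", "Stopping", "Terminated", "Failed"}[s]
}

// Service sketches the Guava-style interface: start/stop methods, observable
// state, and methods for waiting on state transitions.
type Service interface {
	StartAsync(ctx context.Context) error      // request transition New -> Starting
	AwaitRunning(ctx context.Context) error    // block until Running (or failure)
	StopAsync()                                // request transition towards Stopping
	AwaitTerminated(ctx context.Context) error // block until Terminated/Failed
	State() State
	FailureCase() error // error that caused the Failed state, if any
}

func main() {
	fmt.Println(New, Running, Failed) // prints: New Running Failed
}
```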

In this PR, Cortex components like the ingester, ruler, ring, or lifecycler are converted to such services. During startup, Cortex collects the services from all modules that are supposed to run (based on the target module), and then starts them in parallel. A module can either contribute a service directly, or use a so-called "wrapped" service – wrapped services make sure that services are still started and stopped in the correct order, based on module dependencies.

An important change in the module initialization code is that services are returned in the "New" state – they are not supposed to start any background tasks until started. That makes initialization faster.

The main Cortex Run method now simply starts all services, waits until the "Server" service stops or any service fails, and then initiates shutdown. The "Server" service is special because it reacts to the SIGTERM signal (this is how Cortex has always worked). It also runs the HTTP and gRPC servers.

Not all benefits of the services model are implemented yet, but some are: for example, the blocks-based StoreQueryable now fetches indexes in the background while other services start (most importantly "Server" itself, with the HTTP server for collecting metrics). Until StoreQueryable is done, Cortex doesn't transition to the "Started" phase, but since the HTTP server is already running, we can collect its metrics.

WAL replays in the ingester are not done in the background in this PR. While it would be trivial to make that change, the ingester would need more changes to check whether it is already running, which seemed out of scope for this PR.

Another possible area for cleanup is the Lifecycler – it now has a way to return errors, instead of just calling os.Exit. We should use that.

Please familiarize yourself with the services concept and its Go implementation (the Service interface, the BasicService implementation, and the ServiceManager). This PR touches many places, but the "conversion" to services was pretty straightforward.

This is an alternative to PR #2119.

Contributor

@sandeepsukhani sandeepsukhani left a comment


I would say overall you did a very good job @pstibrany!
I did a first pass through the new repo and how it is used in Cortex. I have added some comments about design decisions. Some of the things I pointed out are subjective, so it's not necessary to follow them all.

While the services repo looks solid, I was thinking that while we are at it, maybe we should offload dependency management to the new package, i.e. starting and stopping the dependencies in order. Cortex, or any other project using the services repo, would register a bunch of services as ordered dependencies, and that code would take care of it. What do you think?

Comment on lines 15 to 25
type StartingFn func(serviceContext context.Context) error

// RunningFn function is called when service enters Running state. When it returns, service will move to Stopping state.
// If RunningFn or Stopping return error, Service will end in Failed state, otherwise if both functions return without
// error, service will end in Terminated state.
type RunningFn func(serviceContext context.Context) error

// StoppingFn function is called when service enters Stopping state. When it returns, service moves to Terminated or Failed state,
// depending on whether there was any error returned from previous RunningFn (if it was called) and this StoppingFn function. If both return error,
// RunningFn's error will be saved as failure case for Failed state.
type StoppingFn func() error
Contributor


Maybe it is just me, but I would prefer these functions to be in an interface. For each service type, there could be a different interface. What do you think?

Contributor Author


In general, there is one Service interface, and currently one BasicService implementation.

These functions are specific to that implementation. BasicService accepts all of them, but not all services need to use them all. The functions are not tied to different types of services; a single service often uses all three.

Contributor

@sandeepsukhani sandeepsukhani Feb 21, 2020


Yes, right. I was suggesting defining an interface for each service type, with the methods the caller needs to implement to create that kind of service. For example, for the InitIdleService function, make an interface like this:

type IdleService interface {
	StartingFn(serviceContext context.Context) error
	StoppingFn() error
}

Contributor Author


But even an "idle" service may not require a starting or stopping function, and using an interface would force it to have one (which does nothing).

Furthermore, with an interface, one would need a type implementing it, which is often not practical. For example, initQuerier now returns an idle service without needing any extra type: https://github.com/pstibrany/cortex/blob/5c9d7faad7121d982ed16cd11a601d693e258b99/pkg/cortex/modules.go#L323

And if, for example, Ingester implemented this interface directly, those StartingFn and StoppingFn methods would become part of its public API, which is what I want to avoid.

// if our context has been canceled in the meantime.
if err = b.serviceContext.Err(); err != nil {
err = nil // don't report this as a failure, it is a normal "stop" signal.
goto stop
Contributor


I think we can get rid of the goto here by adding an else block, or maybe by moving the code from the goto target into some other function – call it cleanupService(), say – which is then called in a defer before b.serviceContext.Err(). What do you think?

Contributor Author


Originally I had the service lifecycle implemented in several methods, but decided to keep it in one for simplicity. I think it is easier to follow the logic this way, and one goto for the case where starting succeeds but the context is canceled (so we skip Running) is fine.

Alternatively, we can remove this completely and just go to Running, which should then find out that it needs to stop. Skipping the Running state is my Go-specific "invention", and I'm not 100% sure it is the correct one.

Contributor Author


You're right that an else would work too.

@pstibrany
Contributor Author

pstibrany commented Feb 21, 2020

Thank you for your review, Sandeep!

While the services repo looks solid, I was thinking that while we are at it, maybe we should offload dependency management to the new package, i.e. starting and stopping the dependencies in order. Cortex, or any other project using the services repo, would register a bunch of services as ordered dependencies, and that code would take care of it. What do you think?

In general, I'd like to keep the package as small as possible and close to Guava, to ease adoption for people who already know it. The Service interface, service lifecycle, ServiceManager, and listeners in particular are basically set in stone. But this could be a useful extension to include, if implemented well.

In Cortex, modules that don't share dependencies already start concurrently (this is new in this PR), so the modules on each line below start at the same time (assuming they would all start):

  • Server, RuntimeConfig and MemberlistKV
  • Ring, Overrides, TableManager, Configs, AlertManager, Compactor
  • Distributor, Store, QueryFrontend
  • StoreQueryable
  • Ingester, Ruler

This may be tricky to express nicely in a general way for inclusion in a generic package, but perhaps it is possible.
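The layer-by-layer startup listed above could be sketched like this. Note the simplification: the real code derives the ordering from per-module dependencies via wrapped services, rather than from explicit layers, so the grouping here is an assumption for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// startLayers starts every module within a layer concurrently; the next
// layer begins only once the previous one has finished starting, so a
// module's dependencies are always up before the module itself starts.
// It returns the total number of modules started.
func startLayers(layers [][]string, start func(name string)) int {
	started := 0
	for _, layer := range layers {
		var wg sync.WaitGroup
		for _, name := range layer {
			wg.Add(1)
			go func(n string) {
				defer wg.Done()
				start(n)
			}(name)
		}
		wg.Wait() // barrier between layers
		started += len(layer)
	}
	return started
}

func main() {
	var mu sync.Mutex
	order := []string{}
	n := startLayers([][]string{
		{"Server", "RuntimeConfig", "MemberlistKV"},
		{"Ring", "Overrides"},
		{"Ingester", "Ruler"},
	}, func(name string) {
		mu.Lock()
		defer mu.Unlock()
		order = append(order, name)
	})
	// Order within a layer is nondeterministic, but Ring always appears
	// after the whole first layer has started.
	fmt.Println(n, len(order)) // prints: 7 7
}
```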

Contributor

@pracucci pracucci left a comment


Good job @pstibrany! I can't say this PR is easy to review: it's pretty large, unfortunately a good candidate for rebasing pain in the future, and it took me about 3h to review. In retrospect, we could have tried to work in smaller initial steps (i.e. progressively adding Service to components first). Let's see if we can get this merged in a reasonable time.

Good job on the code comments. They helped me a lot in understanding the state machine and thus reviewing the code 👏

Question: what about the /ready probe? Is it mandatory to have one configured for each Cortex component once this PR is merged? Otherwise we may have some endpoints hit ahead of time (i.e. the querier).

@pstibrany
Contributor Author

Good job @pstibrany! I can't say this PR is easy to review: it's pretty large, unfortunately a good candidate for rebasing pain in the future, and it took me about 3h to review. In retrospect, we could have tried to work in smaller initial steps (i.e. progressively adding Service to components first). Let's see if we can get this merged in a reasonable time.

I tried that, and this is the result :-( I realize it's not small at all, but doing it for a single module only didn't make much sense to me.

I will split this PR into two, one to introduce services package to Cortex, and one with the rest of the work.

Question: what about the /ready probe? Is it mandatory to have one configured for each Cortex component once this PR is merged? Otherwise we may have some endpoints hit ahead of time (i.e. the querier).

The querier already has a /ready probe, but it doesn't react to service status; it just always returns 200. This is definitely something to add! I would probably go as far as checking the state of the entire Cortex service manager, and reporting ready only if all services are Running. In the microservices case, it would reflect the main service; in the single-binary case, the state of the entire Cortex binary. What do you think?

@pracucci
Contributor

The querier already has a /ready probe, but it doesn't react to service status; it just always returns 200. This is definitely something to add! I would probably go as far as checking the state of the entire Cortex service manager, and reporting ready only if all services are Running. In the microservices case, it would reflect the main service; in the single-binary case, the state of the entire Cortex binary. What do you think?

I don't see any other way to do it. In single-binary mode you can't have a per-internal-service readiness probe, so /ready should return 200 only after all services are ready.

@pstibrany
Contributor Author

pstibrany commented Feb 28, 2020

I think this is now ready for another review.

Here is a list of follow-up work for when this PR is merged:

  • /services should be nicer, and documented in docs/apis.md
  • /services should include sub-services in the output; see status.go (I'm thinking of a new interface that gives access to sub-services in a generic way, but services then need to make sure they return correct results in a race-free way)
  • querier.New should implement a service, so that other components can wait for it to be ready. storeQueryable should also be a service, and the querier should then reflect its state.
  • the ruler should wait until the querier is running before doing anything
  • Ingester: move more work into Starting (WAL replays). Requires that the ingester checks its running state on pushes and queries.
  • Ingester v2: move opening of TSDBs into Starting
  • Ingester WAL: convert to a service, manage via subservices
  • clean up the Lifecycler: return errors from methods instead of calling os.Exit

@pracucci
Contributor

Fixes #784

Contributor

@pracucci pracucci left a comment


Thanks @pstibrany for addressing my feedback! ❤️

As discussed offline, in this PR we should also:

  • mention configuring the /ready probe for ALL services
  • mention breaking changes when vendoring
  • check the querier changes in the current PR and see if we return correct errors

Could you also rebase on master, please? I would like to see the new querier integration tests running too.

@pstibrany
Contributor Author

  • mention configuring the /ready probe for ALL services

Done

  • mention breaking changes when vendoring

Done

  • check the querier changes in the current PR and see if we return correct errors

I've modified the code so that the API returns 500 instead of 422 (which would indicate a client error) when the server is not yet ready:

$ curl -v http://localhost:8000/api/prom/api/v1/query?query=up 
> GET /api/prom/api/v1/query?query=up HTTP/1.1
> Host: localhost:8000
> User-Agent: curl/7.64.1
> Accept: */*
> 
< HTTP/1.1 500 Internal Server Error
< Content-Type: application/json
< Date: Fri, 28 Feb 2020 18:25:15 GMT
< Content-Length: 91
< 

{"status":"error","errorType":"internal","error":"BlockQueryable is not running: Starting"}

We could also wrap the API handler with a check for whether the server is ready, and return 503 if it isn't. We may want to do that in a separate PR.

Could you also rebase on master, please? I would like to see the new querier integration tests running too.

Done.

Thanks for your review!

Contributor

@pracucci pracucci left a comment


Thanks @pstibrany for this amazing work! LGTM!

@@ -110,8 +110,7 @@ func main() {
}

runtime.KeepAlive(ballast)
err = t.Stop()
util.CheckFatal("initializing cortex", err)
util.CheckFatal("stopping cortex", err)
Contributor


This is not checking any new error (rather, the err from t, err := cortex.New(cfg)). If you intended to check the error from t.Run(), you need to have err = t.Run() outside of the if.

Contributor Author


You are correct. I've now removed this line completely; everything happens in Run anyway.

sandeepsukhani added a commit to grafana/cortex that referenced this pull request Mar 3, 2020
Contributor

@gouthamve gouthamve left a comment


Nice work Peter! LGTM!

This touches a lot of things, and it's hard to get everything right. But now that we are running it successfully in our dev and staging environments, I think this is ready to go!

pstibrany added 13 commits March 3, 2020 18:32

…nverted.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

pstibrany added 19 commits March 3, 2020 18:33

It also uses ingester's check (if configured) to limit rate of starting ingesters.

…)` calls.

…op().

We can use Stop in the test instead.

That makes API handler return HTTP code 500 (server error) instead of 422 (client error).
@michaeljoy
Copy link

Just curious: why not implement the basic health checks that Prometheus, along with the vast majority of exporters, provides, for simplicity and consistency? I found myself deep-diving into the source code to reverse engineer a health check, as none was readily available or documented, and it didn't follow the Prometheus or Kubernetes conventions (best not to reinvent the wheel these days for something as simple as a health check).

Prometheus provides a set of management APIs to ease automation and integrations:

Health check
GET /-/healthy
This endpoint always returns 200 and should be used to check Prometheus health.

Readiness check
GET /-/ready
This endpoint returns 200 when Prometheus is ready to serve traffic (i.e. respond to queries).

@pracucci
Contributor

Readiness check
GET /-/ready
This endpoint returns 200 when Prometheus is ready to serve traffic (i.e. respond to queries).

I think this is very good feedback. We already had /ready for ingesters, so we kept compatibility, but nothing prevents us from adding support for /-/ready. @michaeljoy, are you willing to submit a PR? It should be easy to add (see where we register /ready in pkg/cortex/cortex.go).

@pstibrany
Contributor Author

I think we can also add /healthy (reporting healthy while the server has only New/Starting/Running/Stopping services, and no Failed or Terminated ones). What do you think?

Other than being compatible with Prometheus (which we aren’t in many other ways), what’s the point of having both /-/ready and /ready?

@michaeljoy

michaeljoy commented Mar 13, 2020

I agree that having both is mostly pointless; one endpoint that returns a 200 is sufficient. Typically you don't want these to be heavyweight checks, and if they are, you cache the results for some time anyway to prevent DoS'ing the more costly checks. The only reason for having both is if you are exposing some heavyweight check that you don't want your load balancer hitting or exposed externally – which in this case would be needless duplication, I agree.

The primary purpose in this case is to give the container scheduler (ECS, Fargate, et al.) the ability to terminate a container that is no longer able to serve requests, and to swap containers in only once a container is ready to serve requests successfully. Also, to expose it in a way that doesn't require prior knowledge of the API to implement at a systems level.

simonswine pushed a commit to grafana/e2e that referenced this pull request Jan 13, 2022

* First step towards Cortex using services model. Only Server module converted.
* Converted runtimeconfig.Manager to service.
* Added /services page.
* Converted memberlistKV to service.
* Fixed runtimeconfig tests.
* Converted Ring to service.
* Log waiting for other module to initialize.
* Converted overrides to service.
* Converted client pool to a service.
* Convert ring lifecycler into a service.
* Converted HA Tracker to a service.
* Converted Distributor to a service.
* Handle nil from wrappedService call.
* Explain why Server uses service and not a wrappedService.
* Make service from Store.
* Convert ingester to a service.
* Convert querier initialization into a service. This isn't a full conversion; more work would be required to fully use services for the querier. Left a TODO comment instead.
* Listen for module failures, and log them. Also log when all services start successfully.
* Convert blockQueryable and UserStore into services. UserStore now does its initial sync in the Starting state; blockQueryable doesn't return a querier until it has finished starting.
* Wait a little before shutdown.
* Converted queryFrontend to a service.
* Logging
* Convert TableManager to service
* Convert Ruler to service
* Convert Configs to service
* Convert AlertManager to service.
* Renamed init methods back.
* Fixed tests.
* Converted Compactor to a service.
* Polishing, comments.
* Comments.
* Lint comments.
* Stop server only after all other modules have finished.
* Don't send more jobs to lifecycler loop, if it's not running anymore.
* Don't stop Server until other modules have stopped.
* Removed Compactor from All target. It was meant to be for testing only.
* Comment.
* More comments around startup logic.
* moduleServiceWrapper doesn't need full Cortex, only serviceMap
* Messages
* Fix outdated comment.
* Start lifecycler in starting functions.
* Fixed comment. Return lifecycler's failure case, if any, as error.
* Fix lifecycler usage. Pass independent context to lifecycler, so that it doesn't stop too early.
* Fix test.
* Removed obsolete waiting code. Only log error if it is a real error.
* Renamed servManager to subservices.
* Addressing review feedback.
* Fix test.
* Fix compilation errors after rebase.
* Extracted code that creates server service into separate file.
* Added some helper methods.
* Use helper methods to simplify service creation.
* Comment.
* Helper functions for manager.
* Use helper functions.
* Fixes, use helper functions.
* Fixes, use helper functions.
* comment
* Helper function
* Use helper functions to reduce amount of code.
* Added tests for helper functions.
* Fixed imports
* Comment
* Simplify code.
* Stop and wait until stopped.
* Imports
* Manage compaction and shipper via subservices manager.
* Improve error message.
* Comment.
* Comment.
* Added unit test for Cortex initialization and module dependencies.
* Comments, return errors.
* Unified /ready handlers into one, that reflects state of all services. It also uses ingester's check (if configured) to limit the rate of starting ingesters.
* Added //nolint:errcheck to `defer services.StopAndAwaitTerminated(...)` calls.
* Fix http handler logic. Also renamed it.
* Address review feedback.
* One more test... since Shutdown already stops Run, no need to call Stop(). We can use Stop in the test instead.
* CHANGELOG.md
* Fixed integration test, old versions of Cortex didn't have /ready probe.
* Make lint happy.
* Mention /ready for all services.
* Wrap "not running" error into promql.ErrStorage. That makes the API handler return HTTP code 500 (server error) instead of 422 (client error).
* Expose http port via method on HTTPService.
* Print number of services in each state.
* Fix comment and remove obsolete line.
* Fix compile errors after rebase.
* Rebased and converted data purger to a service.
* Pass context to the bucket client.