Convert Cortex components to services #2166

pstibrany · 2020-02-20T11:29:12Z

This PR is a supplement to proposal document to revamp service model used by Cortex.

The document and this PR are based on the concept of "Service", as used in Google Guava library: Service is an object with operational state, with methods to start/stop the service, well-defined and observable state, methods for waiting for state transitions. Google Guava is a Java library, here we use reimplementation of Services concept in Go. Update: now part of Cortex, as of PR #2188.

In this PR, Cortex components like ingester, ruler, ring or lifecycler are converted to such services.
During startup Cortex collects all the services from modules that are supposed to run (based on target module), and then starts them in parallel. Module can either contribute a service directly, or use so-called "wrapped" service – wrapped services will make sure that services are still started and stopped in a correct order based on module dependencies.

Important change in module initialization code is that services are returned in "New" state – they are not supposed to start any background tasks until started. That means that initialization is faster.

Main Cortex Run method now simply starts all services, and then waits until "Server" service stops or any service fails, and then initiates shutdown. "Server" service is special because it reacts on SIGTERM signal (this is how Cortex always worked). It also runs HTTP and gRPC servers.

Not all benefits of services model are implemented, but some are: for example Blocks-based StoreQueryable now fetches indexes in the background, while other services start (most importantly "Server" itself with HTTP server, for collecting metrics). Until StoreQueryable is done, Cortex doesn't transition to "Started" phase, but since HTTP server already runs, we can collect its metrics.

WAL replays in ingester are not done in background in this PR... while it would be trivial to do this change, ingester would need more changes to check if it's running already or not, and it seemed like out of scope for this PR.

Another possible area for cleanups is Lifecycler -- now it has a way to return errors, instead of just doing os.Exit. We should use that.

Please familiarize yourself with services concept, Go implementation (Service interface, BasicService implementation and ServiceManager. This PR touches many places, but "conversion" to services was pretty straightforward.

This is an alternative to PR #2119.

sandeepsukhani

I would say overall you did a very good job @pstibrany!
I did a first pass through the new repo and how it is used in Cortex. I have added some comments about design decisions. Some of the things that I pointed out are subjective so its not necessary that you follow them all.

While the services repo looks solid I was thinking while we are at it maybe we should offload dependency management to the new package like starting and stopping the dependencies in order. Cortex or any other project using services repo would register a bunch of services as ordered dependencies with each service and that code should take care of it. What do you think?

sandeepsukhani · 2020-02-21T09:01:19Z

vendor/github.com/pstibrany/services/basic_service.go

+type StartingFn func(serviceContext context.Context) error
+
+// RunningFn function is called when service enters Running state. When it returns, service will move to Stopping state.
+// If RunningFn or Stopping return error, Service will end in Failed state, otherwise if both functions return without
+// error, service will end in Terminated state.
+type RunningFn func(serviceContext context.Context) error
+
+// StoppingFn function is called when service enters Stopping state. When it returns, service moves to Terminated or Failed state,
+// depending on whether there was any error returned from previous RunningFn (if it was called) and this StoppingFn function. If both return error,
+// RunningFn's error will be saved as failure case for Failed state.
+type StoppingFn func() error


Maybe it is just me but I would prefer these functions to be in an interface? For each service type, there can be a different interface. What do you think?

In general, there is one Service interface, and currently one BasicService implementation.

These functions are specific to the implementation. BasicService requires all of them, but not all services need to use them all. These functions are not related to different types of services, single service often uses all three.

Yes, right. I was suggesting to define an interface for each service type with the methods that the caller needs to define to create a service like for InitIdleService function make an interface as below:

type IdleService interface { type StartingFn func(serviceContext context.Context) error type StoppingFn func() error }

But even "idle" service may not require starting or stopping function, and using interface would force it to have one (which does nothing).

Furthermore, by using interface, one would need to have type implementing the interface, but that's often not practical. For example, initQuerier now returns idle service without needing any extra type: https://github.com/pstibrany/cortex/blob/5c9d7faad7121d982ed16cd11a601d693e258b99/pkg/cortex/modules.go#L323

And if for example Ingester implemented this interface directly, those StartingFn and StoppingFn would become part of its public API, which is what I want to avoid.

vendor/github.com/pstibrany/services/basic_service.go

sandeepsukhani · 2020-02-21T09:47:29Z

vendor/github.com/pstibrany/services/basic_service.go

+	// if our context has been canceled in the meantime.
+	if err = b.serviceContext.Err(); err != nil {
+		err = nil // don't report this as a failure, it is a normal "stop" signal.
+		goto stop


I think we can get rid of goto here by adding an else block or maybe move the code from goto block to some other function and call it maybe cleanupService() which is then called in a defer before b.serviceContext.Err(). What do you think?

Originally I had service lifecycle implemented in several methods, but decided to keep it in one for simplicity. It think it is easier to follow the logic this way, and one goto when starting succeeds, but context is canceled so we skip Running, is fine.

Alternatively, we can remove this completely, and just go to Running, which should then find out that it needs to stop. Skipping Running state is my go-specific "invention", and I'm not 100% it is the correct one.

You're right that else would work too

pstibrany · 2020-02-21T11:23:46Z

Thank you for your review Sandeep!

While the services repo looks solid I was thinking while we are at it maybe we should offload dependency management to the new package like starting and stopping the dependencies in order. Cortex or any other project using services repo would register a bunch of services as ordered dependencies with each service and that code should take care of it. What do you think?

In general I'd like to keep the package as small as possible and close to Guava, to ease adoption for people who know it already. Especially Service interface, service lifecycle, ServicesManager and listeners are basically set to stone. But this can be a useful extension to include, if implemented well.

In Cortex, modules that don't share dependencies already start concurrently (this is new in this PR), so basically modules on each line start at the same time (assuming they all would start):

Server, RuntimeConfig and MemberlistKV
Ring, Overrides, TableManager, Configs, AlertManager, Compactor
Distributor, Store, QueryFrontend
StoreQueryable
Ingester, Ruler

This may be tricky to express nicely in a general way, to be included in general package, but perhaps.

pracucci

Good job @pstibrany! I can't say this PR is easy to review: it's pretty large and unfortunately a good candidate for rebasing pain in the future and took me about 3h to review. As a retrospective, we could have tried to work on a smaller initial step (ie. progressively adding Service to components first). Let's see if we can get this merged in a reasonable time.

Good job on code comments. They helped me a lot understanding the state and thus reviewing the code 👏

Question: what's about the /ready probe? Is it mandatory to have one configured for each Cortex component once this PR will be merged, otherwise we may have some endpoints hit ahead of time (ie. querier)?

pkg/cortex/modules.go

pkg/ring/lifecycler.go

pkg/cortex/cortex.go

pstibrany · 2020-02-26T08:39:37Z

Good job @pstibrany! I can't say this PR is easy to review: it's pretty large and unfortunately a good candidate for rebasing pain in the future and took me about 3h to review. As a retrospective, we could have tried to work on a smaller initial step (ie. progressively adding Service to components first). Let's see if we can get this merged in a reasonable time.

I have tried that. And this is the result :-( I realize it's not small at all, but doing it for single module only didn't make much sense to me.

I will split this PR into two, one to introduce services package to Cortex, and one with the rest of the work.

Question: what's about the /ready probe? Is it mandatory to have one configured for each Cortex component once this PR will be merged, otherwise we may have some endpoints hit ahead of time (ie. querier)?

Querier already has /ready probe, but it doesn't react on services status, it just always returns 200. This is definitely something to add! I would probably go as far as checking the state of entire Cortex service manager, and report Ready only if all services are Running. In case of microservices, it would reflect the main service, in case of single binary, it's the state of entire Cortex binary. What do you think?

pracucci · 2020-02-26T08:46:54Z

Querier already has /ready probe, but it doesn't react on services status, it just always returns 200. This is definitely something to add! I would probably go as far as checking the state of entire Cortex service manager, and report Ready only if all services are Running. In case of microservices, it would reflect the main service, in case of single binary, it's the state of entire Cortex binary. What do you think?

I don't see any other way to do it. In single binary mode you can't have a per-internal-service readiness probe, so /ready should return 200 only after all services are ready.

pstibrany · 2020-02-28T09:50:54Z

I think this is now ready for another review.

Here is list of follow-up work when this PR is merged

/services should be nicer, and documented in docs/apis.md
/services should include sub-services in the output, see status.go (I'm thinking via new interface which would give access to sub-services in a generic way, but services then need to make sure they return correct results in race-free way)
querier.New should implement a service, so that other components can wait for it to be ready. Also storeQueryable should be a service, and querier should then reflect on it.
ruler should wait until querier is running, before doing anything
Ingester: move more stuff into Starting (WAL replays). Requires that Ingester checks its running state on pushes and queries.
Ingester v2: move opening of TSDBs into Starting
Ingester WAL - convert to service, manage via subservices
cleanup Lifecycler, return errors from methods instead of calling os.Exit

pracucci · 2020-02-28T14:11:44Z

Fixes #784

pracucci

Thanks @pstibrany for addressing my feedback! ❤️

As discussed offline, in this PR we should also:

mention configuring /ready probe for ALL services
mention breaking changes when vendoring
check querier changes in current PR and see if we return correct errors

Could you also rebase master, please? I would like to see the new querier integration tests running too.

pkg/compactor/compactor.go

pkg/cortex/cortex.go

pkg/cortex/server_service.go

pstibrany · 2020-02-28T18:28:23Z

mention configuring /ready probe for ALL services

Done

mention breaking changes when vendoring

Done

check querier changes in current PR and see if we return correct errors

I've modified code so that it returns 500 instead of 422 (indicates client error) from API when server is not yet ready:

$ curl -v http://localhost:8000/api/prom/api/v1/query?query=up 
> GET /api/prom/api/v1/query?query=up HTTP/1.1
> Host: localhost:8000
> User-Agent: curl/7.64.1
> Accept: */*
> 
< HTTP/1.1 500 Internal Server Error
< Content-Type: application/json
< Date: Fri, 28 Feb 2020 18:25:15 GMT
< Content-Length: 91
< 

{"status":"error","errorType":"internal","error":"BlockQueryable is not running: Starting"}

We could also wrap API handler with a check if server is ready and return 503 if it isn't. We may want to do that in a separate PR.

Could you also rebase master, please? I would like to see the new querier integration tests running too.

Done.

Thanks for your review!

CHANGELOG.md

pracucci

Thanks @pstibrany for this amazing work! LGTM!

integration/backward_compatibility_test.go

codesome · 2020-03-02T15:27:42Z

cmd/cortex/main.go

@@ -110,8 +110,7 @@ func main() {
 	}

 	runtime.KeepAlive(ballast)
-	err = t.Stop()
-	util.CheckFatal("initializing cortex", err)
+	util.CheckFatal("stopping cortex", err)


This is not checking any new error (rather the err from t, err := cortex.New(cfg)). If you intended to check for error from t.Run(), you need to have err = t.Run() outside of the if.

You are correct. I've now removed this line completely. Everything now happens in Run anyway.

gouthamve

Nice work Peter! LGTM!

Though this touches a lot of things and its hard to get things right. But now that we are running it in our dev and staging environments successfully, I think this is ready to go!

pkg/chunk/table_manager.go

…nverted. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

It also uses ingester's check (if configured) to limit rate of starting ingesters. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

…)` calls. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

…op(). We can use Stop in the test instead. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

That makes API handler to return HTTP code 500 (server error) instead of 422 (client error). Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

michaeljoy · 2020-03-09T20:56:41Z

Just curious, why not implement the basic health checks that Prometheus does along with the vast majority of exporters for simplicity and consistency? Found myself deep diving the source code to try and reverse engineer a health check as none was readily available, nor documented, and it didn't follow the prometheus or kube conventions (best not to reinvent the wheel these days for something as simple as a health check).

Prometheus provides a set of management API to ease automation and integrations.

Health check
GET /-/healthy
This endpoint always returns 200 and should be used to check Prometheus health.

Readiness check
GET /-/ready
This endpoint returns 200 when Prometheus is ready to serve traffic (i.e. respond to queries).

pracucci · 2020-03-10T16:50:39Z

Readiness check
GET /-/ready
This endpoint returns 200 when Prometheus is ready to serve traffic (i.e. respond to queries).

I think this is a very good feedback. We already had /ready for ingesters so we kept compatibility, but nothing prevent us from adding support to /-/ready. @michaeljoy are you willing to submit a PR? Should be easy to add the support (see where we register /ready in pkg/cortex/cortex.go).

pstibrany · 2020-03-10T17:09:51Z

I think we can also add /healthy (as server with only New/Starting/Running/Stopping services, but no Failed or Terminated ones). What do you think?

Other than being compatible with Prometheus (which we aren’t in many other ways), what’s the point of having both /-/ready and /ready?

michaeljoy · 2020-03-13T18:27:37Z

I agree having both is mostly pointless. Having 1 endpoint that returns a 200 is sufficient. Typically you don't want these to be heavyweight checks and if they are, you cache the results for some time anyways to prevent DOS'ing the more costly layer checks. I think the only reason for having both is if you are exposing some heavyweight check that you don't want your load balancer hitting and or exposing externally. Which in this case is needless duplication I agree.

The primary purpose in this case is to give the container scheduler (ECS, Fargate, et al) the ability to terminate a container that is no longer able to serve requests, and only consider swapping containers once a container is ready to service requests successfully. Also, to expose it in a way that doesn't require prior knowledge of the API to implement at a systems level.

* First step towards Cortex using services model. Only Server module converted. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Converted runtimeconfig.Manager to service. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added /services page. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Converted memberlistKV to service. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Fixed runtimeconfig tests. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Converted Ring to service. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Log waiting for other module to initialize. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Converted overrides to service. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Converted client pool to a service. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Convert ring lifecycler into a service. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Converted HA Tracker to a service. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Converted Distributor to a service. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Handle nil from wrappedService call. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Explain why Server uses service and not a wrappedService. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Make service from Store. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Convert ingester to a service. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Convert querier initialization into a service. This isn't full conversion, but more work would be required to fully use services for querier. Left TODO comment instead. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Listen for module failures, and log them. Also log when all services start successfully. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Convert blockQueryable and UserStore into a services. UserStore now does initial sync in Starting state. blockQueryable doesn't return querier until it has finished starting. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Wait a little before shutdown. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Converted queryFrontend to a service. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Logging Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Convert TableManager to service Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Convert Ruler to service Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Convert Configs to service Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Convert AlertManager to service. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Renamed init methods back. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Fixed tests. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Converted Compactor to a service. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Polishing, comments. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Comments. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Lint comments. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Stop server only after all other modules have finished. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Don't send more jobs to lifecycler loop, if it's not running anymore. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Don't stop Server until other modules have stopped. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Removed Compactor from All target. It was meant to be for testing only. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Comment. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * More comments around startup logic. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * moduleServiceWrapper doesn't need full Cortex, only serviceMap Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Messages Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Fix outdated comment. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Start lifecycler in starting functions. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Fixed comment. Return lifecycler's failure case, if any, as error. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Fix lifecycler usage. Pass independent context to lifecycler, so that it doesn't stop too early. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Fix test. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Removed obsolete waiting code. Only log error if it is real error. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Renamed servManager to subservices. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Addressing review feedback. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Fix test. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Fix compilation errors after rebase. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Extracted code that creates server service into separate file. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added some helper methods. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Use helper methods to simplify service creation. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Comment. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Helper functions for manager. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Use helper functions. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Fixes, use helper functions. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Fixes, use helper functions. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * comment Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Helper function Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Use helper functions to reduce amount of code. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added tests for helper functions. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Fixed imports Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Comment Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Simplify code. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Stop and wait until stopped. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Imports Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Manage compaction and shipper via subservices manager. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Improve error message. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Comment. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Comment. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added unit test for Cortex initialization and module dependencies. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Comments, return errors. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Unified /ready handlers into one, that reflects state of all services. It also uses ingester's check (if configured) to limit rate of starting ingesters. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added //nolint:errcheck to `defer services.StopAndAwaitTerminated(...)` calls. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Fix http handler logic. Also renamed it. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Address review feedback. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * One more test... since Shutdown already stops Run, no need to call Stop(). We can use Stop in the test instead. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * CHANGELOG.md Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Fixed integration test, old versions of Cortex didn't have /ready probe. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Make lint happy. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Mention /ready for all services. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Wrap "not running" error into promql.ErrStorage. That makes API handler to return HTTP code 500 (server error) instead of 422 (client error). Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Expose http port via method on HTTPService. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Print number of services in each state. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Fix comment and remove obsolete line. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Fix compile errors after rebase. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Rebased and converted data purger to a service. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Pass context to the bucket client. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

pstibrany mentioned this pull request Feb 20, 2020

Modules can now start additional startup tasks, while server is starting #2119

Closed

3 tasks

sandeepsukhani reviewed Feb 21, 2020

View reviewed changes

pracucci reviewed Feb 25, 2020

View reviewed changes

pstibrany mentioned this pull request Feb 26, 2020

Added services package. #2188

Merged

pracucci reviewed Feb 28, 2020

View reviewed changes

pkg/compactor/compactor.go Outdated Show resolved Hide resolved

pkg/cortex/cortex.go Show resolved Hide resolved

pkg/cortex/server_service.go Outdated Show resolved Hide resolved

pkg/cortex/server_service.go Show resolved Hide resolved

pull-request-size bot added the size/XXL label Mar 2, 2020

pracucci reviewed Mar 2, 2020

View reviewed changes

CHANGELOG.md Show resolved Hide resolved

pracucci approved these changes Mar 2, 2020

View reviewed changes

integration/backward_compatibility_test.go Outdated Show resolved Hide resolved

codesome reviewed Mar 2, 2020

View reviewed changes

sandeepsukhani added a commit to grafana/cortex that referenced this pull request Mar 3, 2020

Merge PR cortexproject#2166 into r77

7a9022d

gouthamve approved these changes Mar 3, 2020

View reviewed changes

pkg/chunk/table_manager.go Outdated Show resolved Hide resolved

pstibrany added 13 commits March 3, 2020 18:32

First step towards Cortex using services model. Only Server module co…

52f564d

…nverted. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

Converted runtimeconfig.Manager to service.

353cb26

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

Added /services page.

dab13d9

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

Converted memberlistKV to service.

3a60a3e

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

Fixed runtimeconfig tests.

a06feb8

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

Converted Ring to service.

a70c6ad

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

Log waiting for other module to initialize.

2b142f0

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

Converted overrides to service.

024ce38

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

Converted client pool to a service.

fda988a

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

Convert ring lifecycler into a service.

afdd06f

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

Converted HA Tracker to a service.

aedece1

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

Converted Distributor to a service.

daaa30a

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

Handle nil from wrappedService call.

63a879c

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

pstibrany added 19 commits March 3, 2020 18:33

Comment.

22a4637

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

Added unit test for Cortex initialization and module dependencies.

874c56b

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

Comments, return errors.

702b575

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

Unified /ready handlers into one, that reflects state of all services.

ca27213

It also uses ingester's check (if configured) to limit rate of starting ingesters. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

Added //nolint:errcheck to `defer services.StopAndAwaitTerminated(...…

1f597b4

…)` calls. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

Fix http handler logic. Also renamed it.

5a22c3d

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

Address review feedback.

458fa76

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

One more test... since Shutdown already stops Run, no need to call St…

451eba7

…op(). We can use Stop in the test instead. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

CHANGELOG.md

abc8d2b

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

Fixed integration test, old versions of Cortex didn't have /ready probe.

6e5540d

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

Make lint happy.

ecfa362

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

Mention /ready for all services.

3d412a8

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

Wrap "not running" error into promql.ErrStorage.

991bb9d

That makes API handler to return HTTP code 500 (server error) instead of 422 (client error). Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

Expose http port via method on HTTPService.

16d5a3a

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

Print number of services in each state.

085b6ac

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

Fix comment and remove obsolete line.

4329075

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

Fix compile errors after rebase.

528451d

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

Rebased and converted data purger to a service.

dda59cc

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

Pass context to the bucket client.

26fadae

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

gouthamve merged commit 9a40de2 into cortexproject:master Mar 4, 2020

gouthamve mentioned this pull request Mar 4, 2020

Flusher target to flush WAL #2075

Merged

3 tasks

This was referenced Mar 5, 2020

Move call of stopIncomingRequests before lifecycler shutdown. #2219

Merged

Update Cortex to master grafana/loki#1785

Merged

pstibrany mentioned this pull request Mar 13, 2020

Convert Loki modules to services grafana/loki#1804

Merged

2 tasks

gouthamve mentioned this pull request Mar 30, 2020

Cortex components should have readiness check endpoints #784

Closed

Convert Cortex components to services #2166

Convert Cortex components to services #2166

Uh oh!

Conversation

pstibrany commented Feb 20, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sandeepsukhani left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

sandeepsukhani Feb 21, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

pstibrany commented Feb 21, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pracucci left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

pstibrany commented Feb 26, 2020

Uh oh!

pracucci commented Feb 26, 2020

Uh oh!

pstibrany commented Feb 28, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pracucci commented Feb 28, 2020

Uh oh!

pracucci left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

pstibrany commented Feb 28, 2020

Uh oh!

Uh oh!

pracucci left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

gouthamve left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

michaeljoy commented Mar 9, 2020

Uh oh!

pracucci commented Mar 10, 2020

Uh oh!

pstibrany commented Feb 20, 2020 •

edited

Loading

sandeepsukhani Feb 21, 2020 •

edited

Loading

pstibrany commented Feb 21, 2020 •

edited

Loading

pstibrany commented Feb 28, 2020 •

edited

Loading

michaeljoy commented Mar 13, 2020 •

edited

Loading