Ingester blowing up to tens of thousands of goroutines #4324

Closed
@bboreham

Description

This is very similar to #858, but I decided to open a new issue as the code involved is different in this case.
We have some metrics from production ~3 months ago showing that one ingester hit 400,000 goroutines, but little other clear data.

To try to reproduce this, I deliberately limited the ingester's CPU and fired 500 requests/sec at it from Avalanche.

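A dump in this text format can be pulled from the running ingester's pprof endpoint. A minimal sketch, assuming the Go pprof handlers are exposed on the ingester's HTTP port and that the address ingester:80 is reachable (both assumptions here):

```go
package main

import (
	"io"
	"net/http"
	"os"
)

func main() {
	// debug=1 returns the text format shown below: one entry per unique
	// stack, prefixed by the number of goroutines currently on that stack.
	resp, err := http.Get("http://ingester:80/debug/pprof/goroutine?debug=1")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	io.Copy(os.Stdout, resp.Body)
}
```
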
Beginning of goroutine dump:

goroutine profile: total 17421
8689 @ 0x43b2c5 0x44cf05 0x44ceee 0x46e5e7 0x47d805 0x47ef70 0x47ef02 0x9d09a7 0x9c6fc7 0x9c6f50 0x2082e45 0x2082de0 0x2082ea6 0x20799dd 0x156b2e9 0x20db263 0xb9ab23 0xd0b2a4 0x20c6cb6 0xb9ab23 0xb9e7e2 0xb9ab23 0xd0eafa 0xb9ab23 0xd0b814 0xb9ab23 0xb9ad17 0x154fdb0 0xb439cb 0xb47b8c 0xb5648b 0x472701
#	0x46e5e6	sync.runtime_SemacquireMutex+0x46							/usr/local/go/src/runtime/sema.go:71
#	0x47d804	sync.(*Mutex).lockSlow+0x104								/usr/local/go/src/sync/mutex.go:138
#	0x47ef6f	sync.(*Mutex).Lock+0x8f									/usr/local/go/src/sync/mutex.go:81
#	0x47ef01	sync.(*RWMutex).Lock+0x21								/usr/local/go/src/sync/rwmutex.go:111
#	0x9d09a6	github.com/prometheus/prometheus/tsdb.(*isolation).newAppendID+0x46			/backend-enterprise/vendor/github.com/prometheus/prometheus/tsdb/isolation.go:126
#	0x9c6fc6	github.com/prometheus/prometheus/tsdb.(*Head).appender+0x46				/backend-enterprise/vendor/github.com/prometheus/prometheus/tsdb/head.go:1193
#	0x9c6f4f	github.com/prometheus/prometheus/tsdb.(*Head).Appender+0xaf				/backend-enterprise/vendor/github.com/prometheus/prometheus/tsdb/head.go:1189
#	0x2082e44	github.com/prometheus/prometheus/tsdb.(*DB).Appender+0x3a4				/backend-enterprise/vendor/github.com/prometheus/prometheus/tsdb/db.go:797
#	0x2082ddf	github.com/cortexproject/cortex/pkg/ingester.(*userTSDB).Appender+0x33f			/backend-enterprise/vendor/github.com/cortexproject/cortex/pkg/ingester/ingester_v2.go:149
#	0x2082ea5	github.com/cortexproject/cortex/pkg/ingester.(*Ingester).v2Push+0x405			/backend-enterprise/vendor/github.com/cortexproject/cortex/pkg/ingester/ingester_v2.go:777
#	0x20799dc	github.com/cortexproject/cortex/pkg/ingester.(*Ingester).Push+0x8dc			/backend-enterprise/vendor/github.com/cortexproject/cortex/pkg/ingester/ingester.go:475
#	0x156b2e8	github.com/cortexproject/cortex/pkg/ingester/client._Ingester_Push_Handler.func1+0x88	/backend-enterprise/vendor/github.com/cortexproject/cortex/pkg/ingester/client/ingester.pb.go:2565
#	0x20db262	github.com/cortexproject/cortex/pkg/cortex.ThanosTracerUnaryInterceptor+0xa2		/backend-enterprise/vendor/github.com/cortexproject/cortex/pkg/cortex/tracing.go:14
#	0xb9ab22	github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1+0x62		/backend-enterprise/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25
#	0xd0b2a3	github.com/weaveworks/common/middleware.ServerUserHeaderInterceptor+0xa3		/backend-enterprise/vendor/github.com/weaveworks/common/middleware/grpc_auth.go:38
#	0x20c6cb5	github.com/cortexproject/cortex/pkg/util/fakeauth.SetupAuthMiddleware.func1+0x115	/backend-enterprise/vendor/github.com/cortexproject/cortex/pkg/util/fakeauth/fake_auth.go:27
#	0xb9ab22	github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1+0x62		/backend-enterprise/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25
#	0xb9e7e1	github.com/opentracing-contrib/go-grpc.OpenTracingServerInterceptor.func1+0x301		/backend-enterprise/vendor/github.com/opentracing-contrib/go-grpc/server.go:57
#	0xb9ab22	github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1+0x62		/backend-enterprise/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25
#	0xd0eaf9	github.com/weaveworks/common/middleware.UnaryServerInstrumentInterceptor.func1+0x99	/backend-enterprise/vendor/github.com/weaveworks/common/middleware/grpc_instrumentation.go:32
#	0xb9ab22	github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1+0x62		/backend-enterprise/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25
#	0xd0b813	github.com/weaveworks/common/middleware.GRPCServerLog.UnaryServerInterceptor+0x93	/backend-enterprise/vendor/github.com/weaveworks/common/middleware/grpc_logging.go:29
#	0xb9ab22	github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1+0x62		/backend-enterprise/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25
#	0xb9ad16	github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1+0xd6		/backend-enterprise/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:34
#	0x154fdaf	github.com/cortexproject/cortex/pkg/ingester/client._Ingester_Push_Handler+0x14f	/backend-enterprise/vendor/github.com/cortexproject/cortex/pkg/ingester/client/ingester.pb.go:2567
#	0xb439ca	google.golang.org/grpc.(*Server).processUnaryRPC+0x52a					/backend-enterprise/vendor/google.golang.org/grpc/server.go:1210
#	0xb47b8b	google.golang.org/grpc.(*Server).handleStream+0xd0b					/backend-enterprise/vendor/google.golang.org/grpc/server.go:1533
#	0xb5648a	google.golang.org/grpc.(*Server).serveStreams.func1.2+0xaa				/backend-enterprise/vendor/google.golang.org/grpc/server.go:871

6866 @ 0x43b2c5 0x44cf05 0x44ceee 0x46e5e7 0x47d805 0x47ef70 0x47ef02 0x9d0c5e 0x9c9254 0x9b3775 0x20844a7 0x20799dd 0x156b2e9 0x20db263 0xb9ab23 0xd0b2a4 0x20c6cb6 0xb9ab23 0xb9e7e2 0xb9ab23 0xd0eafa 0xb9ab23 0xd0b814 0xb9ab23 0xb9ad17 0x154fdb0 0xb439cb 0xb47b8c 0xb5648b 0x472701
#	0x46e5e6	sync.runtime_SemacquireMutex+0x46							/usr/local/go/src/runtime/sema.go:71
#	0x47d804	sync.(*Mutex).lockSlow+0x104								/usr/local/go/src/sync/mutex.go:138
#	0x47ef6f	sync.(*Mutex).Lock+0x8f									/usr/local/go/src/sync/mutex.go:81
#	0x47ef01	sync.(*RWMutex).Lock+0x21								/usr/local/go/src/sync/rwmutex.go:111
#	0x9d0c5d	github.com/prometheus/prometheus/tsdb.(*isolation).closeAppend+0x3d			/backend-enterprise/vendor/github.com/prometheus/prometheus/tsdb/isolation.go:152
#	0x9c9253	github.com/prometheus/prometheus/tsdb.(*headAppender).Commit+0x633			/backend-enterprise/vendor/github.com/prometheus/prometheus/tsdb/head.go:1521
#	0x9b3774	github.com/prometheus/prometheus/tsdb.dbAppender.Commit+0x34				/backend-enterprise/vendor/github.com/prometheus/prometheus/tsdb/db.go:817
#	0x20844a6	github.com/cortexproject/cortex/pkg/ingester.(*Ingester).v2Push+0x1a06			/backend-enterprise/vendor/github.com/cortexproject/cortex/pkg/ingester/ingester_v2.go:896
#	0x20799dc	github.com/cortexproject/cortex/pkg/ingester.(*Ingester).Push+0x8dc			/backend-enterprise/vendor/github.com/cortexproject/cortex/pkg/ingester/ingester.go:475
#	0x156b2e8	github.com/cortexproject/cortex/pkg/ingester/client._Ingester_Push_Handler.func1+0x88	/backend-enterprise/vendor/github.com/cortexproject/cortex/pkg/ingester/client/ingester.pb.go:2565
#	0x20db262	github.com/cortexproject/cortex/pkg/cortex.ThanosTracerUnaryInterceptor+0xa2		/backend-enterprise/vendor/github.com/cortexproject/cortex/pkg/cortex/tracing.go:14
#	0xb9ab22	github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1+0x62		/backend-enterprise/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25
#	0xd0b2a3	github.com/weaveworks/common/middleware.ServerUserHeaderInterceptor+0xa3		/backend-enterprise/vendor/github.com/weaveworks/common/middleware/grpc_auth.go:38
#	0x20c6cb5	github.com/cortexproject/cortex/pkg/util/fakeauth.SetupAuthMiddleware.func1+0x115	/backend-enterprise/vendor/github.com/cortexproject/cortex/pkg/util/fakeauth/fake_auth.go:27
#	0xb9ab22	github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1+0x62		/backend-enterprise/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25
#	0xb9e7e1	github.com/opentracing-contrib/go-grpc.OpenTracingServerInterceptor.func1+0x301		/backend-enterprise/vendor/github.com/opentracing-contrib/go-grpc/server.go:57
#	0xb9ab22	github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1+0x62		/backend-enterprise/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25
#	0xd0eaf9	github.com/weaveworks/common/middleware.UnaryServerInstrumentInterceptor.func1+0x99	/backend-enterprise/vendor/github.com/weaveworks/common/middleware/grpc_instrumentation.go:32
#	0xb9ab22	github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1+0x62		/backend-enterprise/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25
#	0xd0b813	github.com/weaveworks/common/middleware.GRPCServerLog.UnaryServerInterceptor+0x93	/backend-enterprise/vendor/github.com/weaveworks/common/middleware/grpc_logging.go:29
#	0xb9ab22	github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1+0x62		/backend-enterprise/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25
#	0xb9ad16	github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1+0xd6		/backend-enterprise/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:34
#	0x154fdaf	github.com/cortexproject/cortex/pkg/ingester/client._Ingester_Push_Handler+0x14f	/backend-enterprise/vendor/github.com/cortexproject/cortex/pkg/ingester/client/ingester.pb.go:2567
#	0xb439ca	google.golang.org/grpc.(*Server).processUnaryRPC+0x52a					/backend-enterprise/vendor/google.golang.org/grpc/server.go:1210
#	0xb47b8b	google.golang.org/grpc.(*Server).handleStream+0xd0b					/backend-enterprise/vendor/google.golang.org/grpc/server.go:1533
#	0xb5648a	google.golang.org/grpc.(*Server).serveStreams.func1.2+0xaa				/backend-enterprise/vendor/google.golang.org/grpc/server.go:871

So, when the ingester can't keep up, many goroutines end up blocked waiting for access to the TSDB; in the dump above, both stacks are waiting on a lock inside tsdb/isolation.go, one goroutine per in-flight Push request.

Just as in #858, it seems to me that once the number of goroutines goes beyond some limit we would be better off failing immediately than trying to carry on with the request.
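
Purely as an illustration of that idea (not an actual Cortex change), such a limit could sit in front of Push as a gRPC unary interceptor; the threshold, names, and use of ResourceExhausted below are all assumptions:

```go
package main

import (
	"context"
	"runtime"

	"google.golang.org/grpc"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// goroutineLimitInterceptor is a hypothetical sketch: refuse unary RPCs once
// the process has more than maxGoroutines goroutines, rather than letting yet
// another request queue up behind the TSDB locks.
func goroutineLimitInterceptor(maxGoroutines int) grpc.UnaryServerInterceptor {
	return func(ctx context.Context, req interface{}, info *grpc.UnaryServerInfo, handler grpc.UnaryHandler) (interface{}, error) {
		if runtime.NumGoroutine() > maxGoroutines {
			return nil, status.Error(codes.ResourceExhausted, "too many goroutines in ingester, rejecting request")
		}
		return handler(ctx, req)
	}
}

func main() {
	// Hypothetical wiring: put the check first in the chain so rejected
	// requests never reach the auth, logging or Push handlers at all.
	_ = grpc.NewServer(grpc.UnaryInterceptor(goroutineLimitInterceptor(50000)))
}
```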
