This is very similar to #858, but I decided to open a new issue as the code involved is different in this case.
Production metrics from ~3 months ago show one ingester hitting 400,000 goroutines, but we have little clear data beyond that.
To try to reproduce this, I deliberately limited the ingester's CPU and fired in 500 requests/sec from Avalanche.
Beginning of goroutine dump:
goroutine profile: total 17421
8689 @ 0x43b2c5 0x44cf05 0x44ceee 0x46e5e7 0x47d805 0x47ef70 0x47ef02 0x9d09a7 0x9c6fc7 0x9c6f50 0x2082e45 0x2082de0 0x2082ea6 0x20799dd 0x156b2e9 0x20db263 0xb9ab23 0xd0b2a4 0x20c6cb6 0xb9ab23 0xb9e7e2 0xb9ab23 0xd0eafa 0xb9ab23 0xd0b814 0xb9ab23 0xb9ad17 0x154fdb0 0xb439cb 0xb47b8c 0xb5648b 0x472701
# 0x46e5e6 sync.runtime_SemacquireMutex+0x46 /usr/local/go/src/runtime/sema.go:71
# 0x47d804 sync.(*Mutex).lockSlow+0x104 /usr/local/go/src/sync/mutex.go:138
# 0x47ef6f sync.(*Mutex).Lock+0x8f /usr/local/go/src/sync/mutex.go:81
# 0x47ef01 sync.(*RWMutex).Lock+0x21 /usr/local/go/src/sync/rwmutex.go:111
# 0x9d09a6 github.com/prometheus/prometheus/tsdb.(*isolation).newAppendID+0x46 /backend-enterprise/vendor/github.com/prometheus/prometheus/tsdb/isolation.go:126
# 0x9c6fc6 github.com/prometheus/prometheus/tsdb.(*Head).appender+0x46 /backend-enterprise/vendor/github.com/prometheus/prometheus/tsdb/head.go:1193
# 0x9c6f4f github.com/prometheus/prometheus/tsdb.(*Head).Appender+0xaf /backend-enterprise/vendor/github.com/prometheus/prometheus/tsdb/head.go:1189
# 0x2082e44 github.com/prometheus/prometheus/tsdb.(*DB).Appender+0x3a4 /backend-enterprise/vendor/github.com/prometheus/prometheus/tsdb/db.go:797
# 0x2082ddf github.com/cortexproject/cortex/pkg/ingester.(*userTSDB).Appender+0x33f /backend-enterprise/vendor/github.com/cortexproject/cortex/pkg/ingester/ingester_v2.go:149
# 0x2082ea5 github.com/cortexproject/cortex/pkg/ingester.(*Ingester).v2Push+0x405 /backend-enterprise/vendor/github.com/cortexproject/cortex/pkg/ingester/ingester_v2.go:777
# 0x20799dc github.com/cortexproject/cortex/pkg/ingester.(*Ingester).Push+0x8dc /backend-enterprise/vendor/github.com/cortexproject/cortex/pkg/ingester/ingester.go:475
# 0x156b2e8 github.com/cortexproject/cortex/pkg/ingester/client._Ingester_Push_Handler.func1+0x88 /backend-enterprise/vendor/github.com/cortexproject/cortex/pkg/ingester/client/ingester.pb.go:2565
# 0x20db262 github.com/cortexproject/cortex/pkg/cortex.ThanosTracerUnaryInterceptor+0xa2 /backend-enterprise/vendor/github.com/cortexproject/cortex/pkg/cortex/tracing.go:14
# 0xb9ab22 github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1+0x62 /backend-enterprise/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25
# 0xd0b2a3 github.com/weaveworks/common/middleware.ServerUserHeaderInterceptor+0xa3 /backend-enterprise/vendor/github.com/weaveworks/common/middleware/grpc_auth.go:38
# 0x20c6cb5 github.com/cortexproject/cortex/pkg/util/fakeauth.SetupAuthMiddleware.func1+0x115 /backend-enterprise/vendor/github.com/cortexproject/cortex/pkg/util/fakeauth/fake_auth.go:27
# 0xb9ab22 github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1+0x62 /backend-enterprise/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25
# 0xb9e7e1 github.com/opentracing-contrib/go-grpc.OpenTracingServerInterceptor.func1+0x301 /backend-enterprise/vendor/github.com/opentracing-contrib/go-grpc/server.go:57
# 0xb9ab22 github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1+0x62 /backend-enterprise/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25
# 0xd0eaf9 github.com/weaveworks/common/middleware.UnaryServerInstrumentInterceptor.func1+0x99 /backend-enterprise/vendor/github.com/weaveworks/common/middleware/grpc_instrumentation.go:32
# 0xb9ab22 github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1+0x62 /backend-enterprise/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25
# 0xd0b813 github.com/weaveworks/common/middleware.GRPCServerLog.UnaryServerInterceptor+0x93 /backend-enterprise/vendor/github.com/weaveworks/common/middleware/grpc_logging.go:29
# 0xb9ab22 github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1+0x62 /backend-enterprise/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25
# 0xb9ad16 github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1+0xd6 /backend-enterprise/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:34
# 0x154fdaf github.com/cortexproject/cortex/pkg/ingester/client._Ingester_Push_Handler+0x14f /backend-enterprise/vendor/github.com/cortexproject/cortex/pkg/ingester/client/ingester.pb.go:2567
# 0xb439ca google.golang.org/grpc.(*Server).processUnaryRPC+0x52a /backend-enterprise/vendor/google.golang.org/grpc/server.go:1210
# 0xb47b8b google.golang.org/grpc.(*Server).handleStream+0xd0b /backend-enterprise/vendor/google.golang.org/grpc/server.go:1533
# 0xb5648a google.golang.org/grpc.(*Server).serveStreams.func1.2+0xaa /backend-enterprise/vendor/google.golang.org/grpc/server.go:871
6866 @ 0x43b2c5 0x44cf05 0x44ceee 0x46e5e7 0x47d805 0x47ef70 0x47ef02 0x9d0c5e 0x9c9254 0x9b3775 0x20844a7 0x20799dd 0x156b2e9 0x20db263 0xb9ab23 0xd0b2a4 0x20c6cb6 0xb9ab23 0xb9e7e2 0xb9ab23 0xd0eafa 0xb9ab23 0xd0b814 0xb9ab23 0xb9ad17 0x154fdb0 0xb439cb 0xb47b8c 0xb5648b 0x472701
# 0x46e5e6 sync.runtime_SemacquireMutex+0x46 /usr/local/go/src/runtime/sema.go:71
# 0x47d804 sync.(*Mutex).lockSlow+0x104 /usr/local/go/src/sync/mutex.go:138
# 0x47ef6f sync.(*Mutex).Lock+0x8f /usr/local/go/src/sync/mutex.go:81
# 0x47ef01 sync.(*RWMutex).Lock+0x21 /usr/local/go/src/sync/rwmutex.go:111
# 0x9d0c5d github.com/prometheus/prometheus/tsdb.(*isolation).closeAppend+0x3d /backend-enterprise/vendor/github.com/prometheus/prometheus/tsdb/isolation.go:152
# 0x9c9253 github.com/prometheus/prometheus/tsdb.(*headAppender).Commit+0x633 /backend-enterprise/vendor/github.com/prometheus/prometheus/tsdb/head.go:1521
# 0x9b3774 github.com/prometheus/prometheus/tsdb.dbAppender.Commit+0x34 /backend-enterprise/vendor/github.com/prometheus/prometheus/tsdb/db.go:817
# 0x20844a6 github.com/cortexproject/cortex/pkg/ingester.(*Ingester).v2Push+0x1a06 /backend-enterprise/vendor/github.com/cortexproject/cortex/pkg/ingester/ingester_v2.go:896
# 0x20799dc github.com/cortexproject/cortex/pkg/ingester.(*Ingester).Push+0x8dc /backend-enterprise/vendor/github.com/cortexproject/cortex/pkg/ingester/ingester.go:475
# 0x156b2e8 github.com/cortexproject/cortex/pkg/ingester/client._Ingester_Push_Handler.func1+0x88 /backend-enterprise/vendor/github.com/cortexproject/cortex/pkg/ingester/client/ingester.pb.go:2565
# 0x20db262 github.com/cortexproject/cortex/pkg/cortex.ThanosTracerUnaryInterceptor+0xa2 /backend-enterprise/vendor/github.com/cortexproject/cortex/pkg/cortex/tracing.go:14
# 0xb9ab22 github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1+0x62 /backend-enterprise/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25
# 0xd0b2a3 github.com/weaveworks/common/middleware.ServerUserHeaderInterceptor+0xa3 /backend-enterprise/vendor/github.com/weaveworks/common/middleware/grpc_auth.go:38
# 0x20c6cb5 github.com/cortexproject/cortex/pkg/util/fakeauth.SetupAuthMiddleware.func1+0x115 /backend-enterprise/vendor/github.com/cortexproject/cortex/pkg/util/fakeauth/fake_auth.go:27
# 0xb9ab22 github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1+0x62 /backend-enterprise/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25
# 0xb9e7e1 github.com/opentracing-contrib/go-grpc.OpenTracingServerInterceptor.func1+0x301 /backend-enterprise/vendor/github.com/opentracing-contrib/go-grpc/server.go:57
# 0xb9ab22 github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1+0x62 /backend-enterprise/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25
# 0xd0eaf9 github.com/weaveworks/common/middleware.UnaryServerInstrumentInterceptor.func1+0x99 /backend-enterprise/vendor/github.com/weaveworks/common/middleware/grpc_instrumentation.go:32
# 0xb9ab22 github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1+0x62 /backend-enterprise/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25
# 0xd0b813 github.com/weaveworks/common/middleware.GRPCServerLog.UnaryServerInterceptor+0x93 /backend-enterprise/vendor/github.com/weaveworks/common/middleware/grpc_logging.go:29
# 0xb9ab22 github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1+0x62 /backend-enterprise/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25
# 0xb9ad16 github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1+0xd6 /backend-enterprise/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:34
# 0x154fdaf github.com/cortexproject/cortex/pkg/ingester/client._Ingester_Push_Handler+0x14f /backend-enterprise/vendor/github.com/cortexproject/cortex/pkg/ingester/client/ingester.pb.go:2567
# 0xb439ca google.golang.org/grpc.(*Server).processUnaryRPC+0x52a /backend-enterprise/vendor/google.golang.org/grpc/server.go:1210
# 0xb47b8b google.golang.org/grpc.(*Server).handleStream+0xd0b /backend-enterprise/vendor/google.golang.org/grpc/server.go:1533
# 0xb5648a google.golang.org/grpc.(*Server).serveStreams.func1.2+0xaa /backend-enterprise/vendor/google.golang.org/grpc/server.go:871
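So, when the ingester can't keep up, many goroutines can end up blocked on access to TSDB. In the dump above, 8689 goroutines are waiting in (*isolation).newAppendID and 6866 in (*isolation).closeAppend, all inside sync.(*RWMutex).Lock.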
Just as in #858, it seems to me that once the number of goroutines goes beyond some limit, we would be better off failing immediately rather than trying to carry on with the request.