Description
When the search service cannot reach OpenSearch, it dies. The suture backoff handling then panics when it tries to call cancel(), because cancel is nil:
func Start(ctx context.Context, o ...Option) error {
	// Start the runtime. Most likely this was called ONLY by the `opencloud server` subcommand, but since we cannot protect
	// from the caller, the previous statement holds truth.
	// prepare a new rpc Service struct.
	s, err := NewService(ctx, o...)
	if err != nil {
		return err
	}
	// cancel the context when a signal is received.
	var cancel context.CancelFunc // <- cancel is nil
	if ctx == nil { // <- ctx is already set
		ctx, cancel = signal.NotifyContext(context.Background(), runner.StopSignals...) // cancel is thus not set
		defer cancel()
	}
	// tolerance controls backoff cycles from the supervisor.
	tolerance := 5
	totalBackoff := 0
	// Start creates its own supervisor. Running services under `opencloud server` will create its own supervision tree.
	s.Supervisor = suture.New("opencloud", suture.Spec{
		EventHook: func(e suture.Event) {
			if e.Type() == suture.EventTypeBackoff {
				totalBackoff++
				if totalBackoff == tolerance {
					cancel() // <- panic

signal.NotifyContext is also called in root.go, but there we forget about cancel:
// Execute is the entry point for the opencloud command.
func Execute() error {
	cfg := config.DefaultConfig()
	app := clihelper.DefaultApp(&cli.App{
		Name:  "opencloud",
		Usage: "opencloud",
	})
	for _, fn := range register.Commands {
		app.Commands = append(
			app.Commands,
			fn(cfg),
		)
	}
	ctx, _ := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM, syscall.SIGQUIT, syscall.SIGHUP)
	return app.RunContext(ctx, os.Args)
}

There are two problems:
- We should not panic. Maybe root.go should not register the signals, and Start() should always wrap the context and set the cancel function.
- Should the search service really die if it cannot reach OpenSearch?
This is what we can see in the logs.
api {"level":"info","service":"opencloud","event":"opencloud: Failed service 'service.SutureService{exec:(func(context.Context) error)(0x6280544fef60)}' (1.000000 failures of 5.000000), restarting: true, error: failed to create OpenSearch backend: cluster is not healthy, failed to ping opensearch: tls: first record does not look like a TLS handshake","time":"2025-09-22T09:24:39Z","line":"github.com/opencloud-eu/opencloud/opencloud/pkg/runtime/service/service.go:383","message":"supervisor: opencloud"}
api {"level":"info","service":"opencloud","event":"opencloud: Failed service 'service.SutureService{exec:(func(context.Context) error)(0x6280544fef60)}' (1.999796 failures of 5.000000), restarting: true, error: failed to create OpenSearch backend: cluster is not healthy, failed to ping opensearch: tls: first record does not look like a TLS handshake","time":"2025-09-22T09:24:39Z","line":"github.com/opencloud-eu/opencloud/opencloud/pkg/runtime/service/service.go:383","message":"supervisor: opencloud"}
api {"level":"info","service":"opencloud","event":"opencloud: Failed service 'service.SutureService{exec:(func(context.Context) error)(0x6280544fef60)}' (2.999393 failures of 5.000000), restarting: true, error: failed to create OpenSearch backend: cluster is not healthy, failed to ping opensearch: tls: first record does not look like a TLS handshake","time":"2025-09-22T09:24:39Z","line":"github.com/opencloud-eu/opencloud/opencloud/pkg/runtime/service/service.go:383","message":"supervisor: opencloud"}
api {"level":"info","service":"opencloud","event":"opencloud: Failed service 'service.SutureService{exec:(func(context.Context) error)(0x6280544fef60)}' (3.998787 failures of 5.000000), restarting: true, error: failed to create OpenSearch backend: cluster is not healthy, failed to ping opensearch: tls: first record does not look like a TLS handshake","time":"2025-09-22T09:24:39Z","line":"github.com/opencloud-eu/opencloud/opencloud/pkg/runtime/service/service.go:383","message":"supervisor: opencloud"}
api {"level":"info","service":"opencloud","event":"opencloud: Failed service 'service.SutureService{exec:(func(context.Context) error)(0x6280544fef60)}' (4.997914 failures of 5.000000), restarting: true, error: failed to create OpenSearch backend: cluster is not healthy, failed to ping opensearch: tls: first record does not look like a TLS handshake","time":"2025-09-22T09:24:39Z","line":"github.com/opencloud-eu/opencloud/opencloud/pkg/runtime/service/service.go:383","message":"supervisor: opencloud"}
api panic: runtime error: invalid memory address or nil pointer dereference
api [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x6280545011ef]
api
api goroutine 98 [running]:
api github.com/opencloud-eu/opencloud/opencloud/pkg/runtime/service.Start.func1({0x628056a4dc50, 0xc001d13dd0})
api github.com/opencloud-eu/opencloud/opencloud/pkg/runtime/service/service.go:380 +0x6f
api github.com/thejerf/suture/v4.(*Supervisor).handleFailedService(0xc000a8e000, {0x628056a56990, 0xc001492500}, 0x1a, {0x6280563ed5a0, 0xc0025751c0}, {0x0, 0x0, 0x0?}, 0x0)
api github.com/thejerf/suture/v4@v4.0.6/supervisor.go:486 +0x2a4
api github.com/thejerf/suture/v4.(*Supervisor).Serve(0xc000a8e000, {0x628056a56b88?, 0xc000c70500?})
api github.com/thejerf/suture/v4@v4.0.6/supervisor.go:383 +0x79f
api github.com/thejerf/suture/v4.(*Supervisor).ServeBackground.func1()
api github.com/thejerf/suture/v4@v4.0.6/supervisor.go:297 +0x28
api created by github.com/thejerf/suture/v4.(*Supervisor).ServeBackground in goroutine 27
api github.com/thejerf/suture/v4@v4.0.6/supervisor.go:296 +0xb8