panic when opensearch is not reachable #1528

@butonic

When the search service cannot reach OpenSearch, it fails and is restarted by the supervisor. Once the backoff tolerance is reached, the suture backoff handling panics when it tries to call cancel(), because cancel is nil:

func Start(ctx context.Context, o ...Option) error {
	// Start the runtime. Most likely this was called ONLY by the `opencloud server` subcommand, but since we cannot protect
	// from the caller, the previous statement holds truth.

	// prepare a new rpc Service struct.
	s, err := NewService(ctx, o...)
	if err != nil {
		return err
	}

	// cancel the context when a signal is received.
	var cancel context.CancelFunc // <- cancel is nil
	if ctx == nil {               // <- ctx is already set, so this branch is skipped
		ctx, cancel = signal.NotifyContext(context.Background(), runner.StopSignals...) // <- never runs, so cancel stays nil
		defer cancel()
	}

	// tolerance controls backoff cycles from the supervisor.
	tolerance := 5
	totalBackoff := 0

	// Start creates its own supervisor. Running services under `opencloud server` will create its own supervision tree.
	s.Supervisor = suture.New("opencloud", suture.Spec{
		EventHook: func(e suture.Event) {
			if e.Type() == suture.EventTypeBackoff {
				totalBackoff++
				if totalBackoff == tolerance {
					cancel()                                           // <- panic
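
The failure mode can be reproduced in isolation. The following is a minimal sketch, not code from the repository: `callNilCancel` is a hypothetical helper that declares a `context.CancelFunc` without assigning it, calls it, and recovers the panic so the message can be inspected.

```go
package main

import (
	"context"
	"fmt"
)

// callNilCancel is a hypothetical helper for illustration only.
// It calls an unassigned CancelFunc and returns the recovered panic message.
func callNilCancel() (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()

	var cancel context.CancelFunc // zero value is nil, as in Start()
	cancel()                      // calling a nil func value panics
	return "no panic"
}

func main() {
	fmt.Println(callNilCancel())
}
```

In Start() nothing recovers the panic, so the whole process goes down instead of shutting down cleanly.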

signal.NotifyContext is also called in root.go, but there the returned cancel function is discarded:

// Execute is the entry point for the opencloud command.
func Execute() error {
	cfg := config.DefaultConfig()

	app := clihelper.DefaultApp(&cli.App{
		Name:  "opencloud",
		Usage: "opencloud",
	})

	for _, fn := range register.Commands {
		app.Commands = append(
			app.Commands,
			fn(cfg),
		)
	}

	ctx, _ := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM, syscall.SIGQUIT, syscall.SIGHUP)
	return app.RunContext(ctx, os.Args)
}
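
For comparison, here is a minimal sketch of the same call with the stop function kept and deferred. `run` and `execute` are placeholders invented for this example; `run` only stands in for `app.RunContext`. Whether root.go should register the signals at all is still an open question.

```go
package main

import (
	"context"
	"fmt"
	"os/signal"
	"syscall"
)

// run stands in for app.RunContext; it is a placeholder for this sketch only.
func run(ctx context.Context) error {
	return ctx.Err()
}

// execute mirrors the shape of Execute() but keeps the stop function
// returned by signal.NotifyContext instead of discarding it.
func execute() error {
	ctx, stop := signal.NotifyContext(context.Background(),
		syscall.SIGINT, syscall.SIGTERM, syscall.SIGQUIT, syscall.SIGHUP)
	defer stop() // unregisters the signal notification and cancels ctx
	return run(ctx)
}

func main() {
	fmt.Println("execute returned:", execute())
}
```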

There are two problems:

  1. We should not panic. Maybe our root command should not register the signals, and Start() should always wrap the context and set the cancel function.
  2. Should the search service really die if it cannot reach OpenSearch?

This is what we can see in the logs.

api {"level":"info","service":"opencloud","event":"opencloud: Failed service 'service.SutureService{exec:(func(context.Context) error)(0x6280544fef60)}' (1.000000 failures of 5.000000), restarting: true, error: failed to create OpenSearch backend: cluster is not healthy, failed to ping opensearch: tls: first record does not look like a TLS handshake","time":"2025-09-22T09:24:39Z","line":"github.com/opencloud-eu/opencloud/opencloud/pkg/runtime/service/service.go:383","message":"supervisor: opencloud"}
api {"level":"info","service":"opencloud","event":"opencloud: Failed service 'service.SutureService{exec:(func(context.Context) error)(0x6280544fef60)}' (1.999796 failures of 5.000000), restarting: true, error: failed to create OpenSearch backend: cluster is not healthy, failed to ping opensearch: tls: first record does not look like a TLS handshake","time":"2025-09-22T09:24:39Z","line":"github.com/opencloud-eu/opencloud/opencloud/pkg/runtime/service/service.go:383","message":"supervisor: opencloud"}
api {"level":"info","service":"opencloud","event":"opencloud: Failed service 'service.SutureService{exec:(func(context.Context) error)(0x6280544fef60)}' (2.999393 failures of 5.000000), restarting: true, error: failed to create OpenSearch backend: cluster is not healthy, failed to ping opensearch: tls: first record does not look like a TLS handshake","time":"2025-09-22T09:24:39Z","line":"github.com/opencloud-eu/opencloud/opencloud/pkg/runtime/service/service.go:383","message":"supervisor: opencloud"}
api {"level":"info","service":"opencloud","event":"opencloud: Failed service 'service.SutureService{exec:(func(context.Context) error)(0x6280544fef60)}' (3.998787 failures of 5.000000), restarting: true, error: failed to create OpenSearch backend: cluster is not healthy, failed to ping opensearch: tls: first record does not look like a TLS handshake","time":"2025-09-22T09:24:39Z","line":"github.com/opencloud-eu/opencloud/opencloud/pkg/runtime/service/service.go:383","message":"supervisor: opencloud"}
api {"level":"info","service":"opencloud","event":"opencloud: Failed service 'service.SutureService{exec:(func(context.Context) error)(0x6280544fef60)}' (4.997914 failures of 5.000000), restarting: true, error: failed to create OpenSearch backend: cluster is not healthy, failed to ping opensearch: tls: first record does not look like a TLS handshake","time":"2025-09-22T09:24:39Z","line":"github.com/opencloud-eu/opencloud/opencloud/pkg/runtime/service/service.go:383","message":"supervisor: opencloud"}
api panic: runtime error: invalid memory address or nil pointer dereference
api [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x6280545011ef]
api
api goroutine 98 [running]:
api github.com/opencloud-eu/opencloud/opencloud/pkg/runtime/service.Start.func1({0x628056a4dc50, 0xc001d13dd0})
api     github.com/opencloud-eu/opencloud/opencloud/pkg/runtime/service/service.go:380 +0x6f
api github.com/thejerf/suture/v4.(*Supervisor).handleFailedService(0xc000a8e000, {0x628056a56990, 0xc001492500}, 0x1a, {0x6280563ed5a0, 0xc0025751c0}, {0x0, 0x0, 0x0?}, 0x0)
api     github.com/thejerf/suture/v4@v4.0.6/supervisor.go:486 +0x2a4
api github.com/thejerf/suture/v4.(*Supervisor).Serve(0xc000a8e000, {0x628056a56b88?, 0xc000c70500?})
api     github.com/thejerf/suture/v4@v4.0.6/supervisor.go:383 +0x79f
api github.com/thejerf/suture/v4.(*Supervisor).ServeBackground.func1()
api     github.com/thejerf/suture/v4@v4.0.6/supervisor.go:297 +0x28
api created by github.com/thejerf/suture/v4.(*Supervisor).ServeBackground in goroutine 27
api     github.com/thejerf/suture/v4@v4.0.6/supervisor.go:296 +0xb8
