The goroutines leak in a single binary ocis #10577

Open · 2403905 opened this issue Nov 15, 2024 · 5 comments
@2403905
Contributor

2403905 commented Nov 15, 2024

Describe the bug

A single-binary ocis setup shows high memory usage and leaking goroutines after running under load for a long time.

Steps to reproduce

  1. Run ocis locally.
  2. Use Prometheus + Grafana to watch the goroutine count.
  3. Run a test, e.g.:
     make test-acceptance-api \
       TEST_SERVER_URL=https://localhost:9200 \
       TEST_WITH_GRAPH_API=true \
       TEST_OCIS=true \
       BEHAT_FEATURE=tests/acceptance/features/apiSpacesShares/shareUploadTUS
  4. Grab the goroutine profile: curl 'http://localhost:9114/debug/pprof/goroutine' > goroutine-running.out
  5. Grab the goroutine dump: curl 'http://localhost:9114/debug/pprof/goroutine?debug=2' > goroutine-running.log
  6. Stop the tests and wait until the number of goroutines stops going down.
  7. Grab the goroutine profile again: curl 'http://localhost:9114/debug/pprof/goroutine' > goroutine-running.out
  8. Grab the goroutine dump again: curl 'http://localhost:9114/debug/pprof/goroutine?debug=2' > goroutine-running.log
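
If you want to track the count without Grafana, a small Go poller against the same pprof endpoint works too. A minimal sketch, assuming the debug endpoint is reachable on localhost:9114 as in the steps above:

package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"time"
)

// Poll the pprof goroutine profile in its debug=1 text form every 10s and
// print the header line ("goroutine profile: total N"); a leak shows up as
// a number that keeps growing after the load stops.
func main() {
	for {
		resp, err := http.Get("http://localhost:9114/debug/pprof/goroutine?debug=1")
		if err != nil {
			fmt.Println("poll failed:", err)
		} else {
			body, _ := io.ReadAll(resp.Body)
			resp.Body.Close()
			if i := bytes.IndexByte(body, '\n'); i > 0 {
				fmt.Printf("%s %s\n", time.Now().Format(time.RFC3339), body[:i])
			}
		}
		time.Sleep(10 * time.Second)
	}
}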

Expected behavior

The number of goroutines should go down after the load goes down.

Actual behavior

The number of goroutines grows during the test and should go down after the test stops. Instead, thousands of goroutines stay blocked and never go away. Most of them are in google.golang.org/grpc/internal/grpcsync.(*CallbackSerializer).run and github.com/owncloud/ocis/v2/ocis-pkg/natsjsregistry.(*Watcher).Next, which leads to a goroutine and memory leak.
The time these goroutines have been blocked approaches the total run time of the ocis binary.
The blocking is related to the go-micro/registry cache and the ocis-pkg/natsjsregistry Watcher.

goroutine 113937 [chan receive, 112 minutes]:
github.com/owncloud/ocis/v2/ocis-pkg/natsjsregistry.(*Watcher).Next(0xc001ea4030)
  /Users/roman/projects/owncloud/ocis/ocis-pkg/natsjsregistry/watcher.go:46 +0x33
go-micro.dev/v4/registry/cache.(*cache).watch(0xc007349a70, {0x112427058, 0xc001ea4030})
  /Users/roman/projects/owncloud/go-micro/registry/cache/cache.go:439 +0xc6
go-micro.dev/v4/registry/cache.(*cache).run(0xc007349a70, {0x1102758d8, 0x19})
  /Users/roman/projects/owncloud/go-micro/registry/cache/cache.go:394 +0x399
created by go-micro.dev/v4/registry/cache.(*cache).get in goroutine 81855
  /Users/roman/projects/owncloud/go-micro/registry/cache/cache.go:200 +0x2e5 

goroutine 23171 [chan receive, 68 minutes]:
github.com/owncloud/ocis/v2/ocis-pkg/natsjsregistry.(*Watcher).Next(0xc000ff9140)
  /Users/roman/projects/owncloud/ocis/ocis-pkg/natsjsregistry/watcher.go:46 +0x33
go-micro.dev/v4/registry/cache.(*cache).watch(0xc005288cf0, {0x112427058, 0xc000ff9140})
  /Users/roman/projects/owncloud/go-micro/registry/cache/cache.go:439 +0xc6
go-micro.dev/v4/registry/cache.(*cache).run(0xc005288cf0, {0x1102758d8, 0x19})
  /Users/roman/projects/owncloud/go-micro/registry/cache/cache.go:394 +0x399
created by go-micro.dev/v4/registry/cache.(*cache).get in goroutine 2379
  /Users/roman/projects/owncloud/go-micro/registry/cache/cache.go:200 +0x2e5
[Screenshots attached: 2024-11-15, 12:53–12:55]

goroutine-out.zip
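
For context on what these traces mean: a watcher whose Next blocks on a channel receive leaks one goroutine per registry cache if it is never stopped. A minimal, self-contained sketch of that pattern (all types here are hypothetical, for illustration only; this is not the actual natsjsregistry code):

package main

import (
	"fmt"
	"runtime"
	"time"
)

// Result stands in for a registry event; hypothetical, for illustration only.
type Result struct{ Action string }

// Watcher mimics the shape of a registry watcher: Next blocks on a channel
// until an event arrives or Stop closes the watcher.
type Watcher struct {
	updates chan *Result
	stop    chan struct{}
}

func NewWatcher() *Watcher {
	return &Watcher{updates: make(chan *Result), stop: make(chan struct{})}
}

// Next parks the calling goroutine until an update arrives or Stop is
// called -- the long-blocked state the dumps above show for 68 and 112 minutes.
func (w *Watcher) Next() (*Result, error) {
	select {
	case r := <-w.updates:
		return r, nil
	case <-w.stop:
		return nil, fmt.Errorf("watcher stopped")
	}
}

func (w *Watcher) Stop() { close(w.stop) }

func main() {
	// Each registry cache starts a watch loop; if the watcher is never
	// stopped, its goroutine stays blocked for the life of the process.
	for i := 0; i < 1000; i++ {
		w := NewWatcher()
		go func() {
			for {
				if _, err := w.Next(); err != nil {
					return
				}
			}
		}()
		// Missing: w.Stop() when the cache is discarded -- that is the leak.
	}
	time.Sleep(100 * time.Millisecond)
	fmt.Println("goroutines:", runtime.NumGoroutine()) // ~1001, none reclaimable
}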


@kobergj
Collaborator

kobergj commented Nov 15, 2024

Do we know if the number of goroutines in natsjsregistry.Watcher is going up? It should not, as they are only spawned on startup. There should be no new goroutines added there during runtime.
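
One way to answer that from the captured data is to count goroutines per top frame in the debug=2 dump from step 5. A rough sketch (the parsing is approximate; the file name matches the steps above):

package main

import (
	"bufio"
	"fmt"
	"os"
	"sort"
	"strings"
)

// Scan a debug=2 goroutine dump: every goroutine starts with a
// "goroutine N [state]:" header, and the next line names the function it
// is currently blocked in. Tally those functions to see which ones grow.
func main() {
	f, err := os.Open("goroutine-running.log")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	counts := map[string]int{}
	sc := bufio.NewScanner(f)
	sc.Buffer(make([]byte, 1024*1024), 1024*1024)
	prevWasHeader := false
	for sc.Scan() {
		line := sc.Text()
		if prevWasHeader {
			// Strip the argument list, e.g. "...Next(0xc001ea4030)".
			counts[strings.Split(line, "(0x")[0]]++
		}
		prevWasHeader = strings.HasPrefix(line, "goroutine ")
	}

	type entry struct {
		fn string
		n  int
	}
	var all []entry
	for fn, n := range counts {
		all = append(all, entry{fn, n})
	}
	sort.Slice(all, func(i, j int) bool { return all[i].n > all[j].n })
	for _, e := range all {
		fmt.Printf("%6d %s\n", e.n, e.fn)
	}
}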

@butonic
Member

butonic commented Nov 15, 2024

AFAICT we are sometimes creating a new go micro client on the fly, e.g. in services/graph/pkg/service/v0/drives.go:

// ListStorageSpacesWithFilters List Storage Spaces using filters
func (g Graph) ListStorageSpacesWithFilters(ctx context.Context, filters []*storageprovider.ListStorageSpacesRequest_Filter, unrestricted bool) (*storageprovider.ListStorageSpacesResponse, error) {
	gatewayClient, err := g.gatewaySelector.Next()
	if err != nil {
		return nil, err
	}

	// NOTE: this creates a new go micro client (and, under the hood, a new
	// cache registry with its own watcher goroutine) on every request.
	grpcClient, err := grpc.NewClient(append(grpc.GetClientOptions(g.config.GRPCClientTLS), grpc.WithTraceProvider(g.traceProvider))...)
	if err != nil {
		return nil, err
	}
	s := settingssvc.NewPermissionService("com.owncloud.api.settings", grpcClient)

	_, err = s.GetPermissionByID(ctx, &settingssvc.GetPermissionByIDRequest{
		PermissionId: settingsServiceExt.ListSpacesPermission(0).Id,
	})
    //...

The problem is that the actual RPC call under the hood uses a selector, and all go micro selectors wrap the registry with a cache registry. That will start a new watcher, increasing the number of goroutines by one on every request that instantiates a new go micro client.

So, for calls to reva we need to make sure not to create a reva gateway client on service startup.
But for calls with go micro clients we need to make sure not to recreate them on every request.

🤪
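
For the second point, the fix pattern is to hoist client construction out of the request path. A self-contained, hypothetical sketch of that pattern (newClient stands in for grpc.NewClient and the watcher it indirectly starts; this is not the actual ocis code):

package main

import (
	"fmt"
	"sync"
)

// client stands in for a go micro client whose construction spawns a
// registry-cache watcher goroutine; everything here is hypothetical.
type client struct{ id int }

var watchersStarted int // stands in for leaked watcher goroutines

func newClient() *client {
	watchersStarted++ // each new client would start one more cache watcher
	return &client{id: watchersStarted}
}

// service builds its client once and reuses it for every request.
type service struct {
	once sync.Once
	c    *client
}

func (s *service) client() *client {
	s.once.Do(func() { s.c = newClient() })
	return s.c
}

func (s *service) handleRequest() {
	_ = s.client() // reused; no new watcher per request
}

func main() {
	s := &service{}
	for i := 0; i < 10000; i++ {
		s.handleRequest()
	}
	fmt.Println("watchers started:", watchersStarted) // 1, not 10000
}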

@butonic
Member

butonic commented Nov 15, 2024

One occurrence fixed with #10583.

@micbar
Contributor

micbar commented Nov 15, 2024

Keeping it open, could be an issue in more places.

@2403905
Contributor Author

2403905 commented Nov 15, 2024

@butonic That is awesome!

The number of goroutines is stable and memory usage is low

[Screenshot attached: 2024-11-15, 17:23]
