Description
What happened?
This has been reported previously here and here, but it has not been resolved yet.
The issue is prevalent in large environments where the provider manages many resources. The Upjet provider pod restarts, which hurts stability in large Crossplane-managed environments that use Upjet-generated providers.
How can we reproduce it?
This is observed in a large Upjet-managed environment, for example one with a few hundred SQS Queues.
Root cause
I looked into this a bit and ran Upjet's own tests under the Go race detector. They fail with:
==================
WARNING: DATA RACE
Write at 0x000103c39580 by goroutine 82:
github.com/crossplane/upjet/pkg/resource/fake.(*Observable).SetObservation()
/Users/jciolek@alpha-sense.com/25feb/upjet/pkg/resource/fake/terraformed.go:32 +0x3c
github.com/crossplane/upjet/pkg/resource/fake.(*Terraformed).SetObservation()
<autogenerated>:1 +0x20
github.com/crossplane/upjet/pkg/controller.(*terraformPluginSDKExternal).Update()
/Users/jciolek@alpha-sense.com/25feb/upjet/pkg/controller/external_tfpluginsdk.go:718 +0x530
github.com/crossplane/upjet/pkg/controller.(*terraformPluginSDKAsyncExternal).Update.func1()
/Users/jciolek@alpha-sense.com/25feb/upjet/pkg/controller/external_async_tfpluginsdk.go:199 +0x2a0
Previous write at 0x000103c39580 by goroutine 79:
github.com/crossplane/upjet/pkg/resource/fake.(*Observable).SetObservation()
/Users/jciolek@alpha-sense.com/25feb/upjet/pkg/resource/fake/terraformed.go:32 +0x3c
github.com/crossplane/upjet/pkg/resource/fake.(*Terraformed).SetObservation()
<autogenerated>:1 +0x20
github.com/crossplane/upjet/pkg/controller.(*terraformPluginSDKExternal).Create()
/Users/jciolek@alpha-sense.com/25feb/upjet/pkg/controller/external_tfpluginsdk.go:660 +0x5c0
github.com/crossplane/upjet/pkg/controller.(*terraformPluginSDKAsyncExternal).Create.func1()
/Users/jciolek@alpha-sense.com/25feb/upjet/pkg/controller/external_async_tfpluginsdk.go:166 +0x2a0
Goroutine 82 (running) created at:
github.com/crossplane/upjet/pkg/controller.(*terraformPluginSDKAsyncExternal).Update()
/Users/jciolek@alpha-sense.com/25feb/upjet/pkg/controller/external_async_tfpluginsdk.go:178 +0x244
github.com/crossplane/upjet/pkg/controller.TestAsyncTerraformPluginSDKUpdate.func3()
/Users/jciolek@alpha-sense.com/25feb/upjet/pkg/controller/external_async_tfpluginsdk_test.go:286 +0x94
testing.tRunner()
/opt/homebrew/Cellar/go/1.24.0/libexec/src/testing/testing.go:1792 +0x180
testing.(*T).Run.gowrap1()
/opt/homebrew/Cellar/go/1.24.0/libexec/src/testing/testing.go:1851 +0x40
Goroutine 79 (finished) created at:
github.com/crossplane/upjet/pkg/controller.(*terraformPluginSDKAsyncExternal).Create()
/Users/jciolek@alpha-sense.com/25feb/upjet/pkg/controller/external_async_tfpluginsdk.go:145 +0x244
github.com/crossplane/upjet/pkg/controller.TestAsyncTerraformPluginSDKCreate.func3()
/Users/jciolek@alpha-sense.com/25feb/upjet/pkg/controller/external_async_tfpluginsdk_test.go:242 +0x94
testing.tRunner()
/opt/homebrew/Cellar/go/1.24.0/libexec/src/testing/testing.go:1792 +0x180
testing.(*T).Run.gowrap1()
/opt/homebrew/Cellar/go/1.24.0/libexec/src/testing/testing.go:1851 +0x40
==================
--- FAIL: TestConnect (0.00s)
testing.go:1490: race detected during execution of test
This looks like a likely cause of the problem: the controller's asynchronous Create and Update paths call the SetObservation method concurrently, and it writes to a plain Go map, which is not safe for concurrent use.
Solution
Introduce synchronization or use a thread-safe data structure like sync.Map.
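
As a rough illustration of the first option, here is a minimal sketch that guards the observation with a sync.RWMutex. SafeObservable and its fields are hypothetical names made up for this example; this is not Upjet's actual type or a proposed patch, only the general pattern under the assumption that the observation is a map written from several goroutines.

```go
package main

import (
	"fmt"
	"sync"
)

// SafeObservable is a hypothetical stand-in for a type whose observation
// may be set from several goroutines, such as async Create and Update callbacks.
type SafeObservable struct {
	mu          sync.RWMutex
	observation map[string]any
}

// SetObservation stores a shallow copy of data while holding the write lock,
// so concurrent callers never write the field at the same time.
func (o *SafeObservable) SetObservation(data map[string]any) {
	cp := make(map[string]any, len(data))
	for k, v := range data {
		cp[k] = v
	}
	o.mu.Lock()
	defer o.mu.Unlock()
	o.observation = cp
}

// GetObservation returns a shallow copy taken under the read lock,
// so callers cannot mutate the stored map.
func (o *SafeObservable) GetObservation() map[string]any {
	o.mu.RLock()
	defer o.mu.RUnlock()
	cp := make(map[string]any, len(o.observation))
	for k, v := range o.observation {
		cp[k] = v
	}
	return cp
}

func main() {
	o := &SafeObservable{}
	var wg sync.WaitGroup
	// Many goroutines set the observation at once; the mutex serializes them.
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			o.SetObservation(map[string]any{"writer": n})
		}(i)
	}
	wg.Wait()
	fmt.Println(len(o.GetObservation())) // prints 1
}
```

A mutex is used here rather than sync.Map because each call replaces the whole observation. sync.Map protects individual keys, which may or may not match how the real code uses the map. The copies are shallow, so nested maps or slices inside the observation would still be shared.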