Race condition in terraformPluginSDKExternal causing provider restarts #472

@jake-ciolek

Description

What happened?

This has been previously reported here and here. Unfortunately, it hasn't been resolved yet.

This issue is prevalent in large environments where the provider manages many resources. The Upjet provider pod restarts, which degrades stability in large Crossplane-managed environments that use Upjet-generated providers.

How can we reproduce it?

This is observed in a large Upjet-managed environment, for example one with a few hundred SQS Queues.

Root cause

I looked into this and ran the Go race detector on Upjet itself. The tests fail with:

==================
WARNING: DATA RACE
Write at 0x000103c39580 by goroutine 82:
  github.com/crossplane/upjet/pkg/resource/fake.(*Observable).SetObservation()
      /Users/jciolek@alpha-sense.com/25feb/upjet/pkg/resource/fake/terraformed.go:32 +0x3c
  github.com/crossplane/upjet/pkg/resource/fake.(*Terraformed).SetObservation()
      <autogenerated>:1 +0x20
  github.com/crossplane/upjet/pkg/controller.(*terraformPluginSDKExternal).Update()
      /Users/jciolek@alpha-sense.com/25feb/upjet/pkg/controller/external_tfpluginsdk.go:718 +0x530
  github.com/crossplane/upjet/pkg/controller.(*terraformPluginSDKAsyncExternal).Update.func1()
      /Users/jciolek@alpha-sense.com/25feb/upjet/pkg/controller/external_async_tfpluginsdk.go:199 +0x2a0

Previous write at 0x000103c39580 by goroutine 79:
  github.com/crossplane/upjet/pkg/resource/fake.(*Observable).SetObservation()
      /Users/jciolek@alpha-sense.com/25feb/upjet/pkg/resource/fake/terraformed.go:32 +0x3c
  github.com/crossplane/upjet/pkg/resource/fake.(*Terraformed).SetObservation()
      <autogenerated>:1 +0x20
  github.com/crossplane/upjet/pkg/controller.(*terraformPluginSDKExternal).Create()
      /Users/jciolek@alpha-sense.com/25feb/upjet/pkg/controller/external_tfpluginsdk.go:660 +0x5c0
  github.com/crossplane/upjet/pkg/controller.(*terraformPluginSDKAsyncExternal).Create.func1()
      /Users/jciolek@alpha-sense.com/25feb/upjet/pkg/controller/external_async_tfpluginsdk.go:166 +0x2a0

Goroutine 82 (running) created at:
  github.com/crossplane/upjet/pkg/controller.(*terraformPluginSDKAsyncExternal).Update()
      /Users/jciolek@alpha-sense.com/25feb/upjet/pkg/controller/external_async_tfpluginsdk.go:178 +0x244
  github.com/crossplane/upjet/pkg/controller.TestAsyncTerraformPluginSDKUpdate.func3()
      /Users/jciolek@alpha-sense.com/25feb/upjet/pkg/controller/external_async_tfpluginsdk_test.go:286 +0x94
  testing.tRunner()
      /opt/homebrew/Cellar/go/1.24.0/libexec/src/testing/testing.go:1792 +0x180
  testing.(*T).Run.gowrap1()
      /opt/homebrew/Cellar/go/1.24.0/libexec/src/testing/testing.go:1851 +0x40

Goroutine 79 (finished) created at:
  github.com/crossplane/upjet/pkg/controller.(*terraformPluginSDKAsyncExternal).Create()
      /Users/jciolek@alpha-sense.com/25feb/upjet/pkg/controller/external_async_tfpluginsdk.go:145 +0x244
  github.com/crossplane/upjet/pkg/controller.TestAsyncTerraformPluginSDKCreate.func3()
      /Users/jciolek@alpha-sense.com/25feb/upjet/pkg/controller/external_async_tfpluginsdk_test.go:242 +0x94
  testing.tRunner()
      /opt/homebrew/Cellar/go/1.24.0/libexec/src/testing/testing.go:1792 +0x180
  testing.(*T).Run.gowrap1()
      /opt/homebrew/Cellar/go/1.24.0/libexec/src/testing/testing.go:1851 +0x40
==================
--- FAIL: TestConnect (0.00s)
    testing.go:1490: race detected during execution of test

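This looks like the probable cause of the problem: the controller's asynchronous Create and Update both call the SetObservation method concurrently, and that method writes to a plain Go map, which is not safe for concurrent use. When the Go runtime detects concurrent map writes it aborts the process with a fatal error, which would be consistent with the pod restarts.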

Solution

Introduce synchronization around the observation state (for example a mutex), or use a data structure that is safe for concurrent use, such as sync.Map.

Metadata

Labels: bug (Something isn't working)
