Skip to content

Conversation

@karencfv
Copy link
Collaborator

Taking a look at the resource docs they already say "Firewall rules defined by this resource are considered exhaustive and will overwrite any other firewall rules for the VPC once applied."

Not sure if this needs more clarification? WDYT @sudomateo?

Waiting on an API change to fully run acceptance tests

Fixes: #431

Copy link
Collaborator

@sudomateo sudomateo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change looks good to me. I just want to run down the desired flow quickly to make sure I have it right.

  1. oxidecomputer/omicron#8175 will allow rules to be null in the request body (via omitempty in the Go SDK) and then default to [] in the API handler. Right now null doesn't work because the omicron API doesn't allow that.
  2. https://github.com/oxidecomputer/oxide.go does not need to be updated since the change in omicron doesn't change the output from the generation in the Go SDK.
  3. The Terraform code uses an empty slice (not nil!) for rules which will serialize to null which is now accepted by point 1. The removal of requires replace allows the update method to be called.

@karencfv
Copy link
Collaborator Author

oxidecomputer/omicron#8175 will allow rules to be null in the request body (via omitempty in the Go SDK) and then default to [] in the API handler. Right now null doesn't work because the omicron API doesn't allow that.

Yep, but this change doesn't affect this PR per se. More like I noticed this resource was broken when working on this. It's broken since we made the refactoring PR.

https://github.com/oxidecomputer/oxide.go does not need to be updated since the change in omicron doesn't change the output from the generation in the Go SDK.

Not really, no.

The Terraform code uses an empty slice (not nil!) for rules which will serialize to null which is now accepted by point 1. The removal of requires replace allows the update method to be called.

Yep!

@sudomateo
Copy link
Collaborator

Instead of waiting for the latest version of omicron to be deployed for testing I updated the Go SDK in oxidecomputer/oxide.go#289 to set omitzero for certain types. This will allow Terraform to use an empty slice for updates and deletes.

@sudomateo sudomateo marked this pull request as ready for review June 2, 2025 19:08
karencfv and others added 3 commits June 2, 2025 15:09
Add a test to check whether an `oxide_vpc_firewall_rules` resource can
successfully remove all rules from a VPC during its `Update` call.
Support removing all VPC firewall rules through the `Update` call. This
change relies on the changes in the upstream Go SDK to serialize the VPC
firewall rules as `{"rules": []}` when they are removed from Terraform.
Comment on lines -502 to -505
if rules == nil {
return body
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No longer needed because the range below will not run when nil.

Update the lint tool versions to be compatible with Go 1.24.
@sudomateo
Copy link
Collaborator

Oh CI/CD, why do you do this to me?

The version of Go used to install tfproviderlint is making CI/CD unhappy. Notice how the installed Go version is 1.24.3 but the version of Go that compiled tfproviderlint is 1.23.9. Weird!

-> Running Terraform static analysis linter
go version go1.24.3 linux/amd64
tfproviderlint v0.15.0-dev
/home/runner/work/terraform-provider-oxide/terraform-provider-oxide/bin/tfproviderlint: go1.23.9

When I run make lint locally both versions are 1.24.3.

-> Running Terraform static analysis linter
go version go1.24.3 linux/amd64
tfproviderlint v0.15.0-dev
/home/sudomateo/oxide/terraform-provider-oxide/bin/tfproviderlint: go1.24.3

@sudomateo sudomateo force-pushed the fw-rules-update branch 2 times, most recently from 819f105 to 9a220bc Compare June 2, 2025 21:59
@sudomateo
Copy link
Collaborator

Getting closer! I can see Go 1.23.9 is being used to compile tfproviderlint. Unsure if it's because of GitHub Actions cache or something else. I did delete the cache previously but I will keep digging.

#
# github.com/bflad/tfproviderlint/cmd/tfproviderlint
#

github.com/bflad/tfproviderlint/cmd/tfproviderlint
mkdir -p $WORK/b001/
cat >$WORK/b001/importcfg << 'EOF' # internal
# import config
packagefile github.com/bflad/tfproviderlint/helper/cmdflags=$WORK/b002/_pkg_.a
packagefile github.com/bflad/tfproviderlint/passes=$WORK/b061/_pkg_.a
packagefile golang.org/x/tools/go/analysis/multichecker=$WORK/b207/_pkg_.a
packagefile runtime=$WORK/b011/_pkg_.a
EOF
/opt/hostedtoolcache/go/1.23.9/x64/pkg/tool/linux_amd64/compile -o $WORK/b001/_pkg_.a -trimpath "$WORK/b001=>" -p main -lang=go1.22 -complete -buildid lZZ7gepE586FpOT_ubyu/lZZ7gepE586FpOT_ubyu -goversion go1.23.9 -c=4 -nolocalimports -importcfg $WORK/b001/importcfg -pack /root/go/pkg/mod/github.com/bflad/tfproviderlint@v0.31.0/cmd/tfproviderlint/tfproviderlint.go
/opt/hostedtoolcache/go/1.23.9/x64/pkg/tool/linux_amd64/buildid -w $WORK/b001/_pkg_.a # internal
cat >$WORK/b001/importcfg.link << 'EOF' # internal
...
mkdir -p $WORK/b001/exe/
cd .
GOROOT='/opt/hostedtoolcache/go/1.23.9/x64' /opt/hostedtoolcache/go/1.23.9/x64/pkg/tool/linux_amd64/link -o $WORK/b001/exe/a.out -importcfg $WORK/b001/importcfg.link -X=runtime.godebugDefault=asynctimerchan=1,gotypesalias=0,httpservecontentkeepheaders=1,tls3des=1,tlskyber=0,winreadlinkvolume=0,winsymlink=0,x509keypairleaf=0,x509negativeserial=1 -buildmode=exe -buildid=Fz10GQZu7EeA-ZleT__a/lZZ7gepE586FpOT_ubyu/lZZ7gepE586FpOT_ubyu/Fz10GQZu7EeA-ZleT__a -extld=gcc $WORK/b001/_pkg_.a
/opt/hostedtoolcache/go/1.23.9/x64/pkg/tool/linux_amd64/buildid -w $WORK/b001/exe/a.out # internal
mkdir -p /home/runner/work/terraform-provider-oxide/terraform-provider-oxide/bin/
mv $WORK/b001/exe/a.out /home/runner/work/terraform-provider-oxide/terraform-provider-oxide/bin/tfproviderlint
rm -rf $WORK/b001/

@sudomateo
Copy link
Collaborator

sudomateo commented Jun 2, 2025

Running go env in CI/CD reports Go 1.24.3 but the go install continues to use Go 1.23.9. I'm unsure why but I'm going to push some commits and try a few things.

GOBIN='/home/runner/work/terraform-provider-oxide/terraform-provider-oxide/bin'
GOCACHE='/root/.cache/go-build'
GOENV='/home/runner/.config/go/env'
GOMOD='/home/runner/work/terraform-provider-oxide/terraform-provider-oxide/go.mod'
GOMODCACHE='/root/go/pkg/mod'
GOPATH='/root/go'
GOROOT='/root/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.24.3.linux-amd64'
GOTOOLDIR='/root/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.24.3.linux-amd64/pkg/tool/linux_amd64'
GOVERSION='go1.24.3'

@sudomateo
Copy link
Collaborator

The CI/CD issue occurred because the step runs sudo make lint instead of make lint. Using sudo means the version of Go installed by setup-go is no longer in use and the job falls back to the version of Go included in the runner, which is Go 1.23.9 in the ubuntu-latest runner. Thank you Gophers Slack for the eyes!

run: make test
- name: lint
run: sudo make lint
run: make lint
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@karencfv I didn't see why the sudo was added in the blamed commit (#95). I removed it because it was causing the runner to fall back to the version of Go included in the runner rather than the version of Go installed by setup-go.

@sudomateo
Copy link
Collaborator

sudomateo commented Jun 2, 2025

@karencfv I know you're the author of this so I can't add you as a reviewer but I pushed a few commits to get this over the line. Can you give this a once over when you have a chance?

Here's the test output.

> TEST_ACC_NAME=TestAccCloudResourceFirewallRules make testacc
-> Running terraform acceptance tests
=== RUN   TestAccCloudResourceFirewallRules_full
=== PAUSE TestAccCloudResourceFirewallRules_full
=== CONT  TestAccCloudResourceFirewallRules_full
--- PASS: TestAccCloudResourceFirewallRules_full (10.53s)
PASS
ok      github.com/oxidecomputer/terraform-provider-oxide/internal/provider     10.537s

Copy link
Collaborator Author

@karencfv karencfv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good thanks! That CI issue looks annoying to debug, glad you got to the bottom of it! I don't mind removing that sudo. That code predates me, so not entirely sure why it was sudo to begin with.

It appears I don't have the ability to approve this PR because I opened it. Do you mind approving it?

@karencfv
Copy link
Collaborator Author

karencfv commented Jun 2, 2025

Ah! just saw you approved it. You know what? It's missing a changelog entry, one sec let me get that in

@karencfv karencfv merged commit 7478329 into oxidecomputer:main Jun 2, 2025
1 check passed
@karencfv karencfv deleted the fw-rules-update branch June 2, 2025 23:30
sudomateo added a commit that referenced this pull request Jun 20, 2025
Added acceptance tests to surface the issue reported in
#453.

Updated the schema for `rules` from `schema.SetNestedAttribute` to
`schema.ListNestedAttribute` to better work with how the Oxide API
treats VPC firewall rules.

See below for the reason why this was required.

In #432 we removed the requires replace directive from the
`vpc_firewall_rules` resource to support in-place modification. The
intent was to prevent unnecessary network connectivity issues while
VPC firewall rules are deleted and created anew (e.g., when the delete
succeeds and the create fails). With this change, `Update` is now called
rather than `Delete` followed by `Create`.

However, this change was also a bit of a workaround since the Oxide
API does not support in-place updates of VPC firewall rules, at least
not in the manner than Terraform expects. Instead, when you replace VPC
firewall rules using the Oxide API each rule will be given a unique ID.

```
> curl --silent --request PUT --header "Authorization: Bearer $OXIDE_TOKEN" --header "Content-Type: application/json" $OXIDE_HOST/v1/vpc-firewall-rules?project=matthewsanabria&vpc=example --data @/tmp/rules.json
{"rules":[{"id":"0852c8c8-e1d4-4dd4-84a7-19946eae0313","name":"allow-https","description":"Allow HTTPS.","time_created":"2025-06-20T19:40:22.514499Z","time_modified":"2025-06-20T19:40:22.514499Z","status":"enabled","direction":"inbound","targets":[{"type":"subnet","value":"example"}],"filters":{"hosts":[{"type":"vpc","value":"example"}],"protocols":["TCP"],"ports":["443"]},"action":"allow","priority":50,"vpc_id":"b7b7b444-1c64-4cb5-82d0-f1e4404a100c"},{"id":"bac0f44f-1835-4943-9944-08e6c328bbea","name":"allow-ssh","description":"Allow SSH (new).","time_created":"2025-06-20T19:40:22.514501Z","time_modified":"2025-06-20T19:40:22.514501Z","status":"enabled","direction":"inbound","targets":[{"type":"subnet","value":"example"}],"filters":{"hosts":[{"type":"vpc","value":"example"}],"protocols":["TCP"],"ports":["22"]},"action":"allow","priority":50,"vpc_id":"b7b7b444-1c64-4cb5-82d0-f1e4404a100c"}]}

> curl --silent --request GET --header "Authorization: Bearer $OXIDE_TOKEN" --header "Content-Type: application/json" $OXIDE_HOST/v1/vpc-firewall-rules?project=matthewsanabria&vpc=example | jq -r '.rules[] | pick(.id, .name)'
{
  "id": "0852c8c8-e1d4-4dd4-84a7-19946eae0313",
  "name": "allow-https"
}
{
  "id": "bac0f44f-1835-4943-9944-08e6c328bbea",
  "name": "allow-ssh"
}
```

Even if you issue another `PUT` request with the exact same request body
Oxide will still generate new IDs for the rules.

```
> curl --silent --request PUT --header "Authorization: Bearer $OXIDE_TOKEN" --header "Content-Type: application/json" $OXIDE_HOST/v1/vpc-firewall-rules?project=matthewsanabria&vpc=example --data @/tmp/rules.json
{"rules":[{"id":"74658b3c-f3f9-4e36-985c-f9259c81c98a","name":"allow-https","description":"Allow HTTPS.","time_created":"2025-06-20T19:40:47.171866Z","time_modified":"2025-06-20T19:40:47.171866Z","status":"enabled","direction":"inbound","targets":[{"type":"subnet","value":"example"}],"filters":{"hosts":[{"type":"vpc","value":"example"}],"protocols":["TCP"],"ports":["443"]},"action":"allow","priority":50,"vpc_id":"b7b7b444-1c64-4cb5-82d0-f1e4404a100c"},{"id":"91b353f1-8025-4c89-9d28-760c2c5719dd","name":"allow-ssh","description":"Allow SSH (new).","time_created":"2025-06-20T19:40:47.171868Z","time_modified":"2025-06-20T19:40:47.171868Z","status":"enabled","direction":"inbound","targets":[{"type":"subnet","value":"example"}],"filters":{"hosts":[{"type":"vpc","value":"example"}],"protocols":["TCP"],"ports":["22"]},"action":"allow","priority":50,"vpc_id":"b7b7b444-1c64-4cb5-82d0-f1e4404a100c"}]}

> curl --silent --request GET --header "Authorization: Bearer $OXIDE_TOKEN" --header "Content-Type: application/json" $OXIDE_HOST/v1/vpc-firewall-rules?project=matthewsanabria&vpc=example | jq -r '.rules[] | pick(.id, .name)'
{
  "id": "74658b3c-f3f9-4e36-985c-f9259c81c98a",
  "name": "allow-https"
}
{
  "id": "91b353f1-8025-4c89-9d28-760c2c5719dd",
  "name": "allow-ssh"
}
```

Let's walk through why this is a problem for Terraform.

I used Terraform to create two VPC firewall rules as shown below.

```
> jq -r '.resources[1].instances[].attributes.rules[] | pick(.id, .name)' terraform.tfstate
{
  "id": "12bbb9ca-3360-4f9f-ad5a-ada3cbb736c4",
  "name": "allow-https"
}
{
  "id": "5d83771f-76fd-421f-a785-3726b6310389",
  "name": "allow-ssh"
}
```

I then updated the port for the `allow-ssh` rule from 22
to 23. The Terraform plan shows that the resource will be
updated in place by deleting the `allow-ssh` rule with ID
`5d83771f-76fd-421f-a785-3726b6310389` and creating a new one. It will
not touch the `allow-https` rule at all.

```
> terraform apply
oxide_vpc.example: Refreshing state... [id=b7b7b444-1c64-4cb5-82d0-f1e4404a100c]
oxide_vpc_firewall_rules.example: Refreshing state... [id=c75d1e61-9737-4abd-be16-8837920e2344]

Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
  ~ update in-place

Terraform will perform the following actions:

  # oxide_vpc_firewall_rules.example will be updated in-place
  ~ resource "oxide_vpc_firewall_rules" "example" {
      ~ id     = "c75d1e61-9737-4abd-be16-8837920e2344" -> (known after apply)
      ~ rules  = [
          - {
              - action        = "allow" -> null
              - description   = "Allow SSH." -> null
              - direction     = "inbound" -> null
              - filters       = {
                  - hosts     = [
                      - {
                          - type  = "vpc" -> null
                          - value = "example" -> null
                        },
                    ] -> null
                  - ports     = [
                      - "22",
                    ] -> null
                  - protocols = [
                      - "TCP",
                    ] -> null
                } -> null
              - id            = "5d83771f-76fd-421f-a785-3726b6310389" -> null
              - name          = "allow-ssh" -> null
              - priority      = 50 -> null
              - status        = "enabled" -> null
              - targets       = [
                  - {
                      - type  = "subnet" -> null
                      - value = "example" -> null
                    },
                ] -> null
              - time_created  = "2025-06-20 20:30:10.992084 +0000 UTC" -> null
              - time_modified = "2025-06-20 20:30:10.992084 +0000 UTC" -> null
            },
          + {
              + action        = "allow"
              + description   = "Allow SSH."
              + direction     = "inbound"
              + filters       = {
                  + hosts     = [
                      + {
                          + type  = "vpc"
                          + value = "example"
                        },
                    ]
                  + ports     = [
                      + "23",
                    ]
                  + protocols = [
                      + "TCP",
                    ]
                }
              + id            = (known after apply)
              + name          = "allow-ssh"
              + priority      = 50
              + status        = "enabled"
              + targets       = [
                  + {
                      + type  = "subnet"
                      + value = "example"
                    },
                ]
              + time_created  = (known after apply)
              + time_modified = (known after apply)
            },
            # (1 unchanged element hidden)
        ]
        # (1 unchanged attribute hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.
```

Applying this plan causes the error noted in this issue. Looking closer
at the error text we see Terraform is referring to the `allow-https`
rule with ID `12bbb9ca-3360-4f9f-ad5a-ada3cbb736c4`, not the `allow-ssh`
rule.

```
Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

oxide_vpc_firewall_rules.example: Modifying... [id=c75d1e61-9737-4abd-be16-8837920e2344]
╷
│ Error: Provider produced inconsistent result after apply
│
│ When applying changes to oxide_vpc_firewall_rules.example, provider "provider[\"registry.terraform.io/oxidecomputer/oxide\"]"
│ produced an unexpected new value: .rules: planned set element
│ cty.ObjectVal(map[string]cty.Value{"action":cty.StringVal("allow"), "description":cty.StringVal("Allow HTTPS."),
│ "direction":cty.StringVal("inbound"),
│ "filters":cty.ObjectVal(map[string]cty.Value{"hosts":cty.SetVal([]cty.Value{cty.ObjectVal(map[string]cty.Value{"type":cty.StringVal("vpc"),
│ "value":cty.StringVal("example")})}), "ports":cty.SetVal([]cty.Value{cty.StringVal("443")}),
│ "protocols":cty.SetVal([]cty.Value{cty.StringVal("TCP")})}), "id":cty.StringVal("12bbb9ca-3360-4f9f-ad5a-ada3cbb736c4"),
│ "name":cty.StringVal("allow-https"), "priority":cty.NumberIntVal(50), "status":cty.StringVal("enabled"),
│ "targets":cty.SetVal([]cty.Value{cty.ObjectVal(map[string]cty.Value{"type":cty.StringVal("subnet"),
│ "value":cty.StringVal("example")})}), "time_created":cty.StringVal("2025-06-20 20:07:00.207241 +0000 UTC"),
│ "time_modified":cty.StringVal("2025-06-20 20:07:00.207241 +0000 UTC")}) does not correlate with any element in actual.
│
│ This is a bug in the provider, which should be reported in the provider's own issue tracker.
```

Why does Terraform care about the `allow-https` rule here at all if the
plan was never meant to touch it?

It's because before the plan Terraform had the following in its state.

```
> jq -r '.resources[1].instances[].attributes.rules[] | pick(.id, .name)' terraform.tfstate
{
  "id": "12bbb9ca-3360-4f9f-ad5a-ada3cbb736c4",
  "name": "allow-https"
}
{
  "id": "9c723c19-16ce-489e-9c6d-23daf2ebeb1c",
  "name": "allow-ssh"
}
```

Terraform expected after the plan was applied the state would look like
this. The `allow-https` rule would remain untouched and the `allow-ssh`
rule would be completely new with a new ID.

```
> jq -r '.resources[1].instances[].attributes.rules[] | pick(.id, .name)' terraform.tfstate
{
  "id": "12bbb9ca-3360-4f9f-ad5a-ada3cbb736c4",
  "name": "allow-https"
}
{
  "id": "935e19c0-9036-419b-b615-d68665aed048",
  "name": "allow-ssh"
}
```

Instead, the Oxide API generated new IDs for all the rules, so Terraform
got back the following state.

```
> jq -r '.resources[1].instances[].attributes.rules[] | pick(.id, .name)' terraform.tfstate
{
  "id": "32f2e026-f258-4512-bd68-79b613a8b5de",
  "name": "allow-https"
}
{
  "id": "935e19c0-9036-419b-b615-d68665aed048",
  "name": "allow-ssh"
}
```

Now Terraform was left wondering what happened to the original
`allow-https` rule with ID `12bbb9ca-3360-4f9f-ad5a-ada3cbb736c4`. Since
Terraform couldn't find that original rule from the plan in the new
state it returned an error.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support in-place update for oxide_vpc_firewall_rules

2 participants