Skip to content

[tmpnet] Enure node config is saved on restart for both runtimes #4015

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jun 16, 2025

Conversation

maru-ava
Copy link
Contributor

@maru-ava maru-ava commented Jun 15, 2025

Why this should be merged

The addition of the kube node runtime added a Restart method to the NodeRuntime interface to support runtime-specific restart behavior. This introduced a disparity in restart behavior between the process and kube runtimes. The kube runtime was saving the node configuration before restarting, but the process runtime was not.

When starting a network configured with one or more subnets using the process runtime, this resulted in track-subnets for the bootstrap node not being serialized to disk. Only the bootstrap node was affected because only it was being restarted after having track-subnets set. The remaining nodes were started with Network.StartNode - which ensures serialization - after track-subnets was set.

This broke the xsvm and subnet-evm antithesis testing because both depend on the serialized state resulting from the process runtime initializing a target network, and the serialized bootstrap node config would end up without a track-subnets value. This resulted in 404 responses when trying to query a subnet's rest endpoint.

e2e subnet tests for xsvm were unaffected because those tests only target running nodes for which track-subnets is set correctly without regard to whether node configuration is serialized. The issue would only have surfaced in the e2e suite if someone had tried to use the --restart-network flag (e.g. to restart the nodes to use an updated avalanchego and/or xsvm plugin binary).

The fix

  • Ensure that Node.Restart serializes node configuration regardless of runtime.
  • Update the e2e-existing CI job to test against a restarted network

Replaces #4013

How this was tested

  • Manual inspection of network configuration
  • CI (with updated e2e-existing job)
    • Also verified that adding restart to e2e-existing failed against master due to the bootstrap node misconfiguration
  • Manual trigger of xsvm test setup

Need to be documented in RELEASES.md?

N/A

The addition of the kube node runtime added a `Restart` method to the
`NodeRuntime` interface to support runtime-specific restart
behavior. This introduced a disparity in restart behavior between the
process and kube runtimes. The kube runtime was saving the node
configuration before restarting, but the process runtime was not.

When starting a network configured with one or more subnets using the
process runtime, this resulted in `track-subnets` for the bootstrap
node not being serialized to disk. Only the bootstrap node was
affected because only it was being restarted after having
`track-subnets` set. The remaining nodes were started with
`Network.StartNode` - which ensures serialization - after
`track-subnets` was set.

This broke the xsvm and subnet-evm antithesis testing because both
depend on the process runtime to initialize a target network and the
bootstrap node would end up without a `track-subnets` value. This
resulted in 404 responses when trying to query a subnet's rest
endpoint.

The fix is to ensure that `Node.Restart` serializes node configuration
regardless of runtime.
@maru-ava maru-ava self-assigned this Jun 15, 2025
@Copilot Copilot AI review requested due to automatic review settings June 15, 2025 13:58
@maru-ava maru-ava added the testing This primarily focuses on testing label Jun 15, 2025
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR unifies restart behavior across runtimes by ensuring node configuration is always persisted before a restart and removes duplicate persistence in the Kubernetes runtime.

  • Adds a call to n.Write() in the generic Node.Restart to save config for all runtimes
  • Removes the redundant p.node.Write() in KubeRuntime.Restart
  • Guarantees that track-subnets and other settings are serialized when restarting a process runtime node

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
tests/fixture/tmpnet/node.go Added n.Write() call in Node.Restart to persist config prior to restart
tests/fixture/tmpnet/kube_runtime.go Removed redundant p.node.Write() block in KubeRuntime.Restart
Comments suppressed due to low confidence (1)

tests/fixture/tmpnet/node.go:191

  • Add a test to verify that calling Restart on a process runtime actually persists the node configuration and retains track-subnets after restart.
func (n *Node) Restart(ctx context.Context) error {

@maru-ava maru-ava moved this to Ready 🚦 in avalanchego Jun 15, 2025
@@ -545,7 +545,7 @@ func (n *Network) Restart(ctx context.Context) error {
if err := restartNodes(ctx, nodes); err != nil {
return err
}
return WaitForHealthyNodes(ctx, n.log, n.Nodes)
return WaitForHealthyNodes(ctx, n.log, nodes)
Copy link
Contributor Author

@maru-ava maru-ava Jun 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was failing when n.Nodes included a stopped bootstrap test node

@maru-ava maru-ava force-pushed the tmpnet-save-on-restart branch from 7e2d0be to 80a7c57 Compare June 15, 2025 15:00
@maru-ava maru-ava force-pushed the tmpnet-save-on-restart branch from 80a7c57 to 889c198 Compare June 15, 2025 15:15
@StephenButtolph StephenButtolph added this pull request to the merge queue Jun 16, 2025
Merged via the queue into master with commit 5a5dbaa Jun 16, 2025
28 checks passed
@StephenButtolph StephenButtolph deleted the tmpnet-save-on-restart branch June 16, 2025 13:32
@github-project-automation github-project-automation bot moved this from Ready 🚦 to Done 🎉 in avalanchego Jun 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
testing This primarily focuses on testing
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

4 participants