Consul Snapshot State Command Feature #20831

Closed
wants to merge 10 commits into from

Contributor

@natemollica-nm natemollica-nm commented Mar 11, 2024

Description

Consul Snapshot backup inspection is somewhat limited for in-depth troubleshooting scenarios. This command addition provides users a more human-readable format to inspect Consul's raft state from a snapshot backup.

This PR introduces a raftutil concept that mirrors the approach used in our Nomad product: a temporary "dummy" FSM is spawned, the snapshot archive is restored into it, and the resulting state is written to stdout for usability.
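
As a rough illustration of that restore-then-dump flow, here is a minimal Python sketch. It is not the code in this PR: `ToyFSM`, the `Table`/`Value` record shape, and the line-delimited input are all hypothetical, and a real snapshot archive uses a different binary encoding.

```python
import io
import json
from collections import defaultdict


class ToyFSM:
    """Hypothetical throwaway state machine; not Consul's FSM."""

    def __init__(self):
        # One list of rows per table name.
        self.tables = defaultdict(list)

    def restore(self, stream):
        # Assumption: each non-empty line is one record shaped like
        # {"Table": <name>, "Value": <row>}.
        for line in stream:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            self.tables[record["Table"]].append(record["Value"])

    def dump(self, out, only=None):
        # Write the restored state as one JSON object, optionally
        # limited to the table names in `only`.
        state = {
            name: rows
            for name, rows in self.tables.items()
            if only is None or name in only
        }
        json.dump(state, out, indent=2, sort_keys=True)
        out.write("\n")


# Invented sample input standing in for a snapshot archive.
archive = io.StringIO(
    '{"Table": "Nodes", "Value": {"Node": "consul-server-dc1"}}\n'
    '{"Table": "KVs", "Value": {"Key": "app/config"}}\n'
)
fsm = ToyFSM()
fsm.restore(archive)
```

Calling `fsm.dump(sys.stdout)` would then print the whole state, and `fsm.dump(sys.stdout, only={"Nodes"})` a single table. The FSM is never attached to a running cluster; it exists only long enough to hold the restored state.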

Inspired by several extended troubleshooting efforts surrounding WAN Federation replication and Enterprise Namespace issues, this command allows more detailed inspection of snapshot backup files for:

  • Nodes
  • Coordinates
  • Services (Parsed by Node Services)
  • Gateway Services
  • Service Intentions
  • ACL Tokens
  • ACL Roles
  • ACL Policies
  • ACL Authentication Methods
  • ACL Binding Rules
  • Key-Value Store Values
  • Configuration Entries
  • Connect CA Configuration
  • Connect CA Root Certificates
  • Connect CA Leaf Certificates

Testing & Reproduction steps

  • Built consul binary dev-build and ran local agent configuration with some test KV, namespace, and service data
    • Reproduction setup here
  • Performed snapshot backup on tested agent state: consul snapshot save backup.snap
  • Ran consul snapshot state backup.snap against the backup and confirmed that the JSON rendering of its state was usable.
{
  "ACLAuthMethods": null,
  "ACLBindingRules": null,
  "ACLPolicies": null,
  "ACLRoles": null,
  "ACLTokens": null,
  "ConfigEntries": [
    {
      "Kind": "proxy-defaults",
      "Name": "global",
      "Config": null,
      "TransparentProxy": {},
      "MeshGateway": {
        "Mode": "local"
      },
      "Expose": {},
      "AccessLogs": {},
      "Hash": 797712258995813288,
      "CreateIndex": 13,
      "ModifyIndex": 13
    },
  .... (cut for brevity, but contains other schema table state returns) ...
  ],
  "SnapshotMeta": [
    {
      "Version": 1,
      "ID": "2-43-1710199059887",
      "Index": 43,
      "Term": 2,
      "Peers": "ka4xMjcuMC4wLjE6ODMwMA==",
      "Configuration": {
        "Servers": [
          {
            "Suffrage": 0,
            "ID": "2ee786d2-e7d3-8b39-99bf-4bec5f04d60a",
            "Address": "127.0.0.1:8300"
          }
        ]
      },
      "ConfigurationIndex": 1,
      "Size": 20412
    }
  ]
}
  • Tested filtered returns: consul snapshot state -filter "Nodes" backup.snap

JSON Filtered Return

{
  "Nodes": [
    {
      "ID": "2ee786d2-e7d3-8b39-99bf-4bec5f04d60a",
      "Node": "consul-server-dc1",
      "Address": "127.0.0.1",
      "Datacenter": "dc1",
      "TaggedAddresses": {
        "lan": "127.0.0.1",
        "lan_ipv4": "127.0.0.1",
        "wan": "127.0.0.1",
        "wan_ipv4": "127.0.0.1"
      },
      "Meta": {
        "consul-network-segment": "",
        "consul-version": "1.19.0"
      },
      "CreateIndex": 15,
      "ModifyIndex": 21
    }
  ],
  "SnapshotMeta": [
    {
      "Version": 1,
      "ID": "2-43-1710199059887",
      "Index": 43,
      "Term": 2,
      "Peers": "ka4xMjcuMC4wLjE6ODMwMA==",
      "Configuration": {
        "Servers": [
          {
            "Suffrage": 0,
            "ID": "2ee786d2-e7d3-8b39-99bf-4bec5f04d60a",
            "Address": "127.0.0.1:8300"
          }
        ]
      },
      "ConfigurationIndex": 1,
      "Size": 20412
    }
  ]
}
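
The "Peers" value in SnapshotMeta is base64 text rather than readable JSON. Decoding the sample bytes suggests a MessagePack array of address strings, so a small helper can turn it back into something readable. The sketch below is an assumption based only on that sample: `decode_peers` is a hypothetical name, and it handles just the short-array and short-string markers needed here.

```python
import base64


def decode_peers(b64_value):
    """Decode a base64 field holding a MessagePack array of short strings.

    Only fixarray (0x90-0x9f) and fixstr (0xa0-0xbf) markers are handled,
    which is enough for the sample value; anything else raises ValueError.
    """
    raw = base64.b64decode(b64_value)
    if not raw or raw[0] & 0xF0 != 0x90:
        raise ValueError("expected a MessagePack fixarray")
    count = raw[0] & 0x0F  # low 4 bits: number of elements
    pos = 1
    peers = []
    for _ in range(count):
        if pos >= len(raw) or raw[pos] & 0xE0 != 0xA0:
            raise ValueError("expected a MessagePack fixstr")
        length = raw[pos] & 0x1F  # low 5 bits: string length in bytes
        pos += 1
        peers.append(raw[pos:pos + length].decode("utf-8"))
        pos += length
    return peers


print(decode_peers("ka4xMjcuMC4wLjE6ODMwMA=="))  # ['127.0.0.1:8300']
```

The decoded address matches the "Address" of the single server listed under "Configuration" in the same SnapshotMeta entry.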

Links

Nomad Equivalent Command: nomad operator snapshot state <file>

PR Checklist

  • updated test coverage
  • external facing docs updated
  • appropriate backport labels added
  • not a security concern

@natemollica-nm natemollica-nm self-assigned this Mar 11, 2024
@github-actions github-actions bot added the theme/cli (Flags and documentation for the CLI interface) and pr/dependencies (PR specifically updates dependencies of project) labels Mar 11, 2024
@natemollica-nm
Contributor Author

@mkeeler just saw you pushed your Snapshot Decode PR earlier today. Didn't know something similar was in the works - would this PR (pending anything deemed incorrect and needing updates) still be something of value or should I close this one out?

@mkeeler
Member

mkeeler commented Mar 12, 2024

@natemollica-nm TBH I didn't know this PR existed. I did know that there have been many times we wanted to ask users about data in a snapshot and were unable to do so, which is why I created my PR.

There are three reasons why I think I would prefer my approach to the one taken in this PR:

  1. The implementation is heavily based around data streaming, which has a few benefits. First, the full state of the snapshot data is never in the decoding tool's memory at once, which allows decoding very large snapshots on systems that might not have enough memory to hold everything at once. Second, outputting independent newline-delimited JSON objects instead of one large JSON object with many subkeys means that another tool the output is piped to can start processing objects from the stream immediately, without waiting for all output to finish. Those tools therefore do not need high memory either, and the CPU is better utilized across the two cooperating processes, so we get to the results we are looking for more quickly.
  2. It is quite a bit less code to maintain.
  3. It handles all the data types generically and can therefore be more broadly forward and backward compatible, including cross-compatibility with Consul Enterprise.
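
To make the streaming point in item 1 concrete, here is a minimal Python sketch of a downstream consumer for newline-delimited JSON. It says nothing about the actual output of the other PR: the `Type` and `Value` field names and the sample lines are hypothetical, chosen only to show that each object can be handled as soon as its line arrives.

```python
import io
import json


def iter_records(stream):
    """Yield one decoded object per non-empty line, without buffering the whole input."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


def count_by_type(stream):
    # Only the running counts are kept in memory, never the full
    # set of records.
    counts = {}
    for record in iter_records(stream):
        kind = record.get("Type", "unknown")  # "Type" is an assumed field name
        counts[kind] = counts.get(kind, 0) + 1
    return counts


# Invented sample stream standing in for piped decoder output.
sample = io.StringIO(
    '{"Type": "Node", "Value": {"Node": "consul-server-dc1"}}\n'
    '{"Type": "KV", "Value": {"Key": "app/config"}}\n'
    '{"Type": "KV", "Value": {"Key": "app/flags"}}\n'
)
print(count_by_type(sample))  # {'Node': 1, 'KV': 2}
```

When used at the end of a shell pipeline, `sys.stdin` would take the place of `sample`. By contrast, a single large JSON object generally has to be read to its closing brace before a standard parser such as `json.load` returns anything.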

@natemollica-nm
Contributor Author

@mkeeler Awesome! Thanks for detailing the engineering insight behind your PR 😀

@natemollica-nm natemollica-nm deleted the natemollica-nm/snapshot-state-cmd branch March 12, 2024 15:33