Skip to content

Re-enable netproxy monitoring + pin sidecar build version (#214)#365

Open
Svaag wants to merge 1 commit into
mainfrom
feat/netproxy-monitoring-restore
Open

Re-enable netproxy monitoring + pin sidecar build version (#214)#365
Svaag wants to merge 1 commit into
mainfrom
feat/netproxy-monitoring-restore

Conversation

@Svaag

@Svaag Svaag commented Jul 4, 2026

Copy link
Copy Markdown
Contributor

Part of the x402 Hyrule Network Proxy live rollout (#214). Reverses the temporary monitoring disable from #288 and stamps the deployed binary with its pinned version.

Changes

  • Restore the two Prometheus scrape targets Fix live monitoring scrape coverage #288 removed in configs/mon/prometheus.yml:
    • netproxy node_exporter ([2a0c:b641:b50:2::e0]:9100) in the node-infra job.
    • the hyrule-network-proxy sidecar metrics job ([...:e0]:8451).
      The existing HyruleNetworkProxyDown tripwire (prometheus-rules/noc-tripwire.yml) re-activates automatically once the job's up series exists — no rule change needed.
  • Version stamping: the hyrule_network_proxy role now builds with -ldflags "-X …/internal/version.Version={{ hyrule_network_proxy_version }}", so /v1/health and startup logs report the deployed SHA (matches the sidecar CI build in Finish sidecar polish and wire CI auto-promotion (#214) hyrule-network-proxy#1).
  • Docs: docs/network-flows.md no longer says netproxy is "not live"; scrape targets are enabled and go green once the VM is up.

⚠️ Rollout ordering

Apply the Prometheus change to mon only after the netproxy VM is reachable and the sidecar is up (rollout step 5), otherwise HyruleNetworkProxyDown fires on an absent target. configs/mon/prometheus.yml is not auto-applied (manual systemctl reload prometheus / mon apply per repo convention), so merging this is inert until that step. Recommend merging as part of the live rollout, not ahead of it.

Validation

  • tests/iac: 84 passed, 190 subtests passed.
  • scripts/ci/deploy-preflight.sh --repo-only: pass.
  • ansible-playbook playbooks/network-proxy.yml --syntax-check: pass.
  • prometheus.yml parses; hyrule-network-proxy job present (13 jobs total).

🤖 Generated with Claude Code

Part of the x402 Hyrule Network Proxy live rollout (#214). Reverses the
temporary monitoring disable from #288 now that the rollout is proceeding,
and stamps the deployed binary with its pinned version.

- prometheus: restore the two scrape targets #288 removed — the netproxy
  node_exporter target ([2a0c:b641:b50:2::e0]:9100) in the node-infra job and
  the hyrule-network-proxy sidecar metrics job ([...:e0]:8451). The existing
  HyruleNetworkProxyDown tripwire re-activates once the job's up series exists.
- role: build the sidecar with -ldflags injecting hyrule_network_proxy_version
  so /v1/health and startup logs report the deployed SHA (matches the CI build).
- docs/network-flows: netproxy is no longer "not live"; scrape targets are
  enabled and go green once the VM is provisioned and the sidecar applied.

Rollout ordering: apply this to mon only AFTER the netproxy VM is reachable and
the sidecar is up, otherwise HyruleNetworkProxyDown fires on an absent target.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@Svaag Svaag requested a review from a team as a code owner July 4, 2026 20:38
@github-actions

github-actions Bot commented Jul 4, 2026

Copy link
Copy Markdown
Contributor

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🎫 Ticket compliance analysis 🔶

214 - Partially compliant

Compliant requirements:

  • Re-enable Prometheus scrape targets for netproxy node_exporter and hyrule-network-proxy sidecar metrics (part of smoke tests monitoring checks)
  • Add version stamping to the sidecar binary via -ldflags (part of deploy sidecar – binary build)
  • Update docs/network-flows.md to reflect that netproxy is live and monitoring targets are enabled

Non-compliant requirements:

  • Seed Vault tokens (no changes in this PR)
  • Provision the netproxy VM (no VM creation code)
  • Full sidecar deployment (only version stamping; no service install, env rendering, or health verification in this PR)
  • Deploy Hyrule Cloud promotion (not addressed)
  • Full smoke tests execution (not covered)

Requires further human verification:

  • The rollout ordering warning in the PR description must be followed: this Prometheus config change is safe to merge only after the netproxy VM is up and sidecar health passes; otherwise HyruleNetworkProxyDown will fire. Human must verify netproxy is live before applying on mon.

288 - Partially compliant

Compliant requirements:

  • Re-enable netproxy Prometheus scrape targets (reverses the disable from Fix live monitoring scrape coverage #288)
  • Update docs/network-flows.md to reflect netproxy monitoring is enabled (refreshes network-flow docs)

Non-compliant requirements:

  • Add loop VM node_exporter target (not addressed)
  • Correct Vault blackbox probe (not addressed)
  • Expose Vector internal metrics (not addressed)
  • Refresh generated log Vector artifact (not addressed)
⏱️ Estimated effort to review: 1 🔵⚪⚪⚪⚪
🏅 Score: 85
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Operational Footgun: Premature Target Activation

Adding the netproxy node_exporter and hyrule-network-proxy scrape targets will cause the HyruleNetworkProxyDown alert to fire immediately if this config is applied before the netproxy VM is provisioned and the sidecar is healthy. The PR description acknowledges the risk but does not include a conditional or safeguard in the code. If someone merges and applies to mon out of order (e.g., during a batch deploy), the alert will fire until the VM is live. Use a separate commit or a guard in the deploy workflow to prevent this.

- "[2a0c:b641:b50:2::e0]:9100"   # netproxy
Missing Idempotency in Build Step

The build command uses ansible.builtin.command without a creates: or changed_when: check. If the source checkout changes (e.g., a new ref) but the binary already exists with the old version, the build will not re-execute because Ansible considers the task changed only if the git checkout changed before it? Actually the git task's notify triggers restart, but the build task itself has no idempotency guard. Subsequent runs after the binary exists will skip the git checkout if the version tag hasn't changed (update: true handles that), but if the binary is deleted or the build is re-run, the task always reports changed, and the notify fires each time. This can cause unnecessary sidecar restarts during no-op runs. Consider adding a changed_when condition based on the binary's version string or using a checksum comparison.

- name: Build hyrule-network-proxy binary
  ansible.builtin.command:
    cmd: >-
      go build
      -ldflags "-X github.com/AS215932/hyrule-network-proxy/internal/version.Version={{ hyrule_network_proxy_version }}"
      -o {{ hyrule_network_proxy_bin_path }} ./cmd/hyrule-network-proxy

@github-actions

github-actions Bot commented Jul 4, 2026

Copy link
Copy Markdown
Contributor

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Security
Sanitize version string to prevent injection

The version variable hyrule_network_proxy_version is interpolated directly into the
shell command without sanitization. If this variable contains spaces, quotes, or
shell metacharacters, it could break the build or lead to command injection. Use
Ansible's | quote filter to safely escape the value.

ansible/roles/hyrule_network_proxy/tasks/apply.yml [54-57]

 cmd: >-
   go build
-  -ldflags "-X github.com/AS215932/hyrule-network-proxy/internal/version.Version={{ hyrule_network_proxy_version }}"
+  -ldflags "-X github.com/AS215932/hyrule-network-proxy/internal/version.Version={{ hyrule_network_proxy_version | quote }}"
   -o {{ hyrule_network_proxy_bin_path }} ./cmd/hyrule-network-proxy
Suggestion importance[1-10]: 8

__

Why: Interpolating hyrule_network_proxy_version directly into the ldflags without escaping could allow shell injection if the variable contains special characters. Applying the | quote filter is a valid security hardening step.

Medium

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 203da3ee89

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

- "[2a0c:b641:b50:2::b0]:9100" # log
- "[2a0c:b641:b50:2::c0]:9100" # vault
- "[2a0c:b641:b50:2::d0]:9100" # ci (privileged runner)
- "[2a0c:b641:b50:2::e0]:9100" # netproxy

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Install node_exporter before scraping netproxy

When the rollout follows the documented network-proxy apply path, this target is not guaranteed to come up: ansible/playbooks/network-proxy.yml only runs the hyrule_network_proxy role, and that role's package list installs the sidecar dependencies but not prometheus-node-exporter. Re-enabling this scrape therefore makes NodeExporterDown fire for netproxy unless a separate monitoring apply has already run; either install/enable node_exporter in this rollout or make that prerequisite part of the automated ordering.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant