feat: add AS215932 extmon and diagnostics ops support#213
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5a5d7daec4
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| resolve_public(parsed.hostname or "") | ||
| req = urllib.request.Request(url, headers={"User-Agent": "AS215932-extmon-diag/1.0"}) | ||
| with urllib.request.urlopen(req, timeout=timeout) as resp: |
There was a problem hiding this comment.
Block redirects to private addresses
When a diagnostics request probes a public URL that returns a 30x to loopback/link-local/private infrastructure (for example metadata or 127.0.0.1), resolve_public only validates the original hostname, but urllib.request.urlopen follows redirects and resolves the redirected target itself. That bypasses the non-public-address guard and can return a body sample from internal services; disable redirects or re-validate each redirect target before fetching.
Useful? React with 👍 / 👎.
| summary: 'BGP data source stale: {% raw %}{{ $labels.source }}{% endraw %}' | ||
|
|
||
| - alert: BGPalerterCriticalEvent | ||
| expr: increase(bgpalerter_alerts_total{severity="critical"}[10m]) > 0 |
There was a problem hiding this comment.
Preserve the first BGPalerter event
For the first critical BGPalerter webhook after the agent starts, or for any new channel/type labelset, the counter series first appears at 1; Prometheus cannot compute a positive increase() without a prior sample, and later samples remain at 1. This misses the one-shot hijack/withdrawal notification until a second matching event increments the same series, so initialize counters at 0 or alert from an event timestamp/gauge instead.
Useful? React with 👍 / 👎.
|
|
||
| # BGP router table snapshots for paid Hyrule Cloud data products. | ||
| hyrule_mcp_bgp_snapshot_dir: "{{ hyrule_mcp_state_dir }}/bgp-snapshots" | ||
| hyrule_mcp_bgp_snapshot_ingest_url: "{{ lookup('ansible.builtin.env', 'HYRULE_BGP_INGEST_URL') | default('https://cloud.hyrule.host/v1/internal/bgp') }}" |
There was a problem hiding this comment.
Use the default ingest URL when env is unset
When HYRULE_BGP_INGEST_URL is not exported, Ansible's env lookup returns an empty string, and default(...) does not replace falsey values unless the second argument is true. The rendered bgp-router-snapshot.env therefore gets HYRULE_BGP_INGEST_URL= instead of the intended cloud endpoint, so the hourly snapshot job will not upload unless every deploy environment explicitly sets the URL.
Useful? React with 👍 / 👎.
PR Reviewer Guide 🔍Here are some key observations to aid the review process:
|
PR Code Suggestions ✨Explore these optional code suggestions:
|
|
Live extmon VPS has been bootstrapped and this PR branch now points inventory at it. Provisioned host:
Validated after inventory update: Still blocked before
|
eed8734 to
e47057f
Compare
Conflict resolutions: - host_vars/api.yml: keep main's hyrule_cloud_monero_wallet_rpc_enabled: true (monero-wallet-rpc is active on api today; the branch's June disable is superseded by the #268 plan) and main's app SHA pins. - hyrule_cloud/defaults required_env_keys: keep main's list; deliberately NOT adding HYRULE_BGP_INGEST_TOKEN — the render hook requires non-empty values and bgp_ingest_token is unconfirmed in Vault. The env line ships with an empty default; add to the required list after seeding Vault. - vault_agent hyrule-cloud.env.ctmpl.j2 + configs/hyrule-cloud.env.j2: take the branch's Vault-toggleable PAYMENT_REQUIRE_NATIVE (default false, same behavior as main's #255 relax) and keep both appended blocks (branch BGP data/ingest + main customer-IPv6 provisioning). - generated/noc/nftables.conf: keep all three post-base rules (extmon webhook, proxy dashboard, loop callback) in host_vars order; verified by re-rendering (firewall.yml --tags validate --limit noc → changed=0). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-03 review + refresh — merging this closes a live drift hazardWhy now: this branch was deployed to extmon on 2026-06-14 (host up 2w5d with bgp-agent/bgpalerter/prometheus running) but never merged — so main still claimed Refresh: merged
Review findings:
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a79c0b3d65
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| module: [http_2xx_v6] | ||
| static_configs: | ||
| - targets: ['http://{{ extmon_bgpalerter_listen }}/status'] |
There was a problem hiding this comment.
Switch BGPalerter to a non-TLS probe
The http_2xx_v6 blackbox module in blackbox.yml.j2 sets fail_if_not_ssl: true, but this new job probes BGPalerter's loopback REST endpoint over plain http://. As soon as the job is scraped, blackbox marks probe_success=0, so the generic ProbeFailing rule will continuously alert even while BGPalerter is healthy; use an HTTP-without-TLS module or scrape the REST endpoint directly.
Useful? React with 👍 / 👎.
| if not CF_TOKEN: | ||
| _set_source("cloudflare_radar", False, "missing token") | ||
| return |
There was a problem hiding this comment.
Keep the optional Cloudflare feed out of down alerts
When EXTMON_BGP_CLOUDFLARE_API_TOKEN is not set (the docs/host vars call Radar optional), this path publishes bgp_source_up{source="cloudflare_radar"}=0. The new BGPSourceDown rule alerts on any zero source, so a default deployment without the optional token will generate a permanent warning; skip emitting the source or mark it disabled instead of down.
Useful? React with 👍 / 👎.
| - job_name: extmon-bgp-agent | ||
| static_configs: | ||
| - targets: ['{{ extmon_bgp_agent_listen }}'] |
There was a problem hiding this comment.
Add an up alert for the BGP agent
Adding the BGP agent as its own scrape job leaves it outside the only extmon up alert (ExtmonComponentDown matches up{job="extmon-self"}), and repo-wide search shows no other rule covers this job. If extmon-bgp-agent crashes or stops serving /metrics, Prometheus only sets up{job="extmon-bgp-agent"}=0 while the BGP metrics go stale, so none of the BGP source/visibility alerts fire; add an up alert for this job or include it in the self-monitoring set.
Useful? React with 👍 / 👎.
- P1 diag-agent SSRF-via-redirect: add _ValidatingRedirectHandler that re-runs resolve_public() on every 30x hop, and fix resolve_public so a literal private-IP host can no longer bypass the guard (the bare `except ValueError: pass` swallowed the intended block). - P2 BGPalerterCriticalEvent missed the first event: increase() on a counter has no prior sample for a new labelset. Emit a per-severity bgpalerter_last_event_timestamp gauge and alert on recency instead. - P2 optional Cloudflare Radar feed emitted bgp_source_up=0 when token-less, making BGPSourceDown fire permanently; skip registering the source instead. - P2 blackbox-bgpalerter probed a plain-http loopback endpoint with the fail_if_not_ssl http_2xx_v6 module (always probe_success=0). Add a non-TLS http_2xx_local module and point the job at it. - P2 extmon-bgp-agent scrape job had no up alert (outside the extmon-self matcher). Add ExtmonBGPAgentDown. Templates + regenerated ansible/generated/extmon/ artifacts. py_compile + yaml load clean. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Remedied all Codex review findings (commit c77a33e)Went through both Codex rounds. One round-1 finding (P2 ingest-URL default) was already fixed on the branch in e47057f; the other five are addressed:
Templates + regenerated The deeper diag-agent items from the automated security summary (DNS-rebinding pin, arg-injection hardening, empty-token default, env-file modes, CAIDA |
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c77a33e98e
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| name: bgp-router-snapshot.timer | ||
| state: started | ||
| enabled: true | ||
| daemon_reload: true |
There was a problem hiding this comment.
Enforce retention before enabling snapshot timer
On noc applies where the hyrule-mcp checkout exists, this starts the hourly bgp-router-snapshot.timer writing under /var/lib/hyrule-mcp/bgp-snapshots, but the existing docs/runbooks/noc-bgp-snapshot-retention.md says this timer must not be re-enabled until retention is managed because the same snapshots previously grew to ~14 GiB and filled /. I checked this change for a tmpfiles/cleanup policy and only see the directory/unit/timer being added, so this reintroduces the documented disk-fill failure mode; add retention before enabling the timer or leave it disabled.
Useful? React with 👍 / 👎.
| PREFIXES = json.loads(os.environ.get("EXTMON_BGP_PREFIXES", '{{ extmon_bgp_prefixes | to_json }}')) | ||
| CF_TOKEN = os.environ.get("EXTMON_BGP_CLOUDFLARE_API_TOKEN", "") | ||
| BGPTOOLS_UA = os.environ.get("EXTMON_BGP_BGPTOOLS_USER_AGENT", "AS215932-bgp-observer/1.0") | ||
| POLL_SECONDS = int(os.environ.get("EXTMON_BGP_POLL_SECONDS", "300")) |
There was a problem hiding this comment.
Cache bgp.tools longer than the agent poll
With the default EXTMON_BGP_POLL_SECONDS unset, poll_loop() calls poll_bgp_tools() every 300 seconds and downloads the full https://bgp.tools/table.jsonl each time. The bgp.tools API docs state that this export updates only about every 30 minutes and ask clients not to download it more often than 30 minutes (https://bgp.tools/kb/api), so the normal deployment will fetch it 12 times/hour and risks throttling or blocking the bgp_tools source; give this feed its own >=30m cache/interval instead of using the 5m poll.
Useful? React with 👍 / 👎.
- P1 BGP snapshot timer re-enabled without retention (regresses incident #321, snapshots reached ~14 GiB and filled /). Add systemd-tmpfiles age-based retention (default 7d, matching snapshot expires_at and the retention runbook) and activate it BEFORE enabling bgp-router-snapshot.timer. - P2 bgp.tools table.jsonl pulled every 5m by the poll loop; their API asks for <=30m. Rate-limit successful full-table fetches to a dedicated BGPTOOLS_MIN_INTERVAL (default 1800s), gated on last success so transient failures still retry. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Round-3 findings remedied (commit 86e2af0)Both new findings addressed:
ansible-lint passes (production profile) on the changed role; py_compile + yaml load clean; extmon render reproduces the artifacts. |
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 86e2af08d4
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| routing = _fetch_json(f"https://stat.ripe.net/data/routing-status/data.json?resource={q}") | ||
| data = routing.get("data", {}) | ||
| origins = [int(o.get("origin")) for o in data.get("origins", []) if str(o.get("origin", "")).isdigit()] | ||
| visible = bool(data.get("last_seen") or origins) |
There was a problem hiding this comment.
Use current RIPEstat origins for visibility
When the aggregate is withdrawn, RIPEstat keeps last_seen as historical ever-seen metadata while origins is the current exact-match announcer list (https://stat.ripe.net/docs/data-api/api-endpoints/routing-status). Because this treats any non-empty last_seen as visible, a prefix that was announced before but is no longer originated is exported as bgp_prefix_visible=1, so AS215932PrefixNotVisible will not fire for an AS215932 aggregate withdrawal unless another source catches it. Base visibility on the current origins/peer counts instead of last_seen.
Useful? React with 👍 / 👎.
| resolve_public(parsed.hostname or "") | ||
| req = urllib.request.Request(url, headers={"User-Agent": "AS215932-extmon-diag/1.0"}) | ||
| with _OPENER.open(req, timeout=timeout) as resp: |
There was a problem hiding this comment.
Pin HTTP probes to the vetted DNS result
For customer-controlled HTTP/HTTPS diagnostic URLs, this validates the hostname once and then passes the original URL to urlopen, which performs its own DNS lookup; a DNS-rebinding name can return a public address for resolve_public() and then 127.0.0.1 or 169.254.169.254 for the actual request, bypassing the non-public-target guard and reaching extmon loopback/metadata services. The request needs to connect to the vetted address or revalidate the address used by the connection.
Useful? React with 👍 / 👎.
| def blocked_ip(addr: str) -> bool: | ||
| ip = ipaddress.ip_address(addr) | ||
| return ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_multicast or ip.is_reserved or ip.is_unspecified |
There was a problem hiding this comment.
Block non-global CGNAT targets
On hosts where provider/internal services are reachable via CGNAT or shared address space, 100.64.0.0/10 is neither is_private nor is_reserved in Python's ipaddress, so resolve_public() allows probes to non-public CGNAT targets despite the guard's intent to reject non-public addresses. Use not ip.is_global with explicit allow/deny exceptions, or add the shared-address ranges to the block list.
Useful? React with 👍 / 👎.
- P1 diag-agent DNS-rebinding SSRF: resolve_public() vetted the hostname but
urlopen re-resolved, so a rebinding name could return a public IP to the
check and 127.0.0.1/169.254.169.254 to the fetch. Replace urlopen with
_Pinned{HTTP,HTTPS}Connection that connect to the pre-vetted IP while keeping
the hostname for Host header + TLS SNI/cert; redirects are followed manually
so every hop is re-vetted and re-pinned.
- P1 RIPEstat visibility used `last_seen` (historical ever-seen), keeping
bgp_prefix_visible=1 after a withdrawal and silencing
AS215932PrefixNotVisible. Base visibility on the current `origins` set.
- P2 blocked_ip missed shared/CGNAT space (100.64.0.0/10 is neither is_private
nor is_reserved). Use `not is_global` as the primary guard (also covers ULA),
explicit terms kept as defence-in-depth.
Verified: py_compile clean; blocked_ip unit checks (CGNAT/private/loopback
blocked, public allowed); literal-private resolve_public raises; live pinned
https fetch returns 200.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Round-4 findings remedied (commit 42dc247)
Verified locally: |
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 42dc247685
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| extmon_bgp_cloudflare_api_token: "{{ lookup('ansible.builtin.env', 'EXTMON_BGP_CLOUDFLARE_API_TOKEN') | default('') }}" | ||
| extmon_bgp_hyrule_ingest_url: "{{ lookup('ansible.builtin.env', 'EXTMON_BGP_HYRULE_INGEST_URL') | default('https://cloud.hyrule.host/v1/internal/bgp', true) }}" | ||
| extmon_bgp_ingest_token: "{{ lookup('ansible.builtin.env', 'EXTMON_BGP_INGEST_TOKEN') | default('', true) }}" | ||
| extmon_bgp_bgp_tools_user_agent: "{{ lookup('ansible.builtin.env', 'EXTMON_BGP_BGPTOOLS_USER_AGENT') | default('AS215932-bgp-observer/1.0 (noc@as215932.net; +https://as215932.net)') }}" |
There was a problem hiding this comment.
Preserve the bgp.tools default User-Agent
When EXTMON_BGP_BGPTOOLS_USER_AGENT is unset, lookup('env', ...) returns an empty string, so this default(...) never applies without the boolean true argument; the generated env in this commit already renders EXTMON_BGP_BGPTOOLS_USER_AGENT=. The new bgp.tools poller then sends a blank User-Agent even though bgp.tools requires an identifying HTTP User-Agent and may block generic/missing ones, so the bgp_tools source can fail unless every operator sets the env var manually.
Useful? React with 👍 / 👎.
| - { src: bgp-agent.env.j2, dest: bgp-agent.env } | ||
| - { src: diag-agent.env.j2, dest: diag-agent.env } |
There was a problem hiding this comment.
Stop rendering secret extmon env files into the worktree
When an operator follows the updated extmon runbook and exports the ingest/diagnostic tokens before running the render/validate step, these two env templates are written verbatim into the git-tracked ansible/generated/extmon/*.env files with mode 0644. Since the templates contain EXTMON_BGP_INGEST_TOKEN, the Cloudflare token, and EXTMON_DIAG_AGENT_TOKEN, a normal review diff can leave bearer tokens in the worktree and make them easy to commit or leak; skip these secret env files from generated output or redact the secret values.
Useful? React with 👍 / 👎.
- P2 bgp.tools User-Agent default never applied: env lookup returns "" when unset and default() ignores falsey without the boolean arg, so the poller sent a blank UA (bgp.tools may block it). Add `, true` to the default. - P2 secret env files rendered into the worktree: bgp-agent.env / diag-agent.env carry EXTMON_BGP_INGEST_TOKEN, the Cloudflare token, and EXTMON_DIAG_AGENT_TOKEN. If an operator exported those before the render step they'd land in git-tracked generated/extmon/*.env at 0644, one `git add` from a leak. Stop rendering them to the worktree (install.yml already templates them straight to /etc/extmon/*.env at 0600), untrack the two files, and .gitignore the pattern. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Round-5 findings remedied (commit 2484e2a)
Re-render is now |
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2484e2a051
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| q = urllib.parse.quote(prefix, safe="") | ||
| realtime = _fetch_json(f"https://api.cloudflare.com/client/v4/radar/bgp/routes/realtime?prefix={q}", headers=headers) | ||
| with STATE_LOCK: | ||
| STATE["prefixes"].setdefault(prefix, {})["cloudflare_radar"] = {"visible": bool(realtime.get("success", True)), "origins": []} |
There was a problem hiding this comment.
Parse Cloudflare routes before marking visibility
When EXTMON_BGP_CLOUDFLARE_API_TOKEN is set, this marks the prefix visible whenever the API request succeeds (and even defaults missing success to true) while recording no origins. Cloudflare Radar's realtime response carries the route/origin data in the returned route metadata, so a withdrawn prefix or MOAS/hijack can still export bgp_prefix_visible{source="cloudflare_radar"} 1 and no bgp_prefix_origin_visible series, defeating the feed that is meant to improve hijack/leak/MOAS visibility and sending misleading ingest payloads.
Useful? React with 👍 / 👎.
| with STATE_LOCK: | ||
| for prefix, row in found.items(): | ||
| STATE["prefixes"].setdefault(prefix, {})["bgp_tools"] = {"visible": True, "origins": [row["asn"]]} |
There was a problem hiding this comment.
Clear bgp.tools state when the prefix disappears
On a successful bgp.tools table download that no longer contains a monitored prefix, this only updates entries in found and leaves the previous STATE["prefixes"][prefix]["bgp_tools"] value untouched. After a withdrawal or exact-prefix removal, the last visible=True/origin remains exported indefinitely for source="bgp_tools", so the visibility alert and the Hyrule ingest payload stay wrong until the agent restarts; each successful fetch should update every wanted prefix, setting missing ones invisible and clearing stale origin/hit data.
Useful? React with 👍 / 👎.
Both feeds exported stale/optimistic visibility that would mask a withdrawal: - P2 bgp.tools left withdrawn prefixes untouched: only prefixes present in the latest table were updated, so a removed prefix kept exporting its last visible=True origin (and stale hit counters) forever. Now every wanted prefix is updated each successful fetch — missing ones flip to visible=False with empty origins, and stale per-origin hit counters are cleared. - P2 Cloudflare Radar marked visible on HTTP success alone with no origins, so a withdrawal or MOAS/hijack still showed visible=1 and emitted no bgp_prefix_origin_visible series. Now parses result.routes for visibility and origin ASNs; raises (source marked down) on an unsuccessful API response. Verified with mocked-upstream unit tests: present->visible+origins+hits, withdrawal->invisible+cleared, for both feeds. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Round-6 findings remedied (commit eb38385)Both feeds were exporting stale/optimistic visibility that would mask a real withdrawal:
Verified with mocked-upstream unit tests for both feeds: announced → |
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: eb383857a5
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| # Always return success after writing the textfile. Partial OVH API | ||
| # failures are represented by ovh_collector_errors_total and should alert | ||
| # through Prometheus; failing the oneshot leaves stale/missing metrics. | ||
| return 0 |
There was a problem hiding this comment.
Alert on OVH collector errors before marking runs successful
When any /dedicated, /ip, or /me/bill request fails, this now still writes a fresh ovh_collector_last_run_timestamp and exits 0, but the alert rules only cover staleness/expiry/unpaid metrics and never consume ovh_collector_errors_total. A credential or OVH API outage can therefore replace the textfile with missing/partial billing metrics plus a fresh timestamp, so Prometheus will not page about losing expiry coverage.
Useful? React with 👍 / 👎.
| routes = (realtime.get("result") or {}).get("routes") or [] | ||
| origins = [] | ||
| for route in routes: | ||
| asn = route.get("origin_asn", route.get("originASN")) |
There was a problem hiding this comment.
Use Cloudflare route data to extract origins
When EXTMON_BGP_CLOUDFLARE_API_TOKEN is configured, Radar realtime route objects expose the origin through meta.prefix_origins or the last element of as_path, not origin_asn/originASN, so this loop drops every Cloudflare origin while still exporting the prefix as visible. In a MOAS or hijack that only this optional Radar feed sees, AS215932UnexpectedOrigin never gets a cloudflare_radar origin series to evaluate.
Useful? React with 👍 / 👎.
- P2 OVH collector failures were unmonitored: the collector intentionally writes its textfile (fresh timestamp) even when /dedicated, /ip, or /me/bill calls fail, representing failures via ovh_collector_errors_total — but no alert consumed that gauge, so a credential/API outage silently dropped expiry coverage. Add OVHCollectorErrors (errors_total > 0 for 30m). - P2 Cloudflare origin extraction used origin_asn/originASN, but Radar realtime routes expose the origin via meta.prefix_origins or the last as_path element, so my round-6 code dropped every origin while still marking visible. Extract from all known shapes (meta.prefix_origins int|dict, as_path[-1], origin_asn fallback). Verified: alerts.yml loads and includes OVHCollectorErrors; unit tests cover all four cloudflare origin shapes plus a two-origin MOAS. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Round-7 findings remedied (commit 8a3e288)
Verified: |
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8a3e288810
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| enableExpirationAlerts: true | ||
| enableExpirationCheckTA: true |
There was a problem hiding this comment.
Disable ROA expiry checks with Routinator VRPs
When relying on extmon for ROA-expiration coverage, these checks are wired to the rpki provider below, which is Routinator's /json API; that JSON feed does not include the per-VRP expiration field that BGPalerter's monitorROAS expiry and TA-expiry checks require. The service will therefore appear to provide ROA-expiry alerting but will not emit those alerts from this feed; either use a provider such as rpkiclient that supplies expiry data or turn the expiry checks off to avoid false coverage.
Useful? React with 👍 / 👎.
- P2 BGPalerter monitorROAS advertised ROA/TA-expiry coverage it could not deliver: enableExpirationAlerts/enableExpirationCheckTA need a per-VRP `expires` field, but the configured rpki provider (Routinator /json) does not emit it, so those alerts would never fire. Disable both expiry checks (and the now-moot roaExpirationAlertHours) to avoid false coverage; keep enableDiffAlerts and enableDeletedCheckTA, which do not need expiry data. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Round-8 finding remedied (commit 1ea1b76)
|
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1ea1b76069
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| isTemplateJSON: true | ||
| templates: | ||
| default: '{"source":"bgpalerter","channel":"${channel}","type":"${type}","summary":"${summary}","prefix":"${prefix}","asn":"${asn}","newprefix":"${newprefix}","neworigin":"${neworigin}","earliest":"${earliest}","latest":"${latest}"}' | ||
| showPaths: 5 |
There was a problem hiding this comment.
Keep BGPalerter JSON webhooks parseable
When BGPalerter emits a path-bearing hijack/visibility/RPKI alert, this combination makes the webhook body invalid before it reaches extmon-bgp-agent: in BGPalerter v2.0.1 reportHTTP appends the top paths to context.summary when showPaths > 0 and then, with isTemplateJSON, runs JSON.parse(blob) before posting. Since ${summary} is interpolated inside a JSON string here, those alerts can be dropped by the reporter instead of updating bgpalerter_last_event_timestamp, so the realtime critical BGPalerter alert path is lost; set showPaths: 0 for the JSON webhook or emit paths in a safely escaped field.
Useful? React with 👍 / 👎.
- P2 BGPalerter JSON webhook could drop critical alerts: with isTemplateJSON,
showPaths>0 makes BGPalerter append the top paths into ${summary} and then
JSON.parse() the interpolated body. The unescaped path blob breaks JSON, so a
path-bearing hijack/visibility/RPKI alert would be dropped before reaching
extmon-bgp-agent -- never updating bgpalerter_last_event_timestamp, losing
the realtime critical-event path. Set showPaths: 0 (the agent only consumes
channel/type/severity; paths are unused).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Round-9 finding remedied (commit dfb45a3)
|
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: dfb45a36f8
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| rpki_max_length: 48 | ||
| extmon_bgp_expected_adjacent_asns: [34872, 210233, 58057, 56755] | ||
| extmon_bgp_agent_listen: "127.0.0.1:9188" | ||
| extmon_diag_agent_listen: "127.0.0.1:9190" |
There was a problem hiding this comment.
Expose a reachable path to the diagnostic agent
When the Hyrule Cloud diagnostics API tries to use this new extmon probe service from the api/cloud host, this default binds it only to extmon loopback, and the extmon UFW role still denies all inbound traffic except SSH, with no tunnel or reverse proxy configured for port 9190. That means /tcp, /http, /ping, etc. will only work from the extmon host itself; add an explicit reachable listener/firewall allowance from the intended caller or configure the tunnel/proxy the cloud service should use.
Useful? React with 👍 / 👎.
- P2 diagnostic agent had no reachable path for its intended caller: it binds loopback and UFW denies all inbound but SSH, so the Hyrule Cloud diagnostics API could not reach it. Rather than expose an unhardened active-probe service (its input-validation hardening is gated by #361), make the access path explicit and opt-in: add extmon_diag_agent_allowed_sources (default []), a UFW rule that opens the agent port to just those callers (no-op while empty), and document the SSH-port-forward default + the direct-exposure procedure. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Round-10 finding remedied (commit 961c245)
ansible-lint passes (production profile) on the changed role files. |
|
@codex review |
|
Codex Review: Didn't find any major issues. Swish! Reviewed commit: ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 86e77751ba
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
|
||
| reports: | ||
| - file: reportFile | ||
| channels: [hijack, newprefix, visibility, path, misconfiguration, rpki, roa] |
There was a problem hiding this comment.
Add a monitor for same-AS more-specifics
This subscribes the reports to newprefix, but the config never declares monitorNewPrefix, so nothing produces that channel. BGPalerter's docs distinguish this from monitorAS: monitorAS ignores prefixes already covered by a configured aggregate, while monitorNewPrefix is the monitor that catches the expected AS announcing a new more-specific (https://github.com/nttgin/BGPalerter/blob/main/docs/configuration.md#monitornewprefix). With only the /44 aggregate configured, a stray AS215932 /48 more-specific can therefore avoid the intended new-prefix alert; add monitorNewPrefix if same-AS more-specific leaks should page.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2b5e9138aa
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| row = json.loads(line) | ||
| cidr = row.get("CIDR") | ||
| if cidr in wanted: | ||
| found[cidr] = {"asn": int(row.get("ASN", 0)), "hits": int(row.get("Hits", 0))} |
There was a problem hiding this comment.
Preserve all bgp.tools origins
When bgp.tools exports multiple rows for the same CIDR (MOAS/hijack cases are documented at https://bgp.tools/kb/more-than-one-asn-per-prefix), this assignment keeps only the last row for that prefix. If the AS215932 row happens to be last, the bgp_tools metrics and ingest payload contain only origin 215932, so AS215932UnexpectedOrigin will not fire from this feed even though bgp.tools saw another origin; accumulate all ASNs/hit counts per CIDR instead of replacing earlier rows.
Useful? React with 👍 / 👎.
| - file: monitorPath | ||
| channel: path | ||
| name: path-matching | ||
| params: | ||
| thresholdMinPeers: 1 |
There was a problem hiding this comment.
Wire path rules before enabling path alerts
This enables monitorPath, but the rendered prefixes.yml contains no per-prefix path: rules, and the newly added extmon_bgp_expected_adjacent_asns is never rendered anywhere. BGPalerter's monitorPath only evaluates AS_PATH regex rules declared on prefixes (https://github.com/nttgin/BGPalerter/blob/main/docs/configuration.md#monitorpath), so in the advertised path-alerting setup an unexpected upstream/downstream AS path will never produce a BGPalerter event.
Useful? React with 👍 / 👎.
Summary
This PR contains the Network Operations side of the AS215932 / Agentic ISP Support rollout.
It adds operational support for the new Hyrule Cloud BGP/network-intelligence platform and exposes pricing knobs for the new agentic diagnostics APIs.
Major pieces:
extmon-bgp-agentextmon-diag-agentv2.0.1with SHA256Review notes
This is paired with:
hyrule-cloudPR: x402 Agentic ISP Support API/Skill implementationhyrule-mcpPR: AS215932 router snapshot collectorThe important design constraint remains:
extmonis outside AS215932 and must not hold router SSH keys. Router snapshots stay NOC-side.Validation
After rebasing on current
origin/main:Previous local validation before the branch split also covered extmon/noc/cloud syntax checks and extmon render/agent py_compile.