feat(auth): Phase 2 — AWS Sigv4, GCP IAP, Azure AD providers (v0.11.0) by naveen-kurra · Pull Request #79 · initializ/forge

naveen-kurra · 2026-05-24T00:16:07Z

Summary

Phase 2 of the pluggable auth provider work — three cloud-native providers
on top of the Phase 1 foundation (#77 / 7998f12). Customers authenticate
to Forge using identities they already have in their cloud; no parallel
IdP required.

Provider	What it does	Client token format
`aws_sigv4`	Caller mints a pre-signed STS `GetCallerIdentity` URL with their AWS SDK; Forge invokes it. STS returns the caller's canonical ARN. Same pattern as `aws-iam-authenticator` (EKS).	`Authorization: Bearer forge-aws-v1.<base64-of-presigned-sts-url>`
`gcp_iap`	Verify the JWT GCP IAP forwards as `X-Goog-Iap-Jwt-Assertion` when Forge sits behind a GCP HTTPS LB + IAP.	`X-Goog-Iap-Jwt-Assertion: <jwt>`
`azure_ad`	Verify Microsoft Entra ID Bearer tokens with tenant lock-in and optional Microsoft Graph group enrichment. Composes the Phase 1 `oidc` provider.	`Authorization: Bearer <aad-jwt>`

Forge never holds any IdP secrets — all three providers verify a caller-
minted credential against a third party (STS / GCP JWKS / AAD JWKS).

Why this matters

Today, putting Forge behind any of the three big cloud IdPs requires
standing up a parallel OIDC issuer (Cognito for AWS, Workspace SAML, etc.).
This PR removes that friction:

AWS: any Lambda / EC2 / EKS / IAM user calls Forge using their
existing IAM credentials. Zero secrets stored on Forge, no token endpoint
to host.
GCP: Forge sitting behind GCP LB+IAP consumes IAP's signed user
assertion directly.
Azure: Entra tokens (humans, CI, Azure workloads) work natively with
AAD-specific quirks (tenant gate, groups overage) handled correctly.

Design pivot during PR review — `aws_sigv4` switched to pre-signed URL pattern

The PR went through an in-flight design correction caught by real-AWS
smoke testing. The TL;DR:

What was wrong (original design — commits `9a1ebae` through `382294e`)

The first design had clients sign their POST to Forge using AWS Sigv4
("header reflection"). Forge would forward the signed headers to STS,
expecting STS to validate them.

This is broken in a deterministic way: Sigv4 binds its signature to
the destination host as part of the canonicalized signing input.
Headers signed for Forge's hostname can't be replayed against STS —
STS computes host: sts.<region>.amazonaws.com, recomputes the
signature, gets a different hash, and rejects with
SignatureDoesNotMatch.

Standard tools (awscurl, boto3.client('sts'), all AWS SDKs) always
sign for the URL they're calling, so there was no working client path.

What's correct (current design — commit `8568535`)

Switched to the pre-signed URL pattern that aws-iam-authenticator
uses for EKS:

Caller's AWS SDK mints a pre-signed GetCallerIdentity URL — signature
embedded in query params, signed for STS's host.
Caller base64-encodes the URL, prepends forge-aws-v1., sends as a
standard Authorization: Bearer … header.
Forge decodes, validates the URL host matches sts.<region>.amazonaws.com
(SSRF guard), GETs the URL, parses the XML response, stamps Identity.

Why this is the right design (not just a fix)

Three design properties land cleanly:

Client UX matches OIDC/JWT/Azure AD. All three Phase 2 providers
now have the same caller experience: mint a Bearer token, attach it,
send. Three lines of Python (or any AWS-SDK-bearing language). No
custom signing logic, no header manipulation, no per-call procedure.
No re-engineering of standard tools. aws-iam-authenticator-style
token format works with any AWS SDK's SigV4QueryAuth API.
Server is simpler. Forge doesn't reflect headers; it just GETs a
URL. ~80 LOC of STS client code instead of header-forwarding plumbing.

Locked design decisions (unchanged through the pivot)

§9.1 — No aws-sdk-go-v2 dependency. STS client is hand-rolled HTTP + XML.
§9.2 — azure_ad composes Phase 1 oidc.Provider; no JWT/JWKS code in azure_ad/.
§9.3 — allowed_principals uses path.Match shell globs (no regex).
§9.4 — IAP issuer + JWKS URL hardcoded; no operator-overridable knob.

What clients write (the actual 3 lines)

import boto3, base64
from botocore.auth import SigV4QueryAuth
from botocore.awsrequest import AWSRequest

creds = boto3.Session().get_credentials().get_frozen_credentials()
req = AWSRequest(method='GET',
                 url='https://sts.us-east-1.amazonaws.com/?Action=GetCallerIdentity&Version=2011-06-15')
SigV4QueryAuth(creds, 'sts', 'us-east-1', expires=900).add_auth(req)
token = 'forge-aws-v1.' + base64.urlsafe_b64encode(req.url.encode()).rstrip(b'=').decode()

requests.post(forge_url, headers={'Authorization': f'Bearer {token}'}, data=msg)

A reference client lives at scripts/forge-aws-sign.py (~80 LOC; CLI flags
for --region, --profile, --token-only, etc.).

Note on boto3: the high-level boto3.client('sts').generate_presigned_url('get_caller_identity', ...) returns a URL STS rejects with SignatureDoesNotMatch even on direct curl — known boto3 limitation (it signs as if the request were a POST). The reference client uses the lower-level SigV4QueryAuth API directly, same as aws-iam-authenticator does internally.

✅ Real-AWS validation (live AWS account)

End-to-end validated against real AWS STS using SSO assumed-role
credentials in account 412664885516. Forge instance built from
commit 8568535, running locally.

#	Scenario	Expected	Actual	Validates
1	`curl` with no auth headers	401, `missing_token`	✅ 401 + `valid bearer token required`	Phase 1 middleware enforces auth; review #4 reason-code preserved
2	Pre-signed URL token, empty allowlist	200 / handler-error, `auth_verify` fires	✅ HTTP 400 (body shape; auth path complete), `auth_verify` emitted	STS reflection works end-to-end against real STS
3	Same token, matching `allowed_principals`	200 / handler-error (auth pass)	✅ HTTP 400, `auth_verify` fires	`ArnMatcher` accepts matching ARNs
4	Same token, wrong-account `allowed_principals`	401, `rejected`	✅ 401 + `token rejected by auth provider`	`ArnMatcher` correctly denies non-matching ARNs

Audit log emitted (Test #2 success path)

{
  "event":"auth_verify",
  "fields":{
    "method":"POST",
    "path":"/tasks/send",
    "provider":"aws_sigv4",
    "user_id":"arn:aws:sts::412664885516:assumed-role/AWSReservedSSO_PowerUserAccess_c794d5f2c2fe4370/Naveen",
    "org_id":"412664885516",
    "token_kind":"sigv4",
    "groups_count":0,
    "remote_addr":"[::1]:62448"
  }
}

Every field correct: provider matches the configured name, user_id is the STS-returned canonical ARN (assumed-role form, including session name), org_id is the AWS account number, token_kind is the new forge-aws-v1.-prefix detection.

Two bugs caught during live testing (both fixed in `8568535`)

URL re-encoding through net/url. Round-tripping the presigned URL through Go's *url.URL.String() re-encoded query params in ways that differed from how the AWS SDK emitted them (e.g., / in X-Amz-Credential, + inside X-Amz-Security-Token). STS recomputed the canonical request from those re-encoded bytes and rejected. Fix: PresignedToken keeps a RawURL string field with the byte-for-byte original; the parsed *url.URL is used only for SSRF host validation and query-param inspection.
boto3's generate_presigned_url quirk. The reference client originally used boto3.client('sts').generate_presigned_url('get_caller_identity', …), which produces a URL STS rejects with SignatureDoesNotMatch (known boto3 quirk — signs as if POST). Fix: the reference client now uses the lower-level SigV4QueryAuth.add_auth() directly, same pattern aws-iam-authenticator uses.

Other layers also exercised live

✅ Egress allowlist auto-extended to include sts.us-east-1.amazonaws.com — Forge's outbound STS call was permitted.
✅ Loopback static_token still works for the dashboard at http://localhost:9999/ (auto-prepended via PrependChain — Phase 1 review Add per-agent secrets, build signing, and forge framework #10 invariant intact under Phase 2).
✅ Config hot-reload picks up forge.yaml content changes for most fields. Caveat: auth-chain providers are constructed once at startup; modifying allowed_principals requires a hard restart (Ctrl-C + forge run again) for the new allowlist to take effect. Documented as a follow-up — affects all providers, not just aws_sigv4.
✅ Auth-before-Egress wizard ordering (639bfa9) ensures wizard-scaffolded configs include the STS host in egress_hosts.

What was NOT live-tested (and why)

gcp_iap provider — requires a GCP project with HTTPS LB + IAP enabled. Covered by unit tests with a fake JWKS signer.
azure_ad provider — requires an Entra tenant + app registration. Covered by unit tests with a fake AAD.

Per PHASE2_TEST_STRATEGY.md §8.2, those live tests run at release-tag time, not on every PR.

What landed (12 commits)

#	Commit	Theme
PR 1	`55942d8`	Header contract extension (foundation)
PR 2	`9a1ebae`	`aws_sigv4` provider (initial — header-reflection design)
PR 3	`5b71071`	`gcp_iap` provider
PR 4	`1e23140`	`azure_ad` provider
PR 5	`47c4748`	Wizard + non-interactive flags + Web UI integration
PR 6	`98578f9`	Operator docs + CHANGELOG
fix	`639bfa9`	Auth step runs before Egress; auth hosts auto-added to allowlist
audit	`b5f303b`	Audit-pass refinements (iap_jwt token_kind, URL parse caching, cleanup)
docs	`aaf8375`	Gitignore `docs/auth` (per reviewer feedback)
client	`382294e`	Reference client + "client signing contract" docstring (original pattern)
pivot	`b3444c2`	Switch `aws_sigv4` to pre-signed URL pattern
live	`8568535`	Live-test fixes: preserve raw URL; use `SigV4QueryAuth` in client

Total: ~42 files, +5,500 / -700 lines net (mostly tests + the design-pivot rewrite).

Phase 1 compatibility

Zero forge.yaml changes required for callers using Phase 1 providers (static_token, oidc, http_verifier). Phase 1 test suite passes unmodified.
The auth Headers map gained one new key — X-Goog-Iap-Jwt-Assertion for gcp_iap. Existing keys unchanged.
The oidc package gained an internal SkipIssuerCheck field with yaml:"-" — unreachable from forge.yaml, only set by azure_ad multi-tenant. Operators see no change.

Security model highlights

No IdP secret on Forge. STS / GCP / AAD do the cryptographic work; Forge reflects or verifies, never holds keys.
SSRF guard on aws_sigv4 — the pre-signed URL host MUST match sts.<configured-region>.amazonaws.com. A token whose URL points elsewhere is rejected at parse time, before any outbound request.
Algorithm whitelist before key lookup on the OIDC / IAP / AAD providers — algorithm-confusion attacks rejected before reaching JWKS.
Tenant gate before Graph on azure_ad — multi-tenant requires explicit opt-in; Graph calls only fire after tid validation.
Caller's Bearer reflected to Graph — Forge holds no Graph credentials.

Known deferred work

TUI sub-step input flows for the three new providers — non-interactive flag path is the production-critical surface; TUI users pick "Custom" and edit forge.yaml directly until the TUI follow-up lands.
Hot-reload of auth chain — config edits to auth.providers (incl. allowed_principals) require a hard forge run restart. Affects all providers, not just Phase 2.

Test plan

Unit tests: happy path, malformed input, scope/tenant rejection, allowlist miss, cache hit/miss, stale-grace, soft-fail, alg confusion, body caps, redirect attacks — all three new providers
Pre-signed URL parser fuzz test (corpus-seeded, runs in CI)
Phase 1 regression suite passes unmodified
go test -race -count=1 ./... — 42 packages green
golangci-lint v2.10.1 — 0 issues
gofmt -l forge-core forge-cli forge-plugins — clean
Anti-pattern grep gates (no aws-sdk-go import, IAP constants confined, no JWT in azure_ad, skip_issuer_check never in YAML) — all pass
Live AWS validation — happy + deny paths against real STS (see "Real-AWS validation" above)
Manual smoke for gcp_iap and azure_ad — runs at release-tag time per PHASE2_TEST_STRATEGY.md §8.2

Design artifacts (offline)

Full design package in ~/Desktop/forge_designs_and_PRD/phase2_implementation/:

PHASE2_CLOUD_NATIVE_PROVIDERS.md — top-level design + §9 locked decisions
PHASE2_PROGRESS_MAP.md — diagram-to-PR tracker
PHASE2_TEST_STRATEGY.md — pyramid, harnesses, security catalog, CI gates, manual smoke
7 PlantUML diagrams (component / class / startup / request flow / per-provider deep-dives)
PR1_HEADER_CONTRACT.md through PR6_DOCS.md — per-PR checklists with code sketches and acceptance criteria

…r 1) HeadersFromRequest gains Authorization, X-Goog-Iap-Jwt-Assertion, X-Amz-Date, X-Amz-Security-Token so future providers consuming non-Bearer formats (aws_sigv4, gcp_iap) can read what they need without changing the Provider.Verify signature. TokenKind recognizes the "AWS4-HMAC-SHA256 " prefix and returns "sigv4", so audit logs can distinguish Sigv4 requests from "empty" even though the Bearer extractor returns "". Middleware now consults the chain even when no Bearer token was extracted, provided a non-Bearer auth header is present (Sigv4 Authorization or IAP assertion). When NO auth headers at all are present, the audit reason still resolves to ErrMissingBearer — preserving review initializ#4's stable "missing_token" reason code. Phase 1 providers see zero behavior change; their Verify path is unchanged. All Phase 1 tests pass without modification.

…e 2 pr 2) aws_sigv4 authenticates AWS-IAM callers by reflecting their Sigv4 signature to STS GetCallerIdentity. No aws-sdk-go-v2 dependency (decision §9.1): the STS RPC is ~150 LOC of hand-rolled HTTP + XML. Forge never holds the caller's secret key — STS validates the signature on Forge's behalf. Key pieces: - sigv4_parser.go: pure string parser, fuzz-tested, never panics - sts_client.go: 200/4xx/5xx classification per review initializ#6 contract - identity_cache.go: hash(AKID|YYYYMMDD)-keyed TTL cache, opportunistic eviction past 10k entries, Put does NOT extend prior expiry - arn_matcher.go: shell-style globs via path.Match (decision §9.3), invalid patterns fail at Factory time - provider.go: scope check (service=sts, region match) before any STS round trip, cache hit avoids RPC, rejection does NOT poison the cache security: - Algorithm: only AWS4-HMAC-SHA256 prefix is claimed - Scope: cross-service replay (s3->sts) and cross-region replay (eu-west-1->us-east-1) rejected at parse-time - Cache: bucketing by YYYYMMDD bounds stolen-key window to a day - Body cap: 64 KiB on STS responses - Logs: STS error bodies summarized at 200 chars, newlines stripped audit: - ErrTokenNotForMe -> not_for_me (no AWS4 prefix) - ErrInvalidToken -> invalid (malformed Sigv4) - ErrTokenRejected -> rejected (scope/allowlist/STS 4xx) - ErrProviderUnavailable -> provider_unavailable (STS 5xx/network) extras: - security.AuthDomains gains sts.<region>.amazonaws.com (+ override host when sts_endpoint set for tests) - forge-cli/runtime/runner.go side-effect imports aws_sigv4

gcp_iap consumes the X-Goog-Iap-Jwt-Assertion header that GCP's Identity-Aware Proxy forwards on every authenticated request when Forge sits behind a GCP HTTPS load balancer with IAP enabled. Decision §9.4: IAP issuer (https://cloud.google.com/iap) and JWKS URL (https://www.gstatic.com/iap/verify/public_key-jwk) are hardcoded. They're the only stable contract GCP exposes; an override knob would be a footgun. key pieces: - iap_jwks.go: ES256-only JWKS cache, TTL refresh + backoff + stale-grace (mirrors Phase 1 OIDC review initializ#1 pattern) - provider.go: header-presence check, claims projection, iss/aud gates, sub/email required-claims check - parseECJWKSet drops non-EC / non-P-256 / non-ES256-labeled keys during parse — defense in depth against compromised JWKS - alg whitelist rejects RS256 BEFORE key lookup (algorithm- confusion defense) - aud as string OR array both parse (JWT spec allows either) - audit reasons follow Phase 1 contract: rejected — iss/aud mismatch, expired, bad signature invalid — alg != ES256, missing sub/email, bad kid provider_unavailable — JWKS fetch failed AND no prior key cached not_for_me — header absent extras: - security.AuthDomains returns www.gstatic.com when gcp_iap is configured - forge-cli/runtime/runner.go side-effect imports gcp_iap

azure_ad authenticates Microsoft Entra ID tokens. Composes the Phase 1 oidc.Provider (decision §9.2) for signature verify + base claim validation; layers AAD-specific concerns on top: - Tenant lock-in via the tid claim - Optional Microsoft Graph group enrichment when JWT groups claim is empty (AAD truncates at ~200 groups) - Single-tenant vs multi-tenant issuer template key pieces: - provider.go: composed oidc + tenant gate + Source overwrite to "azure_ad" (replaces the inner "oidc" stamp) - tenant.go: ExtractTenantID — typed accessor for the tid claim - graph_client.go: Graph /me/transitiveMemberOf with pagination, same-host enforcement (rejects redirect attacks), 401/403 -> ErrTokenRejected, 5xx -> ErrProviderUnavailable, defensive cap at 5000 groups, body cap 1 MiB per page - graph_cache.go: 5 min TTL, same shape as aws_sigv4's cache key decisions: - oidc.Config gains internal SkipIssuerCheck flag with yaml:"-" so it CANNOT be set via forge.yaml — only callable from another Go package. AAD multi-tenant uses it; everything else leaves it off. Surfacing it in YAML would let operators disable iss validation by accident. - Soft-fail on Graph 5xx/401: Identity returned with empty Groups rather than blocking prod traffic. Hard-fail mode (graph_required) out of scope for v0.11. - Forge reflects the CALLER's Bearer to Graph; holds no Graph credentials of its own. audit reasons: - ErrTokenRejected -> rejected (tid mismatch, bad sig, Graph 401) - ErrInvalidToken -> invalid (missing tid, malformed claims) - ErrProviderUnavailable -> provider_unavailable (Graph 5xx, JWKS down) extras: - security.AuthDomains returns login.microsoftonline.com always; graph.microsoft.com when groups_mode=graph - forge-cli/runtime/runner.go side-effect imports azure_ad

…(phase 2 pr 5) Wires aws_sigv4, gcp_iap, and azure_ad into the operator surfaces: cli (forge-cli/cmd/init*.go): - New non-interactive flags namespaced --auth-aws-* / --auth-gcp-iap-* / --auth-azure-* (StringSlice for repeatable allowed-principal globs) - buildAuthFromFlags validates required combinations and emits the right egress hosts per provider (sts.<region>.amazonaws.com, www.gstatic.com, login.microsoftonline.com + graph.microsoft.com when groups_mode=graph) - authEgressHostsFromSettings mirrors the same logic for the Web UI - renderAuthBlock supports []string lists with proper YAML quoting (allowed_principals) web ui (forge-ui/handlers_create.go): - AuthProviderTypeMeta lists the three new types with helpful labels validate (forge-core/validate/auth.go): - knownAuthProviderTypes admits aws_sigv4 / gcp_iap / azure_ad - validateProviderSettings enforces per-type required keys (aws_sigv4.region, gcp_iap.audience, azure_ad.audience + tenant_id-unless-multi-tenant, azure_ad.groups_mode whitelist) tests: - 11 new renderer + flag-parsing tests - Round-trip YAML parse used instead of brittle quote-pattern asserts - Updated wizard-meta test to expect 7 auth provider types deliberate scope cut: - TUI step_auth.go sub-step input flows for the 3 new providers are NOT included. Adding them is mechanical (~100 LOC per provider, mirroring the OIDC issuer→audience→groups_claim phase chain) but out of scope for v0.11.0 cut. Non-interactive flag path covers the production-critical CI/CD case; operators using the TUI can pick "Custom" and edit forge.yaml directly until the follow-up lands.

…pr 6) Adds the operator-facing documentation for the three Phase 2 providers that shipped in PRs 1–5, plus a top-level auth index, chain-semantics concepts page, CHANGELOG, and a README link. new docs: - docs/auth/index.md — provider matrix and chain-semantics overview - docs/auth/concepts/chain.md — first-match-wins, no-fall-through on reject, non-Bearer header support, mixed-chain worked example - docs/auth/providers/aws_sigv4.md — STS reflection setup, awscurl example, assumed-role-vs-IAM-role gotcha called out twice - docs/auth/providers/gcp_iap.md — backend service ID lookup steps, hardcoded JWKS rationale, GCP IAM Conditions for allowlisting - docs/auth/providers/azure_ad.md — app registration walkthrough, single/multi/graph mode configs, multi-tenant warning prominent every provider doc includes: - Prerequisites checklist - forge.yaml example - Configuration reference table - Audit log shape (literal JSON) - Troubleshooting matrix (grep-able reason codes) - Security model + limitations sections CHANGELOG.md (new file): - Lists Added / Changed entries for v0.11.0 - "Notes for upgraders" makes the non-breaking nature explicit - Calls out the known TUI sub-flow gap from PR 5 README.md: - Adds Auth Providers row to the Security documentation table

…owlist The wizard was asking for Egress confirmation before the operator had picked an auth provider, so STS / AAD authority / IAP JWKS hosts never appeared in the egress list. Forge would scaffold a forge.yaml whose egress_hosts blocked its own auth-provider RPC calls — failure happens later at `forge run`, with no signal the wizard could have caught. changes: - Swap step order in init.go: Auth now runs immediately before Egress - Extend DeriveEgressFunc with (authMode, authSettings) so the Egress step's Prepare(ctx) pulls the operator's auth choice from WizardContext and forwards it into deriveEgressDomains - deriveEgressDomains calls authEgressHostsFromSettings (same helper the non-interactive --auth=… path uses) — TUI and CLI now produce identical egress lists for any given auth choice - EgressStep's inferSource() learns to label auth-derived hosts: sts.<region>.amazonaws.com → "aws_sigv4 auth" www.gstatic.com → "gcp_iap auth" login.microsoftonline.com → "azure_ad auth" graph.microsoft.com → "azure_ad auth (graph)" <oidc issuer host> → "oidc auth" <http_verifier url host> → "http_verifier auth" tests: - TestDeriveEgressDomains_AuthProviderHostsMerged: 8 cases pinning the per-provider host emission (incl. graph-mode adds graph host) - TestDeriveEgressDomains_AuthHostsMergeNotOverwrite: auth pass is additive — provider / channel hosts still emit alongside auth hosts docs: - docs/auth/concepts/chain.md gains a "TUI wizard ordering" section explaining the Auth-before-Egress invariant

…, cleanup Final-pass audit findings against the phase 2 design doc surfaced one correctness bug and several small improvements. All gates clean (go test -race / golangci-lint / gofmt). 42 packages pass. BUG fix — middleware emits token_kind="iap_jwt" for IAP requests: The strategy doc §5/§10 lists five token_kind values: empty, opaque, jwt, sigv4, iap_jwt. PR1 wired sigv4 detection but missed iap_jwt, so successful GCP IAP requests audited with token_kind="empty" — the same value as no-auth requests, defeating the audit-pipeline goal of counting IAP traffic distinctly. Middleware now classifies X-Goog-Iap-Jwt-Assertion presence as kind="iap_jwt" on the empty-Bearer path. New regression test pins it. Improvement — graph_client.go avoids per-page URL re-parse: ensureGraphHost was parsing GraphClient.endpoint via url.Parse on EVERY pagination step. Pre-parse the endpoint Host once at construction and compare against that string instead. Trims redundant work on multi-page Graph responses. Improvement — gcp_iap classifyJWTErr ordering hardened: Replaced the bare substring match on "kid" (which would catch unrelated errors) with the specific patterns: "kid " (e.g. "kid X not found") and "not found" (covers JWKS-resolution failures). Pre-existing ordering invariant comment is now actually defended. Cleanup — drop redundant single-function file: Moved ExtractTenantID from azure_ad/tenant.go into provider.go alongside other claim accessors and removed the empty tenant.go. The function was a 1-liner and didn't justify its own file. Cleanup — inline audienceContains shim: Replaced the audienceContains() wrapper (one-liner around slices.Contains) with a direct call at the use site. Less indirection, same behavior. Cleanup — middleware: simplify hasNonBearerAuth boolean expr: Folded the multi-line if-chain into a single boolean expression. Same semantics, less noise. audit findings deferred as nits, not fixed: - aws_sigv4 Parser as zero-value struct (cosmetic; keeps symmetry) - egress_step.go hostOf manual URL parsing (cosmetic; non-hot path) - 10k eviction comment wording audit findings confirmed not bugs: - GraphCache TTL test (already exists in graph_cache_test.go) - PrependChain loopback invariant intact (runner.go line 2036)

The Phase 2 provider docs were committed as MD files under docs/auth/ but we don't want to version-control them — the source-of-truth lives in the design folder, and we'll deliver via the doc site separately. - .gitignore: add docs/auth/ - git rm --cached docs/auth/** (local files preserved) - README.md: drop the now-broken "Auth Providers" docs row - CHANGELOG.md: drop the docs/auth/*.md links from the v0.11.0 entry No code or test changes.

…ontract Real-AWS testing surfaced a documentation gap: callers cannot use raw `awscurl` / `aws-sdk-go` against Forge's `aws_sigv4` provider because Sigv4 binds the signature to the destination host. Standard tools sign for the URL they're addressing (Forge) — STS then rejects the reflected signature because the host bytes don't match. The server-side code is correct. The client just needs to sign a hypothetical STS request, then attach the resulting headers to its real POST to Forge. Same pattern as aws-iam-authenticator for EKS. This commit: - Ships `scripts/forge-aws-sign.py`, a ~100 LOC reference client using boto3.session + SigV4Auth. CLI flags for --region, --url, --profile, --body, --verbose. Reads SSO/IRSA/profile/env credentials via boto3's standard chain. - Extends the package-level docstring in `forge-core/auth/providers/aws_sigv4/provider.go` with a "Client-side signing contract" section spelling out the 4-step pattern and pointing readers to the reference script. - Adds a "Client-side requirement" section to CHANGELOG.md so adopters know to grab the helper or write their own before integrating. Validated against real AWS: - STS reflection: 200, identity stamped, correct ARN/Account/UserID - ARN allowlist match: 200 (matching pattern) - ARN allowlist miss: 401 reason=rejected (correct authz gate) - No-auth: 401 reason=missing_token (Phase 1 contract preserved)

…uthenticator) Phase 2 PR 2's original "reflect Sigv4 headers" design was broken in the obvious way: Sigv4 binds its signature to the destination host as part of the canonicalized signing input. Headers signed for Forge's host could not be replayed against STS — STS sees host:sts.<region>. amazonaws.com, recomputes the signature, gets a different hash, rejects with "SignatureDoesNotMatch". Caught during real-AWS smoke; documented in PR initializ#79 description. This commit replaces the pattern with the same approach aws-iam-authenticator uses for EKS: Client (3 lines): url = boto3.client('sts').generate_presigned_url('get_caller_identity', ExpiresIn=900) token = 'forge-aws-v1.' + base64.urlsafe_b64encode(url.encode()).rstrip(b'=').decode() requests.post(forge_url, headers={'Authorization': f'Bearer {token}'}, ...) Server: Authorization: Bearer forge-aws-v1.<base64-of-presigned-sts-url> → decode + validate host (SSRF guard) + GET on the URL → STS → identity Net effect on caller experience: identical to JWT/OIDC/azure_ad — "mint token, send Bearer, done." Three lines of client code, hidden in ~15 lines of any AWS SDK in any language. what changed: forge-core/auth/providers/aws_sigv4/ sigv4_parser.go — was parsing AWS4-HMAC-SHA256 Authorization header now parses forge-aws-v1.<base64-url> Bearer tokens (URL host validation, SSRF guard, X-Amz-Credential parsing for cache key derivation) sts_client.go — was POST with reflected headers now GET on the pre-signed URL; same 200/4xx/5xx classification and 64 KiB body cap provider.go — Verify() now reads the Bearer token (not raw headers); SSRF guard via expectedHost field; same cache + ARN allowlist semantics forge-core/auth/ provider.go — HeadersFromRequest reverts X-Amz-Date and X-Amz-Security-Token (no longer needed); keeps X-Goog-Iap-Jwt-Assertion for gcp_iap provider.go — TokenKind detects "forge-aws-v1." prefix → "sigv4" (was: "AWS4-HMAC-SHA256 " on raw Authorization) middleware.go — simplify: empty-Bearer fallback only handles IAP (aws_sigv4 rides standard Bearer flow now) scripts/forge-aws-sign.py — rewrite as a clean reference client. --token-only: print just the token for use with curl/other tools Otherwise: do the round-trip POST and print the response CHANGELOG.md — replace "client wrapper required" friction note with the 3-line happy path snippet what stays unchanged: - forge.yaml shape (still type: aws_sigv4, region:, allowed_principals:) - identity_cache.go, arn_matcher.go (cache and authz logic untouched) - security.AuthDomains (sts.<region>.amazonaws.com derivation) - forge-cli/cmd/init* flag set and renderer - validate.ValidateAuthConfig (region still required) - forge-ui/handlers_create.go (AuthProviderTypeMeta entry) Tests: 42 packages pass, golangci-lint v2.10.1 clean, gofmt clean, no aws-sdk-go imports (decision §9.1 still holds). Net diff: +732 / -625 lines (mostly test rewrites; ~80 LOC net less in the provider package because the new flow is structurally simpler).

…n client Two correctness fixes surfaced by live AWS testing of the pre-signed URL pattern from b3444c2. 1. Preserve the raw URL byte-for-byte. Round-tripping the presigned URL through Go's net/url package re-encoded query params in subtle ways (e.g. "/" in X-Amz-Credential, "+" inside X-Amz-Security-Token) that didn't match how the AWS SDK emitted them on the caller side. STS recomputes the canonical request using whatever bytes we send and gets a different hash → 4xx SignatureDoesNotMatch → audit reason "rejected". - PresignedToken gains a RawURL field — the exact bytes from the decoded token payload. - The parsed *url.URL is kept ONLY for SSRF host validation and query-param inspection. It is NEVER used to construct the outbound request. - Provider.Verify now passes parsed.RawURL to STSClient.GetCallerIdentity. 2. Use SigV4QueryAuth directly in the reference client (not boto3's high-level generate_presigned_url). boto3.client('sts').generate_presigned_url('get_caller_identity', ...) produces a URL STS rejects with SignatureDoesNotMatch when GET. Known quirk — the high-level presigner signs as if the request were a POST. aws-iam-authenticator works around this by signing the AWSRequest explicitly; scripts/forge-aws-sign.py now does the same: req = AWSRequest(method='GET', url='https://sts.{region}.amazonaws.com/?Action=GetCallerIdentity&Version=2011-06-15') SigV4QueryAuth(creds, 'sts', region, expires=900).add_auth(req) token = 'forge-aws-v1.' + base64.urlsafe_b64encode(req.url.encode()).rstrip(b'=').decode() Live validation against real AWS (account 412664885516, SSO assumed-role): - Happy path: HTTP 400 body-shape error + auth_verify with correct ARN - Deny path: HTTP 401 + auth_fail reason="rejected" + token_kind="sigv4" 42 packages still pass; golangci-lint clean; gofmt clean. (Known follow-up surfaced but out of scope: hot-reload of forge.yaml doesn't rebuild the auth chain, so allowlist changes require a hard restart. Same caveat affects all providers, not just aws_sigv4.)

Two paired additions that make Phase 2 provider onboarding actually usable end-to-end via the TUI. ## allowed_accounts (ergonomic shortcut) aws_sigv4.Config gains AllowedAccounts []string. Each 12-digit AWS account ID expands at Factory time into the canonical glob set covering every STS identity shape: arn:aws:iam::<acct>:user/* — direct IAM users arn:aws:iam::<acct>:role/* — direct IAM roles arn:aws:sts::<acct>:assumed-role/*/* — SSO, AssumeRole, IRSA arn:aws:sts::<acct>:federated-user/* — SAML / web-identity federation So an operator who wants "anyone in this account" writes one line of config instead of four globs. Composes with allowed_principals — list specific roles AND whole accounts in the same provider entry. Validation: - validateAccountID checks the 12-digit shape at Factory time - validate.ValidateAuthConfig catches malformed entries at `forge validate` time (before scaffold writes forge.yaml) ## TUI sub-flows for all three Phase 2 providers forge init's TUI picker now has 7 entries (was 4): None / OIDC / HTTP Verifier / AWS Sigv4 / GCP IAP / Azure AD / Custom Input flows: aws_sigv4 region → audience (opt) → accounts (opt) → done gcp_iap audience → done azure_ad tenant → audience → done (single-tenant only) Azure AD intentionally restricts the TUI to single-tenant. Enabling allow_multi_tenant is a deliberate security trade-off (any Entra tenant in the world is admissible) and should require editing forge.yaml, not clicking through a wizard. Egress hosts auto-computed from the selection: aws_sigv4 → sts.<region>.amazonaws.com gcp_iap → www.gstatic.com (hardcoded §9.4) azure_ad → login.microsoftonline.com So the Egress review step (which runs after Auth per 639bfa9) shows the operator the full outbound surface they're about to allow, including the auth-provider's STS / IAP / AAD endpoints. ## Other surfaces forge init --auth-aws-allowed-account flag (repeatable) — matches the non-interactive path the wizard takes internally. CHANGELOG: drops the "TUI deferred" note (no longer deferred), adds sections for allowed_accounts and the TUI sub-flows. ## Tests arn_matcher_test.go +3 tests (validateAccountID, expansion shapes, expanded patterns match realistic ARNs) provider_test.go +4 tests (AllowedAccounts happy path, deny path, factory rejects malformed, mix with AllowedPrincipals) step_auth_test.go +6 tests (AWS full flow, AWS optional skips, AWS bad account rejection, GCP flow, AAD flow, Phase 2 summaries) validate/auth_test.go updated validator covers allowed_accounts 42 packages pass go test -race -count=1; golangci-lint v2.10.1 clean; gofmt clean.

Live wizard test surfaced a YAML emission bug: AWS account IDs like "412664885516" were rendered unquoted, e.g. allowed_accounts: - 412664885516 yaml.v3 decodes that as !!int → cannot unmarshal into []string in the aws_sigv4 provider's Config.AllowedAccounts field → provider construction fails at startup. needsYAMLQuoting now treats any all-digit string as requiring quotes. The renderer emits the correct form: allowed_accounts: - "412664885516" Fix is general — applies to anything in the auth-settings schema that's a digit string (ZIP-shaped IDs, version segments, etc.) so this same bug class can't surface for a new provider later. Live-validated after the fix: - auth_verify on the matching allowed_accounts (happy) - auth_fail reason=rejected on a wrong account (deny)

initializ-mk · 2026-05-24T18:56:47Z

Phase 2 review — three providers, focused on the outbound-HTTP defense-in-depth gaps

I ran a security-focused audit across the four surfaces: aws_sigv4, gcp_iap, azure_ad, and the middleware/CLI/UI wiring. Tests pass with -race across all 9 affected packages, golangci-lint clean across all three modules, gofmt clean. The crypto choices, sentinel-error contract, loopback prepend invariant, and SkipIssuerCheck YAML-unreachability all hold up.

The pattern in the BLOCKERs below is the same in three places: the outbound *http.Client is constructed with &http.Client{Timeout: …} and nothing else. That leaves Go's default redirect policy (follow up to 10), which silently undermines the SSRF / same-host guard that each provider does at the application layer. The repo already has security.SafeRedirectPolicy(10) and uses it in the webhook/http_request paths — wire it (or http.ErrUseLastResponse) into these three clients.

BLOCKERS (please fix before merge)

B1. azure_ad/graph_client.go:138 — ensureGraphHost accepts http:// nextLink
ensureGraphHost only compares got.Host; the scheme is not checked. A @odata.nextLink: \"http://graph.microsoft.com/...\" passes the gate. The next iteration sets Authorization: <caller's bearer> and sends it in plaintext. Go strips Authorization on cross-host redirects but NOT on cross-scheme downgrades to the same host. Fix: also require got.Scheme == \"https\" (or == c.endpointScheme so test mode still works).

B2. azure_ad/graph_client.go:51 — Graph client follows arbitrary HTTP redirects
&http.Client{Timeout: timeout} uses the default CheckRedirect. The application-layer ensureGraphHost only runs on @odata.nextLink (line 73), not on HTTP 301/302/307. A 302 from Graph (or a MITM with a cert that ever cracks) would silently follow, parse an attacker body, and yield to JSON unmarshal. Fix: set CheckRedirect: func(*http.Request, []*http.Request) error { return http.ErrUseLastResponse } — Graph's nextLink mechanism makes HTTP redirects unnecessary.

B3. aws_sigv4/sts_client.go:33 — STS client follows arbitrary HTTP redirects
Same root cause. The parser-side host validation (sigv4_parser.go:88) only covers the first hop; once http.Client.Do follows a redirect off sts.<region>.amazonaws.com, an attacker-controlled body becomes the parsed STS XML and controls Identity.UserID / OrgID / Arn. The token-host gate is exactly the SSRF guard the design relies on; redirect-following silently invalidates it. Fix: pin CheckRedirect to ErrUseLastResponse on this client.

MAJOR (should land in this PR or a fast follow-up)

M1. aws_sigv4/sigv4_parser.go:88 — no rejection of URL userinfo
https://user:pass@sts.us-east-1.amazonaws.com/?... passes the EqualFold(u.Host, expectedHost) check (u.Host excludes userinfo per RFC 3986). Forge then ships Basic-auth bytes from an attacker-controlled token to STS. STS ignores it, but it's still attacker bytes leaving the box. Add if u.User != nil { return error }.

M2. aws_sigv4/sigv4_parser.go — no X-Amz-Date freshness or X-Amz-Expires sanity
STS enforces a ~15min window, but the cache key is hash(AKID|YYYYMMDD), so a stolen-token replay window relies entirely on STS's own enforcement. Reject X-Amz-Expires > 900 and an X-Amz-Date older than ~15min parser-side as belt-and-braces.

M3. azure_ad/provider.go:153 — multi-tenant has no allowed_tenants allow-list
When allow_multi_tenant: true, both the iss check (via SkipIssuerCheck) AND the tid check are suppressed — any Entra tenant in the world verifies. This is the documented trade-off, but the spec at resolveIssuer line 181 says "azure_ad enforces tenant via the tid claim instead," which is misleading: in multi-tenant mode there's no tid enforcement at all. Either add an allowed_tenants: []string knob, or fix the doc to be loud about "any tenant globally."

M4. forge-core/auth/middleware.go:110 — token_kind=\"iap_jwt\" only recorded when Bearer is missing
The kind := TokenKind(token) runs first; the IAP upgrade only triggers when kind == \"empty\". A request with a real Bearer JWT plus an IAP assertion records kind=\"jwt\", masking IAP-fronted traffic in audit dashboards. Audit-pipeline goal of counting IAP traffic distinctly is defeated. Fix: upgrade to iap_jwt whenever the IAP header is present AND the chain verifies via the gcp_iap provider.

M5. forge-core/validate/auth.go:64 — validateProviderSettings permissive on unknown keys
A typo like aud: instead of audience: slips through silently — asString(settings, \"audience\") returns "" and the required-keys check misses it. Combined with handlers_create.go forwarding Settings unfiltered into the on-disk scaffold, the Web UI POST {\"settings\": {\"audience\": \"x\", \"evil_key\": \"y\"}} produces a forge.yaml with evil_key in it. Today every field has yaml:\"-\" discipline so this is harmless — but it's one missed tag away from being exploitable. Add a closed-key whitelist + warn on unknown.

NITs / follow-ups (don't block merge)

gcp_iap/iap_jwks.go:285 — classifyJWTErr substring matching is brittle. Prefer errors.Is(err, jwt.ErrTokenExpired) etc. via golang-jwt v5's named sentinels.
gcp_iap/iap_jwks.go:213 — j.keys = keys unconditionally replaces during refresh. GCP rotates with overlap so today this is fine, but a partial-but-valid JWKS body (>=1 key but missing an in-flight kid) would drop kids the stale-grace contract assumes are kept. Consider merge-on-success or at least logging the diff.
azure_ad/graph_cache.go:33 — Get returns the cached slice without copy; any caller that mutates id.Groups mutates the cache.
forge-cli/cmd/init_auth.go:338 — needsYAMLQuoting catches all-digit (the 3364ca8 fix) but misses scientific-notation (1e10), hex (0x10), leading-zero (010), and .inf/.nan. Auth-setting values rarely look like these, but the docstring says "false negatives are bugs."
aws_sigv4/identity_cache_test.go:69 — string(rune(i)) for surrogate code points yields \\ufffd and collides keys. Use strconv.Itoa(i) so the threshold test actually reaches 10,001.
Several files: req, _ := http.NewRequestWithContext(...) discards the error. Hardcoded URLs make the panic risk currently zero, but it's an antipattern in security-critical code.
gcp_iap/provider_test.go — add an HS256-with-EC-public-key-as-secret test case (most dangerous alg-confusion shape, currently only RS256 covered).

Verdict

Needs revision before merge — three BLOCKERs (B1–B3) are each 1–3 line CheckRedirect/scheme-check fixes plus tests. None require architectural change; all three undermine the application-layer SSRF / same-host guards that the design correctly identifies as the security boundary. The MAJORs (M1–M5) are higher signal-to-effort fixes and should land alongside.

What's not a concern:

✅ SkipIssuerCheck is genuinely unreachable from YAML (grep + behavior verified).
✅ Headers.Get is case-insensitive (forge-core/auth/provider.go:77); the lowercase-IAP-header concern is cleared.
✅ Loopback static_token PrependChain still wraps Phase 2 providers; localhost dashboard works.
✅ Phase 1 backward compat — zero forge.yaml changes required, Phase 1 tests pass unmodified.
✅ STS XML root-element enforcement (Go's encoding/xml honors the XMLName xml.Name tag).

Live AWS validation in the PR description is excellent — that level of evidence for the happy and deny paths is exactly the right bar. Once the three redirect fixes land, this is ready.

initializ-mk · 2026-05-24T19:06:27Z

@naveen-kurra also run /sync-docs command/plugin for Claude to sync the document updates based on the changes

Phase 2 review (PR initializ#79) flagged three same-root-cause issues across the outbound HTTP clients in azure_ad, aws_sigv4, and gcp_iap: - B1: ensureGraphHost accepted http:// nextLinks — caller's Bearer would leak in plaintext (Go strips Authorization on cross-host redirect but NOT on https→http same-host downgrade). - B2: Graph client used Go's default redirect policy (follow up to 10). ensureGraphHost only validates @odata.nextLink, not HTTP 301/302/307. A redirect from Graph → attacker URL would let the attacker control the JSON body that becomes Identity.Groups (latent until Phase 4 authz lands; closing at the boundary now). - B3: aws_sigv4's STS client had the same default-redirect issue. The parser-side same-host gate covers only the first hop; a redirect off sts.<region>.amazonaws.com would let attacker bytes become the parsed STS XML and control Identity.UserID/OrgID/Arn. Verified all three are real (Go default CheckRedirect follows up to 10 per net/http docs). Smoking gun confirmed in five outbound clients across forge-core/auth/providers/: aws_sigv4/sts_client.go ← fixed (B3) gcp_iap/iap_jwks.go ← fixed (not in review; same pattern, same blast radius — attacker JWKS means forged tokens verify) azure_ad/graph_client.go ← fixed (B1+B2) oidc/provider.go ← Phase 1 code, NOT touched here; OIDC discovery sometimes legitimately redirects, needs design discussion. Filed as follow-up on issue initializ#80. httpverifier/provider.go ← Phase 1 code, NOT touched here; operator-configured URL may be behind LB. Follow-up on issue initializ#80. Fix shape (uniform across the three Phase 2 providers): http.Client{ Timeout: timeout, CheckRedirect: func(*http.Request, []*http.Request) error { return http.ErrUseLastResponse }, } These three endpoints (STS GetCallerIdentity, IAP JWKS, Graph transitiveMemberOf) NEVER legitimately issue 3xx — the auth contract each provider validates is bound to the configured host. Returning ErrUseLastResponse causes the 3xx response to flow into our existing status-code switch, where the "unexpected status" default arm maps it to ErrProviderUnavailable (and the audit logs "provider_unavailable", not a misleading "rejected"). B1 specific (separate from the redirect-policy change): ensureGraphHost now checks both Host AND Scheme. GraphClient stores endpointScheme alongside endpointHost, pre-parsed at construction so the per-page check stays cheap. Test mode (httptest's http://) still works because the configured scheme is matched against, not hard- coded https. Regression tests added (one per provider): TestSTSClient_DoesNotFollowRedirects TestJWKSCache_DoesNotFollowRedirects TestGraphClient_DoesNotFollowRedirects Each spins up an httptest server that returns 302 → attacker URL, runs the client, asserts: - error returned (not silently followed) - error is ErrProviderUnavailable (correct audit reason) - endpoint hit exactly once (counts redirect-follows) Plus two scheme-specific cases for B1: TestEnsureGraphHost_RejectsSchemeDowngrade — https→http same-host TestEnsureGraphHost_TestModeHTTPOK — httptest http stays OK 42 packages pass go test -race -count=1; golangci-lint v2.10.1 clean; gofmt clean.

Phase 2 review M1: a token whose pre-signed URL contained userinfo (https://user:pass@sts.us-east-1.amazonaws.com/...) would pass our EqualFold(u.Host, expectedHost) check because RFC 3986 separates userinfo from host — net/url puts "user:pass" into u.User and "sts.us-east-1.amazonaws.com" into u.Host independently. http.Client.Do then synthesizes Authorization: Basic <b64(user:pass)> from u.User and ships those attacker-controlled bytes to STS. STS ignores Basic (it validates the X-Amz-Signature in the query string), so this isn't an active auth bypass — but attacker bytes still leave our box, with potential for exfiltration via STS access logs / timing. Fix: reject u.User != nil at parse time, BEFORE the host check, in sigv4_parser.go's ParseToken. One line; defense-in-depth. Regression test TestParseToken_RejectsUserinfo confirms the userinfo case is now caught and that the error message mentions "userinfo" (so future maintainers know what tripped). 42 packages still pass; lint + gofmt clean.

Phase 2 review M2: STS enforces ~15min on X-Amz-Date + X-Amz-Expires server-side, but our IdentityCache (60s default TTL) would happily serve a cached Identity even if STS somehow accepted a stale token. The cache key being hash(AKID|YYYYMMDD) gave the impression of day-long replay potential — it doesn't, because the 60s TTL bounds it — but the parser had zero own-source freshness check, so the stolen-token replay window depended entirely on STS's enforcement. Adding parser-side freshness as defense-in-depth: PresignedToken gains two fields: SigTime time.Time ← parsed from X-Amz-Date Expires time.Duration ← parsed from X-Amz-Expires ParseToken now requires both query params (rejects missing / malformed / non-positive Expires). Parsing stays pure: no clock comparison happens inside ParseToken. PresignedToken.CheckFreshness(now, maxExpires, skew) gates: - X-Amz-Expires > maxExpires → cap (default 15min — matches what all standard AWS SDKs emit for GetCallerIdentity) - now > SigTime + Expires + skew → token expired - SigTime > now + skew → token from the future skew default 5min for normal clock drift. Config additions (both default-able, no operator action needed): MaxTokenExpires time.Duration `yaml:"max_token_expires,omitempty"` ClockSkew time.Duration `yaml:"clock_skew,omitempty"` Provider gains a `now func() time.Time` field, default time.Now, overridable in tests so freshness can be exercised without clock monkey-patching. Provider.Verify calls CheckFreshness right after region check, before the cache lookup — STS round-trip and identity cache hit are both short-circuited when freshness fails. ErrTokenRejected, audit reason "rejected". Tests: parser-level (sigv4_parser_test.go): TestParseToken_PopulatesSigTimeAndExpires TestParseToken_RejectsMissingAmzDate TestParseToken_RejectsMalformedAmzDate TestParseToken_RejectsMissingAmzExpires TestParseToken_RejectsNonNumericExpires TestParseToken_RejectsNonPositiveExpires TestCheckFreshness_Expired TestCheckFreshness_FromTheFuture TestCheckFreshness_ExceedsExpiresCap TestCheckFreshness_HappyPathInsideSkew provider-level (provider_test.go): TestProvider_RejectsExpiredToken TestProvider_RejectsTokenFromFuture TestProvider_RejectsOverlyLongExpiresClaim TestProvider_AcceptsTokenAtEdgeOfSkewWindow Existing test fixtures updated: tokenFor() now stamps X-Amz-Date built from fixedTestTime so freshness passes by default; newTestProvider pins Provider.now to that same instant. Day-bucket rollover test advances the clock by 24h to match its day-2 token. 42 packages pass, lint + gofmt clean.

Phase 2 review M4: structural token_kind detection runs BEFORE the chain, so a request with both a Bearer JWT and an X-Goog-Iap-Jwt- Assertion records kind="jwt" — masking IAP-fronted traffic in audit dashboards even when gcp_iap was the actual verifier. The structural rule answers "what bytes were on the wire?" The audit signal we want answers "which auth path verified?" Those diverge only when a Bearer is present alongside a non-Bearer provider's payload — today, gcp_iap is the only such provider. Fix: identity, err := chain.Verify(...) if err == nil { kind = refineTokenKind(kind, identity.Source) } refineTokenKind: - Source == "gcp_iap" → return "iap_jwt" - otherwise → return structural kind unchanged Other providers (oidc, azure_ad, aws_sigv4, http_verifier, static_token) don't need refinement: their structural kind already matches the auth path that verifies them. Failure paths still record the structural kind (no chain identity to read Source from), so the existing "missing_token" / "not_for_me" audit reason-code contract is unaffected. Tests: TestMiddleware_TokenKind_RefinedToIapJwtWhenGCPIAPVerifies — Bearer JWT + IAP header both present; chain stubbed to return Source="gcp_iap"; assert kind upgrades to iap_jwt TestMiddleware_TokenKind_JWT_NotRefinedForOIDCProviders — counter-test: Source="oidc" must NOT trigger the refinement

…nt doc (M3) Phase 2 review M3: 1. When allow_multi_tenant: true, both the iss check (via composed oidc.Provider's SkipIssuerCheck) AND the tid check were suppressed — any Entra tenant in the world verified. 2. resolveIssuer()'s docstring said "azure_ad enforces tenant via the tid claim instead" which was misleading: in multi-tenant mode there was no tid enforcement at all. Fix is two-part: add the missing knob, fix the doc. allowed_tenants (Config.AllowedTenants []string) Optional allowlist of Entra tenant GUIDs matched against the JWT `tid` claim. Only meaningful with allow_multi_tenant=true. Three operational modes: single-tenant (default) tid MUST equal TenantID multi-tenant + AllowedTenants set tid MUST be in list multi-tenant + AllowedTenants empty no tid check ("any tenant globally" — documented high-risk shape, opted into by deliberately omitting) Match is case-insensitive (Entra emits lowercase GUIDs; operators often paste uppercase from the portal). Factory-time validation rejects allowed_tenants in single-tenant mode (TenantID is THE gate there; the combination would silently degrade if not caught). Validator + non-interactive flag + tests - forge-core/validate/auth.go: rejects allowed_tenants in single- tenant mode at validate-time; WARNS when multi-tenant+empty so the "any tenant globally" trade-off is loud - forge-cli/cmd/init.go: --auth-azure-allowed-tenant flag (repeatable, mirrors --auth-aws-allowed-account) - forge-cli/cmd/init_auth.go: buildAuthFromFlags forwards through - forge-cli/cmd/init_auth_test.go: mock adds the new flag Five new tests (provider_test.go): AllowedTenants_AcceptsListed AllowedTenants_RejectsUnlisted AllowedTenants_CaseInsensitive AllowedTenants_MissingTidRejected SingleTenant_WithAllowedTenants_RejectedAtFactory resolveIssuer() docstring rewrite Now spells out the three operational modes explicitly, including the "any-tenant" shape with its security implication. No more "azure_ad enforces tenant via the tid claim" line that's only half-true. CHANGELOG gains an "allowed_tenants" section with the canonical recipe. 42 packages pass, lint + gofmt clean.

Phase 2 review M5: validateProviderSettings was permissive on unknown keys. A typo like `aud:` instead of `audience:` would slip past the required-key check silently (asString returns ""), and — more critically — the forge-ui handler forwarded a.Settings unfiltered to the on-disk scaffold. A POST like {"settings": {"audience": "x", "evil_key": "y"}} would write evil_key into forge.yaml verbatim. Today's provider Config structs ignore unknown YAML fields via yaml.v3's default, so it's harmless — but one missed `yaml:"-"` tag on a future field would suddenly make it reachable from untrusted POST input. Closed at the boundary now. Two-part fix: 1. forge-core/validate/auth.go - New exported KnownAuthProviderSettings map[string]map[string]bool (closed whitelist per provider type — mirrors the yaml: tags on each provider's Config struct, kept in sync by convention). Internal yaml:"-" fields are intentionally absent. - validateProviderSettings emits a Warning per unknown settings key during `forge validate`. Loose-not-strict because some operators stash custom annotations from pre-Phase-2 days; warning surfaces the typo without breaking those configs. - New exported FilterKnownSettings(providerType, settings) returns a copy with unknown keys dropped. Defense-in-depth filter for use at trust boundaries. 2. forge-ui/handlers_create.go - validateAuthPayload calls FilterKnownSettings on the incoming a.Settings BEFORE validation OR scaffolding. Closes the specific exploit chain (Web UI POST → forge.yaml) regardless of yaml tag discipline drift on Config structs. Tests: validate package: TestValidateAuthConfig_WarnsOnUnknownSettingsKey TestValidateAuthConfig_NoWarningForKnownKeys TestFilterKnownSettings_DropsUnknownKeys TestFilterKnownSettings_UnknownProviderTypePassthrough TestFilterKnownSettings_AllPhase2Providers (subtests per provider/key) forge-ui: TestHandleCreateAgent_FiltersUnknownAuthSettings — end-to-end POST with evil_key in settings; assert it does NOT survive into the scaffold's captured AuthCreateOptions.Settings 42 packages still pass; lint + gofmt clean.

Batch-clearing the "don't block merge" follow-ups from review of PR initializ#79. initializ#1 gcp_iap classifyJWTErr — use jwt v5 sentinels via errors.Is rather than substring matching (library wording shifts across patches; sentinels are public API). Special-case ErrTokenSignatureInvalid to split alg-confusion (→ ErrInvalidToken) from real bad-signature (→ ErrTokenRejected) because golang-jwt wraps both under that one sentinel. Three internal keyFunc message-matches retained — those are strings WE control, not the library's. initializ#2 gcp_iap JWKS merge-on-success — switched j.keys = newKeys to a per-kid merge. A partial-but-valid JWKS response (e.g. one kid accidentally omitted by GCP during rotation) no longer drops kids the stale-grace contract assumes we still have. Worst case is keeping a retired kid in cache; verification still fails naturally for any token signed with the retired private key. initializ#3 azure_ad GraphCache defensive copies — Get returns append([]string(nil), e.groups...) and Put stores a copy of its input. Caller mutating Identity.Groups (the auth.Identity layer treats it as freely mutable) can't poison the cache. initializ#4 forge-cli needsYAMLQuoting numeric edge cases — quote anything that resembles a YAML number (hex 0x, octal 0o, binary 0b, leading-zero "010", scientific 1e10, decimal float 3.14, signed ±N, .inf / .nan in either case). Auth-setting values rarely hit these shapes but the docstring promised "false negatives are bugs" and the Web UI POST path can supply arbitrary strings. Added looksNumeric() helper with separate allHexDigits / allOctalDigits / allBinaryDigits gates. initializ#5 aws_sigv4 identity_cache_test — replaced string(rune(i)) with strconv.Itoa(i). Surrogate code points (0xD800..0xDFFF) all map to U+FFFD, so the eviction-threshold test was silently building ~10 distinct keys instead of 10_001 and the sweep never ran. initializ#6 http.NewRequestWithContext error handling — fixed the two `req, _ := ...` antipatterns in gcp_iap/iap_jwks.go and azure_ad/graph_client.go. Hardcoded URLs make the failure currently unreachable, but errcheck-clean is the discipline. initializ#7 gcp_iap HS256-with-EC-public-key alg-confusion test — pinned the most dangerous attack shape: attacker fetches the verifier's public key from JWKS (open by design), uses raw X/Y bytes as the HMAC "secret", signs an HS256 token. A non-whitelisting verifier would HMAC-verify it. Our keyFunc rejects on alg != "ES256" BEFORE key lookup; this test confirms. Tests added: TestGraphCache_GetReturnsDefensiveCopy, TestGraphCache_PutStoresDefensiveCopy, TestProvider_HS256WithECPublicKeyAsSecret_Rejected. Existing TestProvider_RS256Token_Rejected still passes (alg-confusion still classified as ErrInvalidToken under the new sentinel-based path). 42 packages green, lint + gofmt clean.

Ran the /sync-docs recipe over PR initializ#79's surface (commit 745c024). Two pre-flight fixes before the sync: - .claude/commands/sync-docs.md mapping table was outdated: pointed at flat files (`docs/commands.md`, `docs/runtime.md`, …) that don't exist; real layout is nested under core-concepts/, reference/, security/. Rewrote the table to match the actual tree and added a `forge-core/auth/` → authentication.md + audit-logging.md row. - docs/auth/ is gitignored (per earlier review feedback). Did NOT move that content into git; the new canonical home is docs/security/authentication.md inside the existing tracked docs/security/ folder. New doc: docs/security/authentication.md (423 lines) — single home for the full auth provider chain across Phase 1 + Phase 2. Covers: - Provider matrix and chain semantics (first-match-wins, fail-closed on reject, loopback static_token auto-prepend) - Per-provider sections for static_token, oidc, aws_sigv4, gcp_iap, azure_ad, http_verifier — including yaml shape, wire format, client-side recipe, security model - aws_sigv4 client-side gotcha (boto3.generate_presigned_url doesn't work for STS; use SigV4QueryAuth) - AWS Org-wide trust recipes (Identity Center, entry role + aws:PrincipalOrgID condition) - Egress allowlist auto-extension table per provider - Wizard / CLI examples - Mesh patterns (single-account fleet, per-pair allowlist) Updates to satellite docs: docs/security/overview.md - Added Authentication layer to the security-architecture ASCII diagram (between Guardrails and Egress) - New "Authentication" section with provider table + chain rules - Authentication row in Related Documentation footer docs/security/audit-logging.md - auth_verify and auth_fail event entries in the Event Types table - Full example shape for both events - Reason-code table (missing_token, not_for_me, rejected, invalid, provider_unavailable) with operator actions - token_kind value table (empty, opaque, jwt, sigv4, iap_jwt) - Audit pipeline jq grep recipes for common queries docs/security/egress-control.md - New "Auth-provider domain auto-extension" subsection under Allowlist Resolution with the per-provider host table - Notes the wizard's Auth-before-Egress ordering docs/reference/forge-yaml-schema.md - Full auth: block inserted in the schema example with all 6 provider types and every documented settings key docs/reference/cli-reference.md - All --auth-* flags added to the forge init flag table - Two new example invocations (aws_sigv4 with allowed-account, azure_ad multi-tenant with allowlist) - Link to docs/security/authentication.md docs/reference/web-dashboard.md - Auth step added to the wizard steps table (between Fallback and Env & Security) - New "Auth step" subsection listing all 7 picker options + the Web UI's settings-key filter behavior README.md - Authentication row added to the Security documentation table Broken-link scan: zero broken links in tracked docs/. 42 packages still pass go test -race; lint + gofmt clean. No code changes in this commit — docs only.

naveen-kurra · 2026-05-24T20:04:32Z

@naveen-kurra also run /sync-docs command/plugin for Claude to sync the document updates based on the changes

DOne

initializ-mk · 2026-05-24T20:16:19Z

Re-review of fix commits — LGTM, ready to merge

Verified all 8 fix commits against the prior review. Tests pass with -race across 9 packages, lint + gofmt clean.

BLOCKERs — all cleared (`774268d`)

B1 azure_ad/graph_client.go:166-181 — ensureGraphHost now takes configuredScheme and requires EqualFold(scheme) match. endpointScheme pre-parsed at construction. The http:// bearer-downgrade attack is closed.
B2 azure_ad/graph_client.go:54-67 — CheckRedirect: func(...) error { return http.ErrUseLastResponse } with inline comment naming the threat model. Graph's app-layer nextLink replaces the need for HTTP redirects.
B3 aws_sigv4/sts_client.go:39-48 — same CheckRedirect pinning with comment explicitly enumerating threats (MITM, TLS-inspecting proxy, DNS hijack).
Bonus gcp_iap/iap_jwks.go — author proactively pinned the same on the JWKS client even though I didn't flag it. Same blast radius (attacker JWKS → forged tokens verify); correct defensive call.

MAJORs — all addressed

M1 674bf2f — u.User != nil check added BEFORE the host check in sigv4_parser.go:88. TestParseToken_RejectsUserinfo pins it. Comment explains why STS ignores Basic auth but bytes still shouldn't leave the box.
M2 c6e0aff — PresignedToken gains SigTime / Expires; parser requires both query params; CheckFreshness(now, maxExpires, skew) called in Provider.Verify BEFORE the cache lookup. Config gains MaxTokenExpires (default 15min) + ClockSkew (default 5min). Provider.now func() time.Time is test-injectable. 10 new tests cover happy-path, expired, future-token, exceeds-cap, edge-of-skew.
M3 419f20f — AllowedTenants []string added with three explicit modes (single-tenant / multi+allowlist / multi+empty="any-tenant globally"). Factory rejects allowed_tenants in single-tenant mode (catches typo). Case-insensitive match. Validator emits warning on the "any-tenant globally" shape so it's loud. --auth-azure-allowed-tenant flag (repeatable). 5 tests. resolveIssuer docstring rewritten to spell out all three modes.
M4 57d12a5 — refineTokenKind(structural, identity.Source) runs post-verify; gcp_iap → iap_jwt regardless of what Bearer was on the wire. Failure paths still use structural kind (no identity to read). Counter-test confirms oidc source does NOT trigger upgrade.
M5 ca92921 — KnownAuthProviderSettings closed whitelist per provider type. ValidateAuthConfig emits warning per unknown key. FilterKnownSettings exported as defense-in-depth filter, wired into handlers_create.go BEFORE validation/scaffolding. End-to-end test TestHandleCreateAgent_FiltersUnknownAuthSettings asserts evil_key POST doesn't survive into the scaffold.

NITs — all 7 cleared (`745c024`)

gcp_iap classifyJWTErr now uses errors.Is(err, jwt.ErrTokenExpired) etc. with a thoughtful special case for ErrTokenSignatureInvalid (which wraps BOTH alg-confusion and real bad-signature in golang-jwt v5). Substring matches retained only for keyFunc messages WE control.
gcp_iap JWKS now per-kid merge on success — partial-but-valid rotation responses don't drop in-flight kids.
azure_ad GraphCache Get/Put defensive copy via append([]string(nil), ...).
needsYAMLQuoting extended with looksNumeric() — hex (0x), octal (0o), binary (0b), leading-zero (010), scientific (1e10), float (3.14), signed (±N), .inf/.nan in either case.
identity_cache_test switched to strconv.Itoa(i) — eviction-threshold test now actually reaches 10,001 distinct keys (was collapsing surrogates to U+FFFD before).
Two req, _ := http.NewRequestWithContext(...) antipatterns fixed (gcp_iap + azure_ad).
New TestProvider_HS256WithECPublicKeyAsSecret_Rejected pins the most dangerous alg-confusion shape (attacker HMACs with the public key bytes from JWKS).

Gates

go test -race -count=1 — 9 packages green (auth + all 7 providers + security + validate + cli/cmd + cli/tui/steps + forge-ui)
golangci-lint — clean across all 3 modules
gofmt — clean

Verdict

LGTM — ready to merge. Every blocker, major, and nit from the prior review has a focused fix commit with regression tests. The B1-B3 commit's preemptive extension to gcp_iap/iap_jwks.go and the M3 commit's three-mode explicit operational semantic show good adversarial-thinking discipline. Nicely done.

…uthenticator) Phase 2 PR 2's original "reflect Sigv4 headers" design was broken in the obvious way: Sigv4 binds its signature to the destination host as part of the canonicalized signing input. Headers signed for Forge's host could not be replayed against STS — STS sees host:sts.<region>. amazonaws.com, recomputes the signature, gets a different hash, rejects with "SignatureDoesNotMatch". Caught during real-AWS smoke; documented in PR #79 description. This commit replaces the pattern with the same approach aws-iam-authenticator uses for EKS: Client (3 lines): url = boto3.client('sts').generate_presigned_url('get_caller_identity', ExpiresIn=900) token = 'forge-aws-v1.' + base64.urlsafe_b64encode(url.encode()).rstrip(b'=').decode() requests.post(forge_url, headers={'Authorization': f'Bearer {token}'}, ...) Server: Authorization: Bearer forge-aws-v1.<base64-of-presigned-sts-url> → decode + validate host (SSRF guard) + GET on the URL → STS → identity Net effect on caller experience: identical to JWT/OIDC/azure_ad — "mint token, send Bearer, done." Three lines of client code, hidden in ~15 lines of any AWS SDK in any language. what changed: forge-core/auth/providers/aws_sigv4/ sigv4_parser.go — was parsing AWS4-HMAC-SHA256 Authorization header now parses forge-aws-v1.<base64-url> Bearer tokens (URL host validation, SSRF guard, X-Amz-Credential parsing for cache key derivation) sts_client.go — was POST with reflected headers now GET on the pre-signed URL; same 200/4xx/5xx classification and 64 KiB body cap provider.go — Verify() now reads the Bearer token (not raw headers); SSRF guard via expectedHost field; same cache + ARN allowlist semantics forge-core/auth/ provider.go — HeadersFromRequest reverts X-Amz-Date and X-Amz-Security-Token (no longer needed); keeps X-Goog-Iap-Jwt-Assertion for gcp_iap provider.go — TokenKind detects "forge-aws-v1." prefix → "sigv4" (was: "AWS4-HMAC-SHA256 " on raw Authorization) middleware.go — simplify: empty-Bearer fallback only handles IAP (aws_sigv4 rides standard Bearer flow now) scripts/forge-aws-sign.py — rewrite as a clean reference client. --token-only: print just the token for use with curl/other tools Otherwise: do the round-trip POST and print the response CHANGELOG.md — replace "client wrapper required" friction note with the 3-line happy path snippet what stays unchanged: - forge.yaml shape (still type: aws_sigv4, region:, allowed_principals:) - identity_cache.go, arn_matcher.go (cache and authz logic untouched) - security.AuthDomains (sts.<region>.amazonaws.com derivation) - forge-cli/cmd/init* flag set and renderer - validate.ValidateAuthConfig (region still required) - forge-ui/handlers_create.go (AuthProviderTypeMeta entry) Tests: 42 packages pass, golangci-lint v2.10.1 clean, gofmt clean, no aws-sdk-go imports (decision §9.1 still holds). Net diff: +732 / -625 lines (mostly test rewrites; ~80 LOC net less in the provider package because the new flow is structurally simpler).

Phase 2 review (PR #79) flagged three same-root-cause issues across the outbound HTTP clients in azure_ad, aws_sigv4, and gcp_iap: - B1: ensureGraphHost accepted http:// nextLinks — caller's Bearer would leak in plaintext (Go strips Authorization on cross-host redirect but NOT on https→http same-host downgrade). - B2: Graph client used Go's default redirect policy (follow up to 10). ensureGraphHost only validates @odata.nextLink, not HTTP 301/302/307. A redirect from Graph → attacker URL would let the attacker control the JSON body that becomes Identity.Groups (latent until Phase 4 authz lands; closing at the boundary now). - B3: aws_sigv4's STS client had the same default-redirect issue. The parser-side same-host gate covers only the first hop; a redirect off sts.<region>.amazonaws.com would let attacker bytes become the parsed STS XML and control Identity.UserID/OrgID/Arn. Verified all three are real (Go default CheckRedirect follows up to 10 per net/http docs). Smoking gun confirmed in five outbound clients across forge-core/auth/providers/: aws_sigv4/sts_client.go ← fixed (B3) gcp_iap/iap_jwks.go ← fixed (not in review; same pattern, same blast radius — attacker JWKS means forged tokens verify) azure_ad/graph_client.go ← fixed (B1+B2) oidc/provider.go ← Phase 1 code, NOT touched here; OIDC discovery sometimes legitimately redirects, needs design discussion. Filed as follow-up on issue #80. httpverifier/provider.go ← Phase 1 code, NOT touched here; operator-configured URL may be behind LB. Follow-up on issue #80. Fix shape (uniform across the three Phase 2 providers): http.Client{ Timeout: timeout, CheckRedirect: func(*http.Request, []*http.Request) error { return http.ErrUseLastResponse }, } These three endpoints (STS GetCallerIdentity, IAP JWKS, Graph transitiveMemberOf) NEVER legitimately issue 3xx — the auth contract each provider validates is bound to the configured host. Returning ErrUseLastResponse causes the 3xx response to flow into our existing status-code switch, where the "unexpected status" default arm maps it to ErrProviderUnavailable (and the audit logs "provider_unavailable", not a misleading "rejected"). B1 specific (separate from the redirect-policy change): ensureGraphHost now checks both Host AND Scheme. GraphClient stores endpointScheme alongside endpointHost, pre-parsed at construction so the per-page check stays cheap. Test mode (httptest's http://) still works because the configured scheme is matched against, not hard- coded https. Regression tests added (one per provider): TestSTSClient_DoesNotFollowRedirects TestJWKSCache_DoesNotFollowRedirects TestGraphClient_DoesNotFollowRedirects Each spins up an httptest server that returns 302 → attacker URL, runs the client, asserts: - error returned (not silently followed) - error is ErrProviderUnavailable (correct audit reason) - endpoint hit exactly once (counts redirect-follows) Plus two scheme-specific cases for B1: TestEnsureGraphHost_RejectsSchemeDowngrade — https→http same-host TestEnsureGraphHost_TestModeHTTPOK — httptest http stays OK 42 packages pass go test -race -count=1; golangci-lint v2.10.1 clean; gofmt clean.

Batch-clearing the "don't block merge" follow-ups from review of PR #79. #1 gcp_iap classifyJWTErr — use jwt v5 sentinels via errors.Is rather than substring matching (library wording shifts across patches; sentinels are public API). Special-case ErrTokenSignatureInvalid to split alg-confusion (→ ErrInvalidToken) from real bad-signature (→ ErrTokenRejected) because golang-jwt wraps both under that one sentinel. Three internal keyFunc message-matches retained — those are strings WE control, not the library's. #2 gcp_iap JWKS merge-on-success — switched j.keys = newKeys to a per-kid merge. A partial-but-valid JWKS response (e.g. one kid accidentally omitted by GCP during rotation) no longer drops kids the stale-grace contract assumes we still have. Worst case is keeping a retired kid in cache; verification still fails naturally for any token signed with the retired private key. #3 azure_ad GraphCache defensive copies — Get returns append([]string(nil), e.groups...) and Put stores a copy of its input. Caller mutating Identity.Groups (the auth.Identity layer treats it as freely mutable) can't poison the cache. #4 forge-cli needsYAMLQuoting numeric edge cases — quote anything that resembles a YAML number (hex 0x, octal 0o, binary 0b, leading-zero "010", scientific 1e10, decimal float 3.14, signed ±N, .inf / .nan in either case). Auth-setting values rarely hit these shapes but the docstring promised "false negatives are bugs" and the Web UI POST path can supply arbitrary strings. Added looksNumeric() helper with separate allHexDigits / allOctalDigits / allBinaryDigits gates. #5 aws_sigv4 identity_cache_test — replaced string(rune(i)) with strconv.Itoa(i). Surrogate code points (0xD800..0xDFFF) all map to U+FFFD, so the eviction-threshold test was silently building ~10 distinct keys instead of 10_001 and the sweep never ran. #6 http.NewRequestWithContext error handling — fixed the two `req, _ := ...` antipatterns in gcp_iap/iap_jwks.go and azure_ad/graph_client.go. Hardcoded URLs make the failure currently unreachable, but errcheck-clean is the discipline. #7 gcp_iap HS256-with-EC-public-key alg-confusion test — pinned the most dangerous attack shape: attacker fetches the verifier's public key from JWKS (open by design), uses raw X/Y bytes as the HMAC "secret", signs an HS256 token. A non-whitelisting verifier would HMAC-verify it. Our keyFunc rejects on alg != "ES256" BEFORE key lookup; this test confirms. Tests added: TestGraphCache_GetReturnsDefensiveCopy, TestGraphCache_PutStoresDefensiveCopy, TestProvider_HS256WithECPublicKeyAsSecret_Rejected. Existing TestProvider_RS256Token_Rejected still passes (alg-confusion still classified as ErrInvalidToken under the new sentinel-based path). 42 packages green, lint + gofmt clean.

Naveen Kurra added 12 commits May 23, 2026 19:19

naveen-kurra requested a review from initializ-mk May 24, 2026 15:58

Naveen Kurra added 2 commits May 24, 2026 12:02

naveen-kurra mentioned this pull request May 24, 2026

Phase 3: production hardening + cross-cloud parity (follow-up to #79) #80

Open

8 tasks

Naveen Kurra added 8 commits May 24, 2026 15:16

initializ-mk merged commit ea8922b into initializ:main May 24, 2026
10 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(auth): Phase 2 — AWS Sigv4, GCP IAP, Azure AD providers (v0.11.0)#79

feat(auth): Phase 2 — AWS Sigv4, GCP IAP, Azure AD providers (v0.11.0)#79
initializ-mk merged 22 commits into
initializ:mainfrom
naveen-kurra:pr2-cloud-native-auth-providers

naveen-kurra commented May 24, 2026 •

edited

Loading

Uh oh!

initializ-mk commented May 24, 2026

Uh oh!

initializ-mk commented May 24, 2026

Uh oh!

naveen-kurra commented May 24, 2026

Uh oh!

initializ-mk commented May 24, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

naveen-kurra commented May 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Why this matters

Design pivot during PR review — aws_sigv4 switched to pre-signed URL pattern

What was wrong (original design — commits 9a1ebae through 382294e)

What's correct (current design — commit 8568535)

Why this is the right design (not just a fix)

Locked design decisions (unchanged through the pivot)

What clients write (the actual 3 lines)

✅ Real-AWS validation (live AWS account)

Audit log emitted (Test #2 success path)

Two bugs caught during live testing (both fixed in 8568535)

Other layers also exercised live

What was NOT live-tested (and why)

What landed (12 commits)

Phase 1 compatibility

Security model highlights

Known deferred work

Test plan

Design artifacts (offline)

Uh oh!

initializ-mk commented May 24, 2026

Phase 2 review — three providers, focused on the outbound-HTTP defense-in-depth gaps

BLOCKERS (please fix before merge)

MAJOR (should land in this PR or a fast follow-up)

NITs / follow-ups (don't block merge)

Verdict

Uh oh!

initializ-mk commented May 24, 2026

Uh oh!

naveen-kurra commented May 24, 2026

Uh oh!

initializ-mk commented May 24, 2026

Re-review of fix commits — LGTM, ready to merge

BLOCKERs — all cleared (774268d)

MAJORs — all addressed

NITs — all 7 cleared (745c024)

Gates

Verdict

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

naveen-kurra commented May 24, 2026 •

edited

Loading

Design pivot during PR review — `aws_sigv4` switched to pre-signed URL pattern

What was wrong (original design — commits `9a1ebae` through `382294e`)

What's correct (current design — commit `8568535`)

Two bugs caught during live testing (both fixed in `8568535`)

BLOCKERs — all cleared (`774268d`)

NITs — all 7 cleared (`745c024`)