feat: antigravity invalidate oauth when failed#3499
Conversation
There was a problem hiding this comment.
Code Review
This pull request enhances the Antigravity provider with TLS skip-verify support and default project ID options, while implementing a lazy token refresh mechanism using singleflight. It also improves thread safety in management handlers and fixes several incorrect pointer initializations. The authentication manager's error handling was refactored to better manage quota cooldowns. Review feedback indicates that access tokens are being invalidated too aggressively on non-authentication errors, which could lead to excessive OAuth traffic; it is recommended to restrict this invalidation to 401 status codes.
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces several enhancements to the authentication and configuration handling, specifically for the Antigravity provider. It adds support for skipping TLS verification and using a default project ID. The core Manager has been updated to support lazy, on-demand token refreshing with singleflight to prevent redundant operations, and it now synchronizes refreshed credentials back to the main auth store. Error handling in MarkResult was also improved to manage token eviction more precisely. Feedback from the review focused on performance optimizations, such as avoiding full struct copies when accessing configuration and moving regex compilation to the package level.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8266fa3852
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces on-demand (lazy) authentication refreshing using singleflight to coalesce concurrent requests and adds a mechanism to synchronize tokens refreshed by executors back to the central manager. It also implements a tls-skip-verify configuration for outbound connections, refines error handling to preserve access tokens during transient network or 404 errors, and simplifies quota backoff logic by parsing reset durations directly from error messages. Additionally, several concurrency fixes were applied, such as transitioning to RWMutex in management handlers and fixing potential pointer issues with local variables. Feedback focuses on preventing a potential nil pointer panic when accessing metadata, ensuring consistent token eviction logic across error handlers, and utilizing the retryAfter parameter as a fallback for quota cooldowns.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b7bc437f2a
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces several enhancements to the authentication and configuration systems. Key updates include the implementation of lazy, on-demand OAuth token refreshing using singleflight to prevent redundant requests, and the addition of a tls-skip-verify option for outbound connections. Concurrency in management handlers has been improved by adopting RWMutex, and several potential pointer-related bugs were addressed. Additionally, the MarkResult logic was refined to evict cached tokens on most HTTP errors and better parse quota reset durations. Comprehensive integration tests for the authentication conductor were also added. Feedback suggests ensuring that the auth object is consistently cloned in PrepareHttpRequest to avoid side effects when it is not found in the manager's map.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d469e0f29a
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
|
/gemini review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1581734c92
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
There was a problem hiding this comment.
Code Review
This pull request introduces significant improvements to the Antigravity provider and the core authentication management system. Key enhancements include the implementation of a tls-skip-verify configuration for outbound connections, a lazy on-demand OAuth refresh mechanism utilizing singleflight to coalesce concurrent requests, and refined token eviction logic that clears cached access tokens on most HTTP errors. The PR also optimizes concurrency by transitioning to RWMutex in management handlers and adds extensive integration tests for the authentication conductor. Feedback focuses on ensuring that retry and quota recovery timestamps are explicitly reset when cooling is disabled to prevent stale cooldown states from persisting.
| if !quotaCooldownDisabledForAuth(auth) { | ||
| cooldown := defaultQuotaCooldown | ||
| if resultErr != nil { | ||
| if parsed, ok := parseQuotaResetDuration(resultErr.Message); ok { | ||
| cooldown = parsed | ||
| } else if retryAfter != nil && *retryAfter > 0 { | ||
| cooldown = *retryAfter | ||
| } | ||
| auth.Quota.BackoffLevel = nextLevel | ||
| } else if retryAfter != nil && *retryAfter > 0 { | ||
| cooldown = *retryAfter | ||
| } | ||
| next := now.Add(cooldown) | ||
| auth.NextRetryAfter = next | ||
| auth.Quota.NextRecoverAt = next | ||
| } |
There was a problem hiding this comment.
In applyAuthFailureState, if cooling is disabled, auth.NextRetryAfter and auth.Quota.NextRecoverAt should be reset to zero values to maintain consistency with the previous logic and avoid stale cooldowns.
| if !quotaCooldownDisabledForAuth(auth) { | |
| cooldown := defaultQuotaCooldown | |
| if resultErr != nil { | |
| if parsed, ok := parseQuotaResetDuration(resultErr.Message); ok { | |
| cooldown = parsed | |
| } else if retryAfter != nil && *retryAfter > 0 { | |
| cooldown = *retryAfter | |
| } | |
| auth.Quota.BackoffLevel = nextLevel | |
| } else if retryAfter != nil && *retryAfter > 0 { | |
| cooldown = *retryAfter | |
| } | |
| next := now.Add(cooldown) | |
| auth.NextRetryAfter = next | |
| auth.Quota.NextRecoverAt = next | |
| } | |
| if !quotaCooldownDisabledForAuth(auth) { | |
| cooldown := defaultQuotaCooldown | |
| if resultErr != nil { | |
| if parsed, ok := parseQuotaResetDuration(resultErr.Message); ok { | |
| cooldown = parsed | |
| } else if retryAfter != nil && *retryAfter > 0 { | |
| cooldown = *retryAfter | |
| } | |
| } else if retryAfter != nil && *retryAfter > 0 { | |
| cooldown = *retryAfter | |
| } | |
| next := now.Add(cooldown) | |
| auth.NextRetryAfter = next | |
| auth.Quota.NextRecoverAt = next | |
| } else { | |
| auth.NextRetryAfter = time.Time{} | |
| auth.Quota.NextRecoverAt = time.Time{} | |
| } |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f5284d2225
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 26c6b5402b
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
Adds a `tls-skip-verify` config option that disables TLS certificate verification for outbound Antigravity connections (OAuth token exchange and API requests). Fixes the order of proxy/TLS configuration so that tls-skip-verify is layered on top of proxy transport rather than being overwritten by it. Changes: - internal/config/sdk_config.go: add TLSSkipVerify field - internal/auth/antigravity/auth.go: apply proxy first, then TLS skip verify - internal/runtime/executor/antigravity_executor.go: skipTLSVerifyTransport helper - config.example.yaml: document the new option
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
…ersist in map pickNextMixed and pickNextMixedLegacy were cloning the selected auth before returning it to the executor. This meant on-demand OAuth token refresh (ensureAccessToken → refreshToken) wrote the fresh token and expiry into a throwaway clone — the original m.auths entry was never updated, causing a new OAuth call on every LLM request. By returning the original pointer, executor mutations to auth.Metadata (such as access_token, expired, expires_in) are visible in the map for subsequent calls.
…en cache persists pickNextMixed/pickNextMixedLegacy return scheduler clones, so on-demand OAuth token refresh via ensureAccessToken → refreshToken wrote the fresh access_token+expiry into a throwaway clone — the map entry was never updated, causing a new OAuth call on every LLM request. Add syncAuthMetadata() which copies Metadata (access_token, expired, expires_in, timestamp) from the per-request auth back to m.auths under the map lock. Call it after every Execute, ExecuteStream, and CountTokens so the next pick clones a map entry with fresh token data.
…nfig updateAntigravityCreditsBalance in refreshToken was always running after OAuth refresh, issuing a POST to /v1internal:loadCodeAssist unconditionally. Gate it behind the same antigravity-credits config that already guards the credits hint path in maybeRefreshAntigravityCreditsHint. Set antigravity-credits: false in config.yaml to skip both.
…cutor syncAuthMetadata now calls RefreshSchedulerEntry after copying token metadata to the map entry. Without this, the scheduler holds stale snapshot clones and pickNextMixed returns a throwaway with old token/expiry — defeating the cache and causing an OAuth call on every request.
…ging for OAuth cache - New config `antigravity-use-default-project-id` (default false) skips the ensureAntigravityProjectID / loadCodeAssist call when no project ID is needed - Debug log in ensureAccessToken showing token/expiry/skew math to diagnose why OAuth fires on every call - Debug log in syncAuthMetadata confirming metadata keys are copied back to the map
… code consistency
…efresh Replace syncAuthMetadata (post-execute sync, stale window) with refreshAuthIfNeeded in all pickNext variants — refresh happens before cloning via singleflight.Group. Restore StartAutoRefresh in service.go for non-Antigravity providers.
…n tests - Remove auth.Clone() in ensureAccessToken so refreshToken modifies in-place - Add syncRefreshedAuth() to persist refreshed tokens back to m.auths after execution - Restore 429 cooldown parsing with parseQuotaResetDuration (1h fallback) - Limit model-level token deletion to 401 only (applyAuthFailureState unchanged) - Add 7 integration tests covering token caching, error recovery, auto-refresh loop stress/idle/expiry, and concurrency - Update test URLs from cloudcode-pa to daily-cloudcode-pa - Update base URL fallback order (daily, sandbox)
MarkResult previously deleted the cached access_token on every error, including 404 (model not found) and network errors (status 0) which are unrelated to token validity. This caused unnecessary OAuth exchanges and potential rate limiting from providers. Now the token is only evicted on actual HTTP errors (4xx/5xx except 404), and a fresh token resolves any resource cap imposed on the old one. Also removes the redundant delete inside the 401 case that was already covered by the unconditional delete.
…t regex to package-level variable
The 429 branch now checks result.RetryAfter as a fallback when the error message doesn't contain a parseable duration. This prevents excessively long cooldowns when upstreams provide a valid Retry-After header but no "after 1h2m" text in the message. Priority: parseable message -> Retry-After header -> default 1h
… lazy auth refresh to prevent token failures
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
…hFailureState Match MarkResult's token eviction logic: skip access_token deletion for 404 and network errors (status 0) which are unrelated to token validity. Honor retryAfter parameter as fallback for 429 cooldowns when parseQuotaResetDuration fails to extract a duration from the error message.
…uest Add 30-minute cooldown for 401/402/403 model failures to prevent repeated unauthorized loops when credentials are revoked. Cooldown respects disable_cooling override. Always clone auth before passing to PrepareRequest to ensure immutability, even when the auth is not found in the manager's map.
- Centralize outbound transport creation with Proxy/TLS support in `proxyutil`. - Fix background services (model updater, version checker) to respect global proxy and TLS settings. - Add `User-Agent` to background fetchers to resolve GitHub handshake timeouts. - Improve Antigravity quota detection: preserve tokens on long-term exhaustion (>30s) vs transient 429s. - Skip base URL fallback (sandbox) for long quota exhaustion (>30s); transient capacity issues still fallback. - Update `parseQuotaResetDuration` regex to match space-preceded durations (e.g., "Resets in 143h58m14s") without requiring "after" prefix. - Add `MarkProxyRejected` to filter proxy-level rejections from error log files. - Log proxy-rejected 429s at info level with `[proxy-rejected]` tag instead of warn. - Improve `shouldAttemptAntigravityCreditsFallback` debug logs with decision reasons. - Handle `model_cooldown` error code in `enrichAuthSelectionError` and `isProxyRejection`. - Add tests for quota exhaustion fallback, token preservation, regex parsing, and proxy rejection.
…t, snapshot config Add 30-minute cooldown for 401/402/403 model failures to prevent repeated unauthorized loops when credentials are revoked. Cooldown respects disable_cooling override. Always pass auth (not clone) to PrepareRequest, then persist refreshed state back to manager's map so in-place credential rotations are kept. Snapshot config (including map fields) under lock before JSON encoding to prevent concurrent map iteration panics. Add explicit else branches to zero NextRetryAfter/NextRecoverAt when cooling is disabled for 401/402/403/429 cases. Add comment explaining no cooldown for transient 50x errors is intentional since OAuth refresh resolves them.
…refreshed auth metadata Config.Snapshot() previously shallow-copied key structs (GeminiKey, CodexKey, ClaudeKey, OpenAICompatibility, VertexCompatKey), so nested mutable fields (Headers maps, Models/ExcludedModels/APIKeyEntries slices, CloakConfig pointers, ThinkingSupport pointers) still aliased live cfg data after the lock was released. Management patch handlers mutate those nested collections in place, causing race conditions during JSON encoding. Add deep-copy helpers for each key type that recursively clone all slice, map, and pointer fields, and wire them into Snapshot(). syncRefreshedAuth() short-circuited on access_token equality, skipping sync of refresh_token, expiry, and other metadata. OAuth refresh responses may rotate refresh_token or extend expiry while reusing the same access_token; the early return caused stale metadata and redundant refreshes. Remove the comparison so all refreshed fields are always persisted.
ad3fa87 to
6d09b38
Compare
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a tls-skip-verify configuration option across multiple modules, including Antigravity, Claude, and Gemini executors, to support environments with intercepting proxies. It also implements a Snapshot method for the configuration to ensure thread-safe access during JSON encoding. The authentication conductor has been refactored to support lazy on-demand refreshes using singleflight and more sophisticated token eviction logic that avoids clearing tokens during long-term quota exhaustion or network errors. Review feedback suggests refining the token eviction logic to avoid breaking static credentials that do not support refresh and improving the regex for parsing quota reset durations to handle cases where the duration is not preceded by whitespace.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fb07056d69
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces support for skipping TLS certificate verification across outbound clients and background updaters to accommodate corporate proxy environments. It implements a Snapshot method for the configuration to prevent data races during JSON encoding and transitions several handlers to use RWMutex for improved concurrency. Significant updates were made to the auth Manager and Antigravity executor, including lazy on-demand token refreshing via singleflight, granular handling of quota exhaustion errors, and mechanisms to persist refreshed credentials back to the manager. Feedback identified several critical issues, including data races when accessing shared configuration or auth pointers, the loss of rotated credentials during request preparation, and potential nil pointer dereferences when accessing auth metadata.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5686504d49
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b35891d406
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2c1bb11424
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces support for disabling TLS certificate verification (tls-skip-verify) across various outbound connections and adds a configuration option to use a default project ID for Antigravity. It implements a Snapshot method for the configuration to prevent concurrent map access panics and introduces a lazy on-demand OAuth refresh mechanism using singleflight to coalesce concurrent requests. Additionally, it refines error handling by distinguishing between transient server capacity issues and long-term account quota exhaustion, preventing unnecessary fallbacks and improving logging for proxy-level rejections. Feedback includes concerns regarding the accidental disabling of production URLs, a shallow copy bug in the amp module, potential shallow copies in the configuration snapshotting logic, and a restrictive regex for parsing quota reset durations.
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: aec8e4771d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| _, err, _ := m.lazyRefreshGroup.Do(id, func() (any, error) { | ||
| m.refreshAuth(context.Background(), id) | ||
| return nil, nil |
There was a problem hiding this comment.
Respect request cancellation during lazy token refresh
The new lazy refresh path runs on the request critical path but calls refreshAuth(context.Background(), id), which discards the caller context. If provider refresh is slow or stuck, the request cannot be canceled at its deadline and all concurrent requests for the same auth will also block behind singleflight, causing avoidable tail-latency spikes and worker starvation under upstream incidents.
Useful? React with 👍 / 👎.
| if transport == nil { | ||
| transport = cloneDefaultTransport() | ||
| } |
There was a problem hiding this comment.
Keep inherit-proxy mode from creating per-client transports
When proxyURL is empty (ModeInherit), BuildTransport now clones and returns a new transport instead of leaving transport unset. Since SetProxy assigns that transport, callers that create short-lived clients now get a unique connection pool each time, which defeats shared keep-alive reuse and increases goroutine/FD churn. In inherit mode, transport should remain unchanged unless TLS-skip-verify actually needs an override.
Useful? React with 👍 / 👎.
No description provided.