Add Redis Storage Backend for Auth Server #3639
base: main
Large PR Detected
This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.
How to unblock this PR:
Add a section to your PR description with the following format:
## Large PR Justification
[Explain why this PR must be large, such as:]
- Generated code that cannot be split
- Large refactoring that must be atomic
- Multiple related changes that would break if separated
- Migration or data transformation

Alternative:
Consider splitting this PR into smaller, focused changes (< 1000 lines each) for easier review and reduced risk.
See our Contributing Guidelines for more details.
This review will be automatically dismissed once you add the justification section.
Codecov Report
❌ Patch coverage is
Additional details and impacted files

@@ Coverage Diff @@
## main #3639 +/- ##
==========================================
+ Coverage 65.86% 66.03% +0.17%
==========================================
Files 413 415 +2
Lines 40953 41623 +670
==========================================
+ Hits 26974 27487 +513
- Misses 11891 11985 +94
- Partials 2088 2151 +63
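As a consistency check on the table: in each column, hits, misses, and partials add up to the line count (26974 + 11891 + 2088 = 40953 and 27487 + 11985 + 2151 = 41623), and coverage is hits divided by lines. The sketch below assumes Codecov truncates the percentage to two decimals; that assumption reproduces 65.86% and 66.03%, whereas ordinary rounding would give 65.87% and 66.04%. The coverageRow type is purely illustrative.

```go
package main

import (
	"fmt"
	"math"
)

// coverageRow holds one column (main or #3639) of the Codecov diff table.
type coverageRow struct {
	Lines, Hits, Misses, Partials int
}

// consistent reports whether every tracked line is a hit, a miss, or a partial.
func (r coverageRow) consistent() bool {
	return r.Hits+r.Misses+r.Partials == r.Lines
}

// coveragePercent returns hits/lines as a percentage, truncated (not rounded)
// to two decimal places, formatted as a string.
func (r coverageRow) coveragePercent() string {
	pct := float64(r.Hits) / float64(r.Lines) * 100
	return fmt.Sprintf("%.2f", math.Floor(pct*100)/100)
}
```

The +0.17% delta then follows directly: 66.03 - 65.86 = 0.17.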
Implement RedisStorage that satisfies the Storage interface to enable horizontal scaling of ToolHive auth servers. Multiple instances can now share authentication state via Redis with automatic Sentinel failover.

Key features:
- Redis Sentinel support for high-availability deployments
- ACL user authentication with credentials from environment variables
- Multi-tenant key prefix with hash tags for Redis Cluster slot co-location
- Secondary indexes for RFC 7009 token revocation compliance
- Automatic expiration via Redis TTL instead of background cleanup
- Health checking via Redis PING

New files:
- pkg/authserver/storage/redis.go - Full Storage interface implementation
- pkg/authserver/storage/redis_keys.go - Key generation utilities
- pkg/authserver/storage/redis_test.go - Unit tests with miniredis (85% coverage)

Dependencies added:
- github.com/redis/go-redis/v9
- github.com/alicebob/miniredis/v2 (testing)

Closes #3628

Address internal feedback
Address internal review feedback
Expire public clients
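Hash tags are what make the multi-tenant prefix cluster-friendly. Redis Cluster assigns each key to one of 16384 slots using CRC16 of the key, but when the key contains a non-empty {...} section, only that section is hashed. The sketch below implements that rule in plain Go. deriveKeyPrefix is a hypothetical stand-in for DeriveKeyPrefix (its real signature is not shown here) that produces the thv:auth:{ns:name}: format from the PR description; the key suffixes used in the test are made up.

```go
package main

import (
	"fmt"
	"strings"
)

// crc16 implements the CRC16-CCITT (XMODEM) variant that Redis Cluster uses
// for key hashing: polynomial 0x1021, initial value 0, no reflection.
func crc16(data []byte) uint16 {
	var crc uint16
	for _, b := range data {
		crc ^= uint16(b) << 8
		for i := 0; i < 8; i++ {
			if crc&0x8000 != 0 {
				crc = crc<<1 ^ 0x1021
			} else {
				crc <<= 1
			}
		}
	}
	return crc
}

// hashSlot returns the cluster slot for key. If the key contains a non-empty
// {...} section, only the text inside the first such braces is hashed.
func hashSlot(key string) int {
	if start := strings.IndexByte(key, '{'); start >= 0 {
		if end := strings.IndexByte(key[start+1:], '}'); end > 0 {
			key = key[start+1 : start+1+end]
		}
	}
	return int(crc16([]byte(key)) & 16383)
}

// deriveKeyPrefix is a hypothetical stand-in for DeriveKeyPrefix that builds
// the thv:auth:{ns:name}: prefix from a namespace and a name.
func deriveKeyPrefix(namespace, name string) string {
	return fmt.Sprintf("thv:auth:{%s:%s}:", namespace, name)
}
```

If ns and name are Kubernetes object names, they cannot contain braces, so a tenant's name cannot close the hash tag early and split its keys across slots.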
Updated from c0e32f5 to ac645ba.
jhrozek left a comment
Detailed review of the Redis Sentinel storage backend. 10 inline comments covering session serialization (critical), a bug, and several improvements.
The most important finding is comment #6 on the session serialization — the current approach breaks token refresh because the deserialized session doesn't implement JWTSessionContainer. See the comment for the full explanation and suggested fix.
// Serialization Helpers
// -----------------------

// marshalRequester serializes a fosite.Requester to JSON.
We verified that fosite's handlers call Sanitize() on the request before passing it to storage — same approach as Ory Hydra. The form data here is already stripped of sensitive fields (client_secret, password, code_verifier). Checked all code paths in the auth server: no direct calls to Create* methods bypass fosite's handler chain.
cfg.WriteTimeout = DefaultWriteTimeout
}

client := redis.NewFailoverClient(&redis.FailoverOptions{
There's no TLS configuration here. Are we planning to add TLS support as a follow-up? For production Sentinel deployments this would be needed.
return nil, fmt.Errorf("failed to unmarshal upstream tokens: %w", err)
}

expiresAt := time.Unix(stored.ExpiresAt, 0)
Bug: when ExpiresAt is zero (upstream tokens with no expiry), time.Unix(0, 0) produces 1970-01-01, so the check on line 792 always returns ErrExpired. The write path in marshalUpstreamTokensWithTTL treats zero as "use default TTL", but the read path here rejects it. Need to either skip the expiry check when stored.ExpiresAt == 0 or store a sentinel value that means "no expiry".
}

// GetAuthorizeCodeSession retrieves the authorization request for a given code.
func (s *RedisStorage) GetAuthorizeCodeSession(ctx context.Context, code string, _ fosite.Session) (fosite.Requester, error) {
There's a TTL gap between the auth code (10 min) and the invalidation marker (30 min). If the auth code key expires but the invalidation marker is still alive, a replayed code gets ErrNotFound instead of ErrInvalidatedAuthorizeCode. Fosite treats these differently — ErrInvalidatedAuthorizeCode triggers token revocation (the replay attack protection from RFC 6819), while ErrNotFound just fails the request, leaving the previously issued tokens alive.
One fix: check the invalidation marker first (you already do), and if it exists, return ErrInvalidatedAuthorizeCode immediately without trying to GET the auth code key. That way the marker works correctly even after the code itself expires.
return fmt.Errorf("failed to marshal request: %w", err)
}

// Store the token
The three Redis operations here (SET, SADD, EXPIRE) are not atomic. If SADD or EXPIRE fails, the compensating deletes try to clean up, but those can also fail, leaving the store in an inconsistent state. Consider using a transactional pipeline (TxPipeline), which sends all three commands in a single round-trip wrapped in MULTI/EXEC. The same pattern applies to CreateRefreshTokenSession.
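With go-redis this would be roughly: call TxPipeline() on the client, queue Set, SAdd, and Expire on the returned pipeliner, then call Exec(ctx). Redis does not roll back a queued command that fails at execution time, but if the client fails before EXEC none of the commands run, which removes the need for compensating deletes. The in-memory sketch below models only the queue-then-apply behaviour and the request-ID index that revocation relies on; memStore, tx, the key names, and the omission of TTLs are simplifying assumptions.

```go
package main

import "sync"

// memStore is an in-memory stand-in for Redis string and set keys (no TTLs).
type memStore struct {
	mu      sync.Mutex
	strings map[string]string
	sets    map[string]map[string]struct{}
}

func newMemStore() *memStore {
	return &memStore{strings: map[string]string{}, sets: map[string]map[string]struct{}{}}
}

// tx queues commands and applies them together in exec, loosely mirroring
// MULTI/EXEC: nothing is visible to readers until exec runs.
type tx struct {
	store *memStore
	ops   []func(*memStore)
}

func (t *tx) set(key, val string) {
	t.ops = append(t.ops, func(s *memStore) { s.strings[key] = val })
}

func (t *tx) sadd(key, member string) {
	t.ops = append(t.ops, func(s *memStore) {
		if s.sets[key] == nil {
			s.sets[key] = map[string]struct{}{}
		}
		s.sets[key][member] = struct{}{}
	})
}

// exec applies all queued operations under one lock, then clears the queue.
func (t *tx) exec() {
	t.store.mu.Lock()
	defer t.store.mu.Unlock()
	for _, op := range t.ops {
		op(t.store)
	}
	t.ops = nil
}

// createAccessToken writes the token and its request-ID index entry together.
func createAccessToken(s *memStore, signature, requestID, payload string) {
	t := &tx{store: s}
	t.set("access:"+signature, payload)
	t.sadd("access:req:"+requestID, signature)
	t.exec()
}

// revokeByRequestID deletes every access token indexed under requestID, then
// drops the index itself. It returns the number of tokens removed.
func revokeByRequestID(s *memStore, requestID string) int {
	s.mu.Lock()
	defer s.mu.Unlock()
	idx := "access:req:" + requestID
	n := 0
	for sig := range s.sets[idx] {
		delete(s.strings, "access:"+sig)
		n++
	}
	delete(s.sets, idx)
	return n
}
```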
form := url.Values(stored.Form)

// Create session with expiration times
session := &redisSession{
marshalRequester captures all the requester fields correctly, but from the session it only extracts Subject and ExpiresAt (the only data accessible through the fosite.Session interface). On the read side, unmarshalRequester reconstructs a redisSession that implements fosite.Session but not oauth2.JWTSessionContainer.
This breaks token refresh. The flow is: GetRefreshTokenSession loads the stored session -> fosite clones it onto the new request (flow_refresh.go:87) -> PopulateTokenEndpointResponse generates new tokens -> DefaultJWTStrategy.generate() does requester.GetSession().(JWTSessionContainer) (strategy_jwt.go:112). Since redisSession doesn't implement JWTSessionContainer, this type assertion fails and refresh returns an error.
Beyond the type assertion, the JWT claims Extra map (tsid, client_id) and UpstreamSessionID are lost, so even if you worked around the assertion, refreshed tokens would be missing claims and upstream token exchange would break.
The standard fosite pattern (used by Hydra's SQL storage) is to serialize the entire session as a JSON blob: json.Marshal(request.GetSession()) on write, json.Unmarshal(blob, sessionPrototype) on read. The session prototype comes from the fosite.Session parameter that all Get*Session methods receive — currently discarded as _ fosite.Session on lines 297, 392, 484, 624. oauth2.JWTSession round-trips cleanly with Go's default JSON encoding.
The session.UpstreamSession interface (which embeds JWTSessionContainer) and the compile-time check var _ UpstreamSession = (*Session)(nil) already exist in the session package to catch this kind of mismatch. With this fix, redisSession and redisRequester can be removed entirely — you return fosite.Request directly with a real *session.Session that satisfies all the required interfaces.
}

// StoreUpstreamTokens stores the upstream IDP tokens for a session.
func (s *RedisStorage) StoreUpstreamTokens(ctx context.Context, sessionID string, tokens *UpstreamTokens) error {
Upstream IDP tokens are stored as plaintext JSON. For a first implementation this is fine, but worth noting that Hydra has an optional EncryptSessionData feature that encrypts session blobs before persisting them. Could be worth considering as a follow-up for tokens at rest.
if len(cfg.SentinelConfig.SentinelAddrs) == 0 {
	return errors.New("at least one sentinel address is required")
}
if cfg.ACLUserConfig == nil {
This checks that ACLUserConfig is non-nil but doesn't validate that Username and Password are non-empty. Passing &ACLUserRunConfig{} with blank strings would silently connect with empty credentials.
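A sketch of the stricter check. aclUserConfig is a hypothetical mirror of ACLUserRunConfig; if the run config holds environment variable names rather than the credentials themselves, the same check would apply to the resolved values.

```go
package main

import (
	"errors"
	"strings"
)

// aclUserConfig is a hypothetical mirror of ACLUserRunConfig's credential fields.
type aclUserConfig struct {
	Username string
	Password string
}

// validateACLUser rejects a missing config and blank credentials, so a
// zero-value struct fails validation instead of connecting with empty strings.
func validateACLUser(cfg *aclUserConfig) error {
	if cfg == nil {
		return errors.New("ACL user configuration is required")
	}
	var errs []error
	if strings.TrimSpace(cfg.Username) == "" {
		errs = append(errs, errors.New("ACL username must not be empty"))
	}
	if cfg.Password == "" {
		errs = append(errs, errors.New("ACL password must not be empty"))
	}
	// errors.Join returns nil when errs is empty.
	return errors.Join(errs...)
}
```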
RequestID string `json:"request_id"`
Subject string `json:"subject"`
ExpiresAt map[string]int64 `json:"expires_at"`
AccessTokenExpiry int64 `json:"access_token_expiry,omitempty"`
AccessTokenExpiry, RefreshTokenExpiry, and AuthCodeExpiry are never written or read anywhere. Looks like leftovers from before the ExpiresAt map approach was adopted. Should be removed to avoid confusion.
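For illustration, the trimmed struct with only the fields visible in the snippet above; the real struct may carry more. The expiry helper and the token type keys (access_token, refresh_token) are assumptions about how the ExpiresAt map is keyed.

```go
package main

import (
	"encoding/json"
	"time"
)

// storedSession without the unused per-type expiry fields.
type storedSession struct {
	RequestID string           `json:"request_id"`
	Subject   string           `json:"subject"`
	ExpiresAt map[string]int64 `json:"expires_at"` // Unix seconds keyed by token type
}

// expiry looks up the expiry for one token type; false means none was recorded.
func (s storedSession) expiry(tokenType string) (time.Time, bool) {
	unix, ok := s.ExpiresAt[tokenType]
	if !ok {
		return time.Time{}, false
	}
	return time.Unix(unix, 0), true
}

// encode serializes the session as it would be written to Redis.
func (s storedSession) encode() ([]byte, error) {
	return json.Marshal(s)
}
```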
// Clean up old user's set if UserID changed
if oldUserID != "" && oldUserID != newUserID {
	oldUserUpstreamSetKey := redisSetKey(s.keyPrefix, "user:upstream", oldUserID)
"user:upstream" and "user:providers" are used as hardcoded strings in 7 places across this file, but redis_keys.go already defines constants for other key types. Should add KeyTypeUserUpstream and KeyTypeUserProviders constants for consistency.
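A sketch of the suggested constants. The constant names come from the comment and the values are the strings currently hardcoded. redisSetKey here is a hypothetical reimplementation whose output format may not match the real helper in redis_keys.go, and userIndexKeys is a made-up convenience.

```go
package main

// Key-type constants for the per-user index sets.
const (
	KeyTypeUserUpstream  = "user:upstream"
	KeyTypeUserProviders = "user:providers"
)

// redisSetKey is a hypothetical stand-in that mirrors the (prefix, keyType, id)
// call shape used in the diff; the real format may differ.
func redisSetKey(prefix, keyType, id string) string {
	return prefix + keyType + ":" + id
}

// userIndexKeys builds both per-user set keys from the constants, so a
// misspelled key type becomes a compile error instead of a different set.
func userIndexKeys(prefix, userID string) (upstream, providers string) {
	return redisSetKey(prefix, KeyTypeUserUpstream, userID),
		redisSetKey(prefix, KeyTypeUserProviders, userID)
}
```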
Closes #3628
Summary
Implements a Redis Sentinel-backed storage backend for the authorization server's Storage interface, enabling horizontal scaling of ToolHive auth servers. Multiple instances can now share authentication state via Redis with automatic failover support. This is Phase 1 of the Redis Storage feature, providing the core implementation with comprehensive unit tests.

Changes Made
Storage Backend (pkg/authserver/storage/)
- RedisStorage struct implementing all 30+ methods of the Storage interface
- Types (storedSession, storedClient, storedProviderIdentity, etc.) used for JSON storage
- Atomic UpdateProviderIdentityLastUsed operation to prevent race conditions

Key Generation (pkg/authserver/storage/redis_keys.go)
- DeriveKeyPrefix function using hash tag format thv:auth:{ns:name}: for Redis Cluster slot co-location

Configuration (pkg/authserver/storage/config.go)
- TypeRedis storage type constant
- RedisRunConfig and related types for serializable configuration (Sentinel addresses, ACL credentials, timeouts)

Dependencies (go.mod)
- github.com/redis/go-redis/v9 for Redis client with Sentinel support
- github.com/alicebob/miniredis/v2 for unit testing

Implementation Details
- {ns:name} combines namespace and name to ensure all keys for a server land in the same Redis Cluster slot
- Form data (url.Values) stored as map[string][]string

Testing
Additional Notes