@tgrunnagle

Closes #3628

Summary

Implements a Redis Sentinel-backed storage backend for the authorization server's Storage interface, enabling horizontal scaling of ToolHive auth servers. Multiple instances can now share authentication state via Redis with automatic failover support. This is Phase 1 of the Redis Storage feature, providing the core implementation with comprehensive unit tests.

Changes Made

Storage Backend (pkg/authserver/storage/)

  • Added RedisStorage struct implementing all 30+ methods of the Storage interface
  • Implemented serialization wrappers (storedSession, storedClient, storedProviderIdentity, etc.) for JSON storage
  • Used Redis TTL for automatic token expiration instead of background cleanup
  • Created secondary indexes for request ID lookups required by RFC 7009 token revocation
  • Added Lua script for atomic UpdateProviderIdentityLastUsed operation to prevent race conditions
  • Implemented compile-time interface compliance check

Key Generation (pkg/authserver/storage/redis_keys.go)

  • Added DeriveKeyPrefix function using hash tag format thv:auth:{ns:name}: for Redis Cluster slot co-location
  • Defined key type constants for all stored data types (access tokens, refresh tokens, auth codes, PKCE, clients, users, etc.)
  • Added helper functions for consistent key generation
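
To illustrate the hash tag format described above, here is a minimal sketch of how such a prefix could be derived. The `DeriveKeyPrefix` signature and the `redisKey` helper are assumptions made for illustration; the PR description only specifies the `thv:auth:{ns:name}:` format.

```go
package main

import "fmt"

// DeriveKeyPrefix builds the per-server key prefix. The braces form a Redis
// Cluster hash tag, so only "namespace:name" is hashed when picking a slot.
// The signature is assumed; only the output format comes from the PR.
func DeriveKeyPrefix(namespace, name string) string {
	return fmt.Sprintf("thv:auth:{%s:%s}:", namespace, name)
}

// redisKey is a hypothetical helper that appends a key type and an ID
// to the prefix.
func redisKey(prefix, keyType, id string) string {
	return prefix + keyType + ":" + id
}

func main() {
	prefix := DeriveKeyPrefix("default", "my-server")
	// Both keys share the same hash tag, so they map to the same slot.
	fmt.Println(redisKey(prefix, "access", "abc123"))
	fmt.Println(redisKey(prefix, "refresh", "def456"))
}
```

Redis Cluster hashes only the text between the first `{` and the next `}`, so a namespace or name containing `}` would cut the tag short; input validation would need to account for that.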

Configuration (pkg/authserver/storage/config.go)

  • Added TypeRedis storage type constant
  • Added RedisRunConfig and related types for serializable configuration (Sentinel addresses, ACL credentials, timeouts)
  • Credentials read from environment variables for security

Dependencies (go.mod)

  • Added github.com/redis/go-redis/v9 for Redis client with Sentinel support
  • Added github.com/alicebob/miniredis/v2 for unit testing

Implementation Details

  • Sentinel-only deployment mode is enforced; standalone Redis is not supported because it does not meet the HA requirements
  • ACL user authentication is required; legacy AUTH is not supported, for security reasons
  • Hash tag format {ns:name} combines namespace and name to ensure all keys for a server land in the same Redis Cluster slot
  • TTL set on secondary index sets to prevent memory growth from orphaned indexes
  • Form field serialization preserves multi-value support (url.Values stored as map[string][]string)
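
The form field point can be sketched as follows. This is a minimal illustration using only the standard library; `storedForm`, `marshalForm`, and `unmarshalForm` are hypothetical names, not the wrappers used in the PR.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
)

// storedForm is a hypothetical wrapper that keeps form data as
// map[string][]string so repeated parameters survive the JSON round trip.
type storedForm struct {
	Form map[string][]string `json:"form"`
}

// marshalForm serializes the form without flattening multi-value fields.
func marshalForm(form url.Values) ([]byte, error) {
	return json.Marshal(storedForm{Form: form})
}

// unmarshalForm restores the form as url.Values.
func unmarshalForm(data []byte) (url.Values, error) {
	var stored storedForm
	if err := json.Unmarshal(data, &stored); err != nil {
		return nil, fmt.Errorf("failed to unmarshal form: %w", err)
	}
	return url.Values(stored.Form), nil
}

func main() {
	form := url.Values{}
	form.Add("scope", "openid")
	form.Add("scope", "profile")

	data, err := marshalForm(form)
	if err != nil {
		panic(err)
	}
	restored, err := unmarshalForm(data)
	if err != nil {
		panic(err)
	}
	fmt.Println(restored["scope"])
}
```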

Testing

  • Comprehensive unit tests using miniredis (no external Redis required)
  • 85.2% code coverage (exceeds 80% requirement)
  • Tests cover all Storage interface methods, error handling, TTL behavior, and concurrent access
  • Edge cases tested: expired tokens, non-existent resources, duplicate creation, input validation

Additional Notes

  • Future work: Integration tests with real Redis (Phase 2), Operator CRD updates (Phase 3)
  • Parent epic: stacklok/stacklok-epics#197

@github-actions github-actions bot added the size/XL Extra large PR: 1000+ lines changed label Feb 5, 2026

@github-actions github-actions bot left a comment

Large PR Detected

This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.

How to unblock this PR:

Add a section to your PR description with the following format:

## Large PR Justification

[Explain why this PR must be large, such as:]
- Generated code that cannot be split
- Large refactoring that must be atomic
- Multiple related changes that would break if separated
- Migration or data transformation

Alternative:

Consider splitting this PR into smaller, focused changes (< 1000 lines each) for easier review and reduced risk.

See our Contributing Guidelines for more details.


This review will be automatically dismissed once you add the justification section.

codecov bot commented Feb 5, 2026

Codecov Report

❌ Patch coverage is 72.83582% with 182 lines in your changes missing coverage. Please review.
✅ Project coverage is 66.03%. Comparing base (f6ac332) to head (ac645ba).

Files with missing lines | Patch % | Lines
pkg/authserver/storage/redis.go | 72.50% | 123 Missing and 59 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3639      +/-   ##
==========================================
+ Coverage   65.86%   66.03%   +0.17%     
==========================================
  Files         413      415       +2     
  Lines       40953    41623     +670     
==========================================
+ Hits        26974    27487     +513     
- Misses      11891    11985      +94     
- Partials     2088     2151      +63     

Implement RedisStorage that satisfies the Storage interface to enable
horizontal scaling of ToolHive auth servers. Multiple instances can now
share authentication state via Redis with automatic Sentinel failover.

Key features:
- Redis Sentinel support for high availability deployments
- ACL user authentication with credentials from environment variables
- Multi-tenant key prefix with hash tags for Redis cluster slot co-location
- Secondary indexes for RFC 7009 token revocation compliance
- Automatic expiration via Redis TTL instead of background cleanup
- Health checking via Redis PING

New files:
- pkg/authserver/storage/redis.go - Full Storage interface implementation
- pkg/authserver/storage/redis_keys.go - Key generation utilities
- pkg/authserver/storage/redis_test.go - Unit tests with miniredis (85% coverage)

Dependencies added:
- github.com/redis/go-redis/v9
- github.com/alicebob/miniredis/v2 (testing)

Closes #3628

Address internal feedback

Address internal review feedback

Expire public clients
@tgrunnagle tgrunnagle force-pushed the issue_3628_as-redis-token-storage branch from c0e32f5 to ac645ba Compare February 5, 2026 23:55

@jhrozek jhrozek left a comment

Detailed review of the Redis Sentinel storage backend. 10 inline comments covering session serialization (critical), a bug, and several improvements.

The most important finding is comment #6 on the session serialization — the current approach breaks token refresh because the deserialized session doesn't implement JWTSessionContainer. See the comment for the full explanation and suggested fix.

// Serialization Helpers
// -----------------------

// marshalRequester serializes a fosite.Requester to JSON.

We verified that fosite's handlers call Sanitize() on the request before passing it to storage — same approach as Ory Hydra. The form data here is already stripped of sensitive fields (client_secret, password, code_verifier). Checked all code paths in the auth server: no direct calls to Create* methods bypass fosite's handler chain.

	cfg.WriteTimeout = DefaultWriteTimeout
}

client := redis.NewFailoverClient(&redis.FailoverOptions{

There's no TLS configuration here. Are we planning to add TLS support as a follow-up? For production Sentinel deployments this would be needed.

	return nil, fmt.Errorf("failed to unmarshal upstream tokens: %w", err)
}

expiresAt := time.Unix(stored.ExpiresAt, 0)

Bug: when ExpiresAt is zero (upstream tokens with no expiry), time.Unix(0, 0) produces 1970-01-01, so the check on line 792 always returns ErrExpired. The write path in marshalUpstreamTokensWithTTL treats zero as "use default TTL", but the read path here rejects it. Need to either skip the expiry check when stored.ExpiresAt == 0 or store a sentinel value that means "no expiry".
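
A minimal sketch of the first option (skipping the expiry check when no expiry was stored). `storedUpstreamTokens` is trimmed to the fields needed here, `ErrExpired` stands in for the storage package's error, and `checkExpiry` is a hypothetical helper rather than code from the PR.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// ErrExpired stands in for the storage package's expiry error.
var ErrExpired = errors.New("expired")

// storedUpstreamTokens is a trimmed, assumed version of the stored record.
type storedUpstreamTokens struct {
	AccessToken string `json:"access_token"`
	ExpiresAt   int64  `json:"expires_at"` // Unix seconds; 0 means no expiry
}

// checkExpiry returns ErrExpired only when an expiry was recorded and has
// already passed. A zero ExpiresAt is treated as "no expiry", matching the
// write path that falls back to the default TTL.
func checkExpiry(stored storedUpstreamTokens) error {
	if stored.ExpiresAt == 0 {
		return nil
	}
	if time.Now().After(time.Unix(stored.ExpiresAt, 0)) {
		return ErrExpired
	}
	return nil
}

func main() {
	// Without the zero check this would be rejected as expired in 1970.
	fmt.Println(checkExpiry(storedUpstreamTokens{AccessToken: "tok"}))
}
```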

}

// GetAuthorizeCodeSession retrieves the authorization request for a given code.
func (s *RedisStorage) GetAuthorizeCodeSession(ctx context.Context, code string, _ fosite.Session) (fosite.Requester, error) {

There's a TTL gap between the auth code (10 min) and the invalidation marker (30 min). If the auth code key expires but the invalidation marker is still alive, a replayed code gets ErrNotFound instead of ErrInvalidatedAuthorizeCode. Fosite treats these differently — ErrInvalidatedAuthorizeCode triggers token revocation (the replay attack protection from RFC 6819), while ErrNotFound just fails the request, leaving the previously issued tokens alive.

One fix: check the invalidation marker first (you already do), and if it exists, return ErrInvalidatedAuthorizeCode immediately without trying to GET the auth code key. That way the marker works correctly even after the code itself expires.
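
The ordering could look roughly like this. The sketch uses an in-memory map as a stand-in for Redis (a missing key models an expired one), and the key names, error variables, and `getAuthorizeCode` are all hypothetical. It also assumes the marker stores the request ID: fosite's storage interface comments ask implementations to return a Requester together with ErrInvalidatedAuthorizeCode, so the marker probably needs to carry enough data to rebuild one after the code key is gone.

```go
package main

import (
	"errors"
	"fmt"
)

// Stand-ins for the fosite errors named in the review comment.
var (
	ErrNotFound                 = errors.New("not found")
	ErrInvalidatedAuthorizeCode = errors.New("authorization code has been invalidated")
)

// kvStore is a hypothetical stand-in for the Redis client. A key that is
// absent from the map models a key whose TTL has elapsed.
type kvStore map[string]string

// getAuthorizeCode checks the invalidation marker before touching the auth
// code key, so a replayed code is still reported as invalidated after the
// code key itself has expired. The marker value is assumed to hold the
// request ID so the caller can still identify which tokens to revoke.
func getAuthorizeCode(store kvStore, code string) (string, error) {
	if requestID, invalidated := store["invalidated:"+code]; invalidated {
		return requestID, ErrInvalidatedAuthorizeCode
	}
	requestID, ok := store["authcode:"+code]
	if !ok {
		return "", ErrNotFound
	}
	return requestID, nil
}

func main() {
	// The auth code key has expired; only the longer-lived marker remains.
	store := kvStore{"invalidated:code-1": "req-1"}
	requestID, err := getAuthorizeCode(store, "code-1")
	fmt.Println(requestID, err)
}
```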

	return fmt.Errorf("failed to marshal request: %w", err)
}

// Store the token

The three Redis operations here (SET, SADD, EXPIRE) are not atomic. If SADD or EXPIRE fails, the compensating deletes try to clean up, but those can also fail, leaving the store in an inconsistent state. Consider using a transactional pipeline (TxPipeline), which wraps all three commands in MULTI/EXEC so they are sent in a single round trip and applied as a unit. The same pattern applies to CreateRefreshTokenSession.

form := url.Values(stored.Form)

// Create session with expiration times
session := &redisSession{

marshalRequester captures all the requester fields correctly, but from the session it only extracts Subject and ExpiresAt (the only data accessible through the fosite.Session interface). On the read side, unmarshalRequester reconstructs a redisSession that implements fosite.Session but not oauth2.JWTSessionContainer.

This breaks token refresh. The flow is: GetRefreshTokenSession loads the stored session -> fosite clones it onto the new request (flow_refresh.go:87) -> PopulateTokenEndpointResponse generates new tokens -> DefaultJWTStrategy.generate() does requester.GetSession().(JWTSessionContainer) (strategy_jwt.go:112). Since redisSession doesn't implement JWTSessionContainer, this type assertion fails and refresh returns an error.

Beyond the type assertion, the JWT claims Extra map (tsid, client_id) and UpstreamSessionID are lost, so even if you worked around the assertion, refreshed tokens would be missing claims and upstream token exchange would break.

The standard fosite pattern (used by Hydra's SQL storage) is to serialize the entire session as a JSON blob: json.Marshal(request.GetSession()) on write, json.Unmarshal(blob, sessionPrototype) on read. The session prototype comes from the fosite.Session parameter that all Get*Session methods receive — currently discarded as _ fosite.Session on lines 297, 392, 484, 624. oauth2.JWTSession round-trips cleanly with Go's default JSON encoding.

The session.UpstreamSession interface (which embeds JWTSessionContainer) and the compile-time check var _ UpstreamSession = (*Session)(nil) already exist in the session package to catch this kind of mismatch. With this fix, redisSession and redisRequester can be removed entirely — you return fosite.Request directly with a real *session.Session that satisfies all the required interfaces.
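
A minimal sketch of the blob-plus-prototype pattern described above, using only the standard library. `Session` stands in for fosite.Session, `jwtSession` for the real *session.Session, and `storedRequest`, `marshalRequest`, and `unmarshalRequest` are hypothetical names; the real types carry more fields.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Session is a minimal stand-in for fosite.Session.
type Session interface {
	GetSubject() string
}

// jwtSession is a hypothetical concrete session carrying data that is not
// reachable through the Session interface.
type jwtSession struct {
	Subject           string                 `json:"subject"`
	Extra             map[string]interface{} `json:"extra,omitempty"`
	UpstreamSessionID string                 `json:"upstream_session_id,omitempty"`
}

func (s *jwtSession) GetSubject() string { return s.Subject }

// storedRequest keeps the whole session as an opaque JSON blob.
type storedRequest struct {
	RequestID string          `json:"request_id"`
	Session   json.RawMessage `json:"session"`
}

// marshalRequest serializes the concrete session value, not just the fields
// exposed by the interface.
func marshalRequest(requestID string, session Session) ([]byte, error) {
	blob, err := json.Marshal(session)
	if err != nil {
		return nil, fmt.Errorf("failed to marshal session: %w", err)
	}
	return json.Marshal(storedRequest{RequestID: requestID, Session: blob})
}

// unmarshalRequest decodes the blob into the caller-supplied prototype, so
// the caller gets back its own concrete session type with all fields set.
func unmarshalRequest(data []byte, prototype Session) (string, error) {
	var stored storedRequest
	if err := json.Unmarshal(data, &stored); err != nil {
		return "", fmt.Errorf("failed to unmarshal request: %w", err)
	}
	if err := json.Unmarshal(stored.Session, prototype); err != nil {
		return "", fmt.Errorf("failed to unmarshal session: %w", err)
	}
	return stored.RequestID, nil
}

func main() {
	in := &jwtSession{
		Subject:           "alice",
		Extra:             map[string]interface{}{"tsid": "t-1"},
		UpstreamSessionID: "up-1",
	}
	data, err := marshalRequest("req-1", in)
	if err != nil {
		panic(err)
	}
	out := &jwtSession{} // prototype passed in by the caller
	if _, err := unmarshalRequest(data, out); err != nil {
		panic(err)
	}
	fmt.Println(out.Subject, out.Extra["tsid"], out.UpstreamSessionID)
}
```

Because the prototype is a pointer to the caller's concrete type, any interface that type implements remains available after the round trip, which is what the type assertion in the refresh flow depends on.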

}

// StoreUpstreamTokens stores the upstream IDP tokens for a session.
func (s *RedisStorage) StoreUpstreamTokens(ctx context.Context, sessionID string, tokens *UpstreamTokens) error {

Upstream IDP tokens are stored as plaintext JSON. For a first implementation this is fine, but worth noting that Hydra has an optional EncryptSessionData feature that encrypts session blobs before persisting them. Could be worth considering as a follow-up for tokens at rest.

if len(cfg.SentinelConfig.SentinelAddrs) == 0 {
	return errors.New("at least one sentinel address is required")
}
if cfg.ACLUserConfig == nil {

This checks that ACLUserConfig is non-nil but doesn't validate that Username and Password are non-empty. Passing &ACLUserRunConfig{} with blank strings would silently connect with empty credentials.
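
The extra validation might look something like this. The `ACLUserRunConfig` shape is inferred from the field names in this comment, and `validateACLUser` and its error messages are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// ACLUserRunConfig is an assumed shape based on the fields named above.
type ACLUserRunConfig struct {
	Username string
	Password string
}

// validateACLUser rejects a missing config as well as blank credentials,
// so an empty struct cannot slip through as a valid ACL user.
func validateACLUser(cfg *ACLUserRunConfig) error {
	if cfg == nil {
		return errors.New("ACL user configuration is required")
	}
	if cfg.Username == "" {
		return errors.New("ACL username must not be empty")
	}
	if cfg.Password == "" {
		return errors.New("ACL password must not be empty")
	}
	return nil
}

func main() {
	fmt.Println(validateACLUser(&ACLUserRunConfig{}))
}
```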

RequestID         string           `json:"request_id"`
Subject           string           `json:"subject"`
ExpiresAt         map[string]int64 `json:"expires_at"`
AccessTokenExpiry int64            `json:"access_token_expiry,omitempty"`

AccessTokenExpiry, RefreshTokenExpiry, and AuthCodeExpiry are never written or read anywhere. Looks like leftovers from before the ExpiresAt map approach was adopted. Should be removed to avoid confusion.


// Clean up old user's set if UserID changed
if oldUserID != "" && oldUserID != newUserID {
	oldUserUpstreamSetKey := redisSetKey(s.keyPrefix, "user:upstream", oldUserID)

"user:upstream" and "user:providers" are used as hardcoded strings in 7 places across this file, but redis_keys.go already defines constants for other key types. Should add KeyTypeUserUpstream and KeyTypeUserProviders constants for consistency.
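
For example, something along these lines. The constant names and values come from this comment; the body of `redisSetKey` is a guess at the key layout and is shown only so the call site compiles.

```go
package main

import "fmt"

// Key type constants for the per-user index sets.
const (
	KeyTypeUserUpstream  = "user:upstream"
	KeyTypeUserProviders = "user:providers"
)

// redisSetKey is a hypothetical stand-in; the real helper's key layout may
// differ from this simple concatenation.
func redisSetKey(prefix, keyType, id string) string {
	return prefix + keyType + ":" + id
}

func main() {
	// Call sites reference the constant instead of repeating the literal.
	fmt.Println(redisSetKey("thv:auth:{ns:name}:", KeyTypeUserUpstream, "user-1"))
}
```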


Development

Successfully merging this pull request may close these issues.

Implement Redis Storage: Core Implementation (Phase 1)