Skip to content

Conversation

@benzntech
Copy link

@benzntech benzntech commented Dec 16, 2025

Summary

This PR contains Phase 1 + Phase 2 + Phase 3 of the ArchGW enhancement:

Phase 1: OAuth Gateway (COMPLETED) ✅

Implement OAuth Gateway as a new microservice supporting Claude Pro/Max, Gemini CLI, ChatGPT Plus/Pro, and Anthropic Console authentication flows.

Key Features:

  • PKCE OAuth2 Implementation (RFC 7636 compliant)
  • 4 OAuth Providers Supported
  • Token Storage with refresh mechanism
  • REST API endpoints for OAuth management

Phase 2: Model Registry Enhancement (COMPLETED) ✅

Component 1: Registry & API Endpoints

  • ModelRegistry singleton: Thread-safe concurrent access
  • Provider tracking: Reference counting, quota cooldown, client suspension
  • Rich metadata: Pricing, thinking support, capabilities, status tracking
  • 3 API Endpoints: /v1/models, /v1/models/{id}, /v1/models/available

Component 2: Real Provider Discovery

  • OpenAI: Calls /v1/models API
  • Gemini: Calls Google Generative API
  • Anthropic/Groq/Mistral: Static implementations
  • CachedDiscovery: 5-minute TTL cache
  • DiscoveryManager: Coordinates discovery across all providers

Component 3: Fallback Routing

  • SameProviderFallback: Prefer same provider fallback
  • CapabilityMatchFallback: Match capabilities
  • CostOptimizedFallback: Choose cheapest
  • Model mapping: Configuration-driven aliasing

Component 4: Pre-configured Models (15+)

  • Claude, Gemini, OpenAI, Groq, Mistral

Phase 3: Model Routing Integration (COMPLETED) ✅

Component 1: Routing System Integration

  • Enhanced routing module: Model availability checking
  • Fallback resolution: Automatic selection of available fallbacks
  • Request ID correlation: All logs include request IDs
  • Graceful degradation: Non-blocking with fallback

Key Functions:

  • is_model_available(): Check model availability
  • get_available_models(): Get all available models
  • resolve_model_with_fallback(): Get available model or fallback
  • get_fallback_models(): Get top 5 recommended fallbacks
  • log_routing_decision(): Log all routing decisions

Routing Logic:

  1. Check requested model availability in registry
  2. If unavailable, attempt fallback resolution
  3. Use SameProviderFallback strategy
  4. Log all routing decisions with request IDs
  5. Gracefully degrade if registry unavailable

Testing

226 total tests passing (+6 new tests in Phase 3)

Breakdown:

  • 45 brightstaff lib tests (+1)
  • 8 brightstaff main tests
  • 40 hermesllm tests (+2)
  • 114 common tests (+2)
  • 12 model_registry tests
  • 4 prompt_gateway tests
  • 2 doc tests

Files Changed

Phase 3 (Routing Integration)

New:

  • crates/brightstaff/src/handlers/model_routing.rs (130 lines)

Modified:

  • crates/common/src/routing.rs (+50 lines)
  • crates/brightstaff/src/handlers/mod.rs (+1 line)

Commits

  1. a28f35ac - Add OAuth Gateway microservice
  2. 83cec34f - Add Phase 2: Model Registry Enhancement with API endpoints
  3. f1fb4299 - Add real provider discovery APIs
  4. cb17632a - Add Phase 3: Model availability integration into routing system

Status

✅ All 226 tests passing
✅ Clean compilation
✅ Request ID tracing integrated
✅ Fallback routing implemented
✅ Model availability checking ready
✅ Foundation for health monitoring established

Next Steps (Phase 4+)

  • Wire routing into llm_gateway stream processing
  • Implement provider health monitoring
  • Add configuration system for policies
  • Create fallback strategy configuration

- Implement PKCE OAuth2 flow (RFC 7636 compliant)
- Support 4 OAuth providers: Claude, Gemini, ChatGPT, Anthropic Console
- Persistent token storage at ~/.archgw/oauth_tokens.json
- Multi-provider token management with refresh support
- REST API endpoints for OAuth operations
- Environment variables for all OAuth credentials
- Fix Gemini redirect_uri from /auth/gemini/callback to /auth/callback
- Docker integration via supervisord
- Comprehensive unit tests (211 tests passing)
Implement dynamic model availability tracking and management across 15+ providers.
Introduces three new HTTP endpoints and a thread-safe registry for managing model
metadata, fallback routing, and provider distribution tracking.

New Features:
- New crate: model_registry with ModelRegistry singleton for concurrent access
- ModelInfo struct with rich metadata (pricing, thinking support, capabilities)
- Three fallback strategies: SameProviderFallback, CapabilityMatchFallback, CostOptimizedFallback
- Model mapping/aliasing support for request transformation
- 15+ pre-configured models (Claude, Gemini, OpenAI, Groq, Mistral)

API Endpoints:
- GET /v1/models - List all available models with rich metadata
- GET /v1/models/{model_id} - Get individual model details
- GET /v1/models/available - List only active/beta models

Integration:
- brightstaff initialized with default models on startup
- Enhanced models handler to use registry instead of config-based list
- OpenAI-compatible response format for all endpoints

Testing:
- 8 new unit tests for registry core functionality
- All 215 existing tests still passing
- Clean compilation with no errors
Implement dynamic model discovery from LLM providers with async/await patterns.
Adds OpenAI, Anthropic, Gemini, Groq, and Mistral discovery implementations.
Supports caching with configurable TTL and graceful error handling with timeouts.

New Features:
- ModelDiscovery async trait for provider-agnostic discovery
- OpenAI implementation: Calls /v1/models API endpoint
- Gemini implementation: Calls Google Generative API with model discovery
- Anthropic/Groq/Mistral: Static implementations with known models
- CachedDiscovery wrapper: 5-minute TTL cache for provider API calls
- DiscoveryManager: Coordinates discovery across all providers

API Integrations:
- OpenAI: Fetches real-time model list (requires OPENAI_API_KEY)
- Gemini: Fetches real-time model list (requires GEMINI_API_KEY)
- Anthropic/Groq/Mistral: Pre-configured known models (no API key needed)

New Handler:
- discover_and_register_models(): Called on startup to auto-populate registry
- Gracefully handles missing API keys and provider timeouts
- Logs discovery results and failures with tracing

Testing:
- 4 new discovery tests (cached, anthropic, groq, discovery manager)
- 12 total model_registry tests (was 8)
- 220 total workspace tests (was 215)
- All tests passing with no regressions

Error Handling:
- DiscoveryTimeout error now includes provider name
- 10-second timeout per provider API call
- Graceful fallback to static definitions on discovery failure
Implement model availability checking and fallback routing in the request path.
Adds routing helpers for checking model availability and selecting fallbacks when
primary models are unavailable. Integrates with ModelRegistry for real-time
availability tracking.

New Components:
- Model routing helpers module: model_routing.rs
- Model availability checking functions
- Fallback model resolution with logging
- Recommended fallback models lookup
- Routing decision logging with fallback tracking

Integration Points:
- Common routing module: Enhanced get_llm_provider() with model availability checking
- Brightstaff handlers: New model_routing module with public API
- Model registry integration: Uses registry for availability checks
- Tracing/logging: Logs all routing decisions and fallbacks

Key Functions:
- is_model_available(): Check if model is in registry and available
- get_available_models(): Get list of all available models
- resolve_model_with_fallback(): Get available model or fallback alternative
- get_fallback_models(): Get top 5 recommended fallback models
- log_routing_decision(): Log routing decisions to traces

Features:
- Automatic fallback selection when primary model unavailable
- Same provider preference for fallbacks (default strategy)
- Graceful error handling with logging
- Request ID correlation in all logs
- Non-blocking: Falls back to random selection if registry unavailable

Testing:
- 4 new model_routing tests
- 2 new hermesllm tests
- 6 total new tests
- 226 total workspace tests (was 220)
- All tests passing with no regressions

Ready for:
- Streaming requests with model availability checks
- Real-time failover when models become unavailable
- Provider health monitoring (Phase 3+)
- Configuration-based policies (Phase 3+)
- Integrate resolve_model_with_fallback() into router_chat_get_upstream_model()
- Check availability of both routed and default models
- Apply fallback routing if primary model unavailable
- Log routing decisions with request ID correlation
- Gracefully handle cases where no fallback is available
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant