Fix Azure JWKS cache + Task Scheduler orphaned processes #145
Merged
rhythmatician merged 8 commits into main on Oct 27, 2025
Pull Request Overview
This PR enhances authentication reliability by implementing TTL-based JWKS caching to prevent service interruptions during Azure AD key rotations (Issue #143). The changes replace the simple LRU cache with a time-based cache that includes automatic refresh capabilities when token key IDs don't match cached keys.
Key changes:
- Implemented TTL-based JWKS caching with 1-hour expiry and forced refresh capability
- Added automatic JWKS refresh on key ID mismatch to handle Azure AD key rotations
- Updated test fixtures to work with new cache structure
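The caching behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code in `utils/auth.py`: the `JwksCache` class, its `fetch` callback, the `clock` parameter, and the `find_key` helper are all hypothetical names, and the stale-key fallback is one possible reading of the "fallback logic" mentioned in the review.

```python
import threading
import time
from typing import Callable, Optional

JWKS_TTL_SECONDS = 3600  # 1-hour expiry, as stated in the PR overview


class JwksCache:
    """TTL cache for a JWKS document with forced refresh (hypothetical helper)."""

    def __init__(
        self,
        fetch: Callable[[], dict],
        ttl: float = JWKS_TTL_SECONDS,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch          # assumed: returns {"keys": [...]}
        self._ttl = ttl
        self._clock = clock          # injectable so tests can control time
        self._lock = threading.Lock()
        self._jwks: Optional[dict] = None
        self._fetched_at = 0.0

    def get(self, force_refresh: bool = False) -> dict:
        """Return the cached JWKS, refetching when expired or forced."""
        with self._lock:
            # The cache age is computed inside the lock so it is always defined here.
            expired = (
                self._jwks is None
                or self._clock() - self._fetched_at >= self._ttl
            )
            if force_refresh or expired:
                try:
                    self._jwks = self._fetch()
                    self._fetched_at = self._clock()
                except Exception:
                    # Assumed fallback: keep serving stale keys if any exist.
                    if self._jwks is None:
                        raise
            return self._jwks

    def find_key(self, kid: str) -> Optional[dict]:
        """Look up a key by ID, forcing one refresh if it is not cached."""
        for force in (False, True):
            for key in self.get(force_refresh=force).get("keys", []):
                if key.get("kid") == kid:
                    return key
        return None  # still unknown after a refresh: treat the token as invalid
```

A single forced refresh per lookup keeps a rotated key from requiring a restart while avoiding an unbounded retry loop against the JWKS endpoint.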
Reviewed Changes
Copilot reviewed 4 out of 5 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `utils/auth.py` | Replaced LRU cache with TTL-based JWKS caching including fallback logic and automatic refresh on key mismatch |
| `utils/get_token.py` | Simplified access token print statement for easier scripting |
| `run_api.bat` | Updated virtual environment path from `venv` to `.venv` |
| `conftest.py` | Updated test fixture to reset TTL cache variables instead of clearing LRU cache |
rhythmatician added a commit that referenced this pull request Nov 14, 2025
- fix: update virtual environment activation path and simplify access token print statement
- fix: simplify access token print statement
- feat: add MCP related files to .gitignore
- fix: update JWKS caching mechanism to use TTL and allow forced refresh
- fix: update Flask startup method to run in the foreground for Task Scheduler compatibility
- refactor: remove unused Azure settings dataclass and clean up imports
- fix: streamline JWKS cache management by using public API for clearing cache
- fix: resolve NameError in JWKS cache fallback - recalculate cache_age inside lock scope
🎯 Overview
This PR completely fixes Issue #143 (stale Azure AD JWKS cache causing authentication failures) and resolves a critical Task Scheduler orphaned process issue discovered during testing.
🔧 Problem #1: JWKS Cache Never Refreshed (Issue #143)
Symptom: Authentication failures after Azure AD key rotation required server restart.
Root Cause:
`@lru_cache` decorator kept JWKS in memory for the lifetime of the process (potentially weeks/months).
Solution: Implemented TTL-based cache with automatic retry:
- `kid` not found (detects key rotation)
Test Results:
`pytest utils/tests/test_auth_jwks_cache.py -v` # 10/10 tests passing ✅
🚀 Problem #2: Task Scheduler Orphaned Processes (Discovered During Testing)
Symptom: Stopping Task Scheduler task left Flask running as SYSTEM process on port 5000.
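One way to check for this symptom from Python is to test whether anything still accepts connections on the port after the task is stopped. The `port_in_use` helper below is a hypothetical sketch and is not taken from `test_server_lifecycle.py`; it only detects a listener and does not identify which process owns it.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # Port 5000 is the Flask port named in this PR.
    state = "still in use" if port_in_use(5000) else "released"
    print(f"Port 5000 is {state}")
```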
Impact:
Root Cause:
`start "Flask"` in `run_api.bat` created detached console process that Task Scheduler couldn't manage.
Solution: Changed to `start /B` (background in same console).
Additional Fix: Updated Task Scheduler configuration:
- `StopOnIdleEnd: false` ← CRITICAL (was `true`, caused unexpected shutdowns!)
- `DisallowStartIfOnBatteries: false` (prevents UPS/laptop issues)
- `RestartOnFailure` policy (3 retries, 1-minute interval)
Validation Results:
📁 Files Changed
Core JWKS Cache Fix
- `utils/auth.py` - Complete cache rewrite (TTL-based, auto-retry)
- `conftest.py` - Updated cache reset fixture for TTL globals
- `utils/tests/test_auth_jwks_cache.py` - 10 comprehensive tests (NEW)
Task Scheduler Fixes
- `run_api.bat` - Fixed process management (`start /B`)
- `TASK-SCHEDULER_core-api_UPDATED.xml` - Improved configuration (NEW)
Testing & Documentation
- `test_server_lifecycle.py` - Automated lifecycle validation (NEW)
- `.github/ISSUE_143_SOLUTION.md` - Implementation guide (NEW)
- `.github/TASK_SCHEDULER_FINDINGS.md` - Investigation docs (NEW)
- `.github/PR_145_SUMMARY.md` - Complete PR documentation (NEW)
🧪 Testing
JWKS Cache Tests
Server Lifecycle Validation
`python test_server_lifecycle.py` # All tests passed ✅
Validation:
- ✅ Port 5000 availability check
- ✅ Cache initialization
- ✅ Server health checks (`/`, `/docs`)
- ✅ Graceful shutdown
- ✅ Port properly released
- ✅ No orphaned processes
- ✅ Rapid restart (3 cycles)
📋 Deployment Checklist
Pre-Deployment
Kill orphaned processes (if any exist):
Update Task Scheduler:
- `TASK-SCHEDULER_core-api_UPDATED.xml` as new task
- (`StopOnIdleEnd=false` is critical!)
Verify batch script:
- `run_api.bat` uses `start /B` (not `start "Flask"`)
Post-Deployment Validation
- `http://localhost:5000/docs`
- (`Get-Process python`)
- (`netstat -ano | Select-String ":5000"`)
🎯 Impact
📊 Expected Behavior After Deployment
JWKS Cache:
- `kid` not found (key rotation)
Server Lifecycle:
Ready for review and merge! 🚀
Fixes #143