Summary
The rate-limit retry logic in LLMEnricher.enrich_skill() calls itself recursively without any retry counter, creating a potential infinite loop. If the Anthropic API returns rate-limit errors repeatedly (e.g., account suspended, quota exhausted), the method retries indefinitely, eventually exhausting the call stack or hanging the process.
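For context, CPython caps recursion depth at about 1,000 frames by default, so an unbounded self-call of this shape eventually raises RecursionError; with a 60-second sleep before each retry, the practical symptom is a process that hangs for many hours first. A minimal illustration (names are purely illustrative, not from the codebase):
import sys

def retry_forever(attempt: int = 0) -> None:
    # Stand-in for enrich_skill(): every rate-limit failure recurses with no bound.
    return retry_forever(attempt + 1)

print(sys.getrecursionlimit())  # 1000 by default
retry_forever()  # RecursionError: maximum recursion depth exceeded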
Impact
- Persistent rate-limit errors (revoked key, exhausted quota) → retries forever → stack exhaustion or hang
- User cannot interrupt (no max-retry parameter)
- Each retry consumes stack space (recursive calls)
- Real scenario: API key revoked → retries forever → production system hangs
- Unnecessary API costs if the quota is not completely exhausted
Location
- File: src/agentready/learners/llm_enricher.py (lines 93-99)
- Function: LLMEnricher.enrich_skill()
Current Code
except RateLimitError as e:
    logger.warning(f"Rate limit hit for {skill.skill_id}: {e}")
    # Exponential backoff
    retry_after = int(getattr(e, "retry_after", 60))
    logger.info(f"Retrying after {retry_after} seconds...")
    sleep(retry_after)
    return self.enrich_skill(skill, repository, finding, use_cache)
Solution
Add bounded retry with graceful fallback (the jitter below uses the standard-library random module, so import random needs to be added at the top of llm_enricher.py):
def enrich_skill(
    self,
    skill: DiscoveredSkill,
    repository: Repository,
    finding: Finding,
    use_cache: bool = True,
    max_retries: int = 3,
    _retry_count: int = 0,
) -> DiscoveredSkill:
    """Enrich skill with LLM-generated content.

    Args:
        skill: Skill to enrich
        repository: Repository context
        finding: Assessment finding
        use_cache: Use cached responses if available (default: True)
        max_retries: Maximum retry attempts for rate limits (default: 3)
        _retry_count: Internal retry counter (do not set manually)

    Returns:
        Enriched skill with LLM content, or original skill if enrichment fails
    """
    # ... existing code ...
    except RateLimitError as e:
        # Check if max retries exceeded
        if _retry_count >= max_retries:
            logger.error(
                f"Max retries ({max_retries}) exceeded for {skill.skill_id}. "
                f"Falling back to heuristic skill. "
                f"Check API quota: https://console.anthropic.com/settings/limits"
            )
            return skill  # Graceful fallback

        # Calculate backoff with jitter
        retry_after = int(getattr(e, "retry_after", 60))
        jitter = random.uniform(0, min(retry_after * 0.1, 5))
        total_wait = retry_after + jitter

        logger.warning(
            f"Rate limit hit for {skill.skill_id} "
            f"(retry {_retry_count + 1}/{max_retries}): {e}"
        )
        logger.info(f"Retrying after {total_wait:.1f} seconds...")
        sleep(total_wait)

        return self.enrich_skill(
            skill, repository, finding, use_cache, max_retries, _retry_count + 1
        )
Testing
# 1. Unit tests for retry behavior
pytest tests/unit/test_llm_enricher.py::test_llm_enricher_max_retries -v
pytest tests/unit/test_llm_enricher.py::test_llm_enricher_successful_retry -v
# 2. Manual test with invalid API key (should fail gracefully)
export ANTHROPIC_API_KEY="invalid-key"
agentready learn . --enable-llm --llm-max-retries 2
# Expected: Retries 2 times, then falls back to heuristic skill
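Since the enricher's constructor and API-client internals aren't shown in this issue, the sketch below exercises the bounded-retry pattern against a simplified stand-in rather than the real LLMEnricher; the test in tests/unit/test_llm_enricher.py should mock the actual Anthropic call instead. All names here are illustrative.
class FakeRateLimitError(Exception):
    """Stand-in for anthropic.RateLimitError in this sketch."""

def enrich_with_bounded_retry(call_api, skill, max_retries=3, _retry_count=0):
    # Simplified stand-in for LLMEnricher.enrich_skill(): same retry shape, no sleeping.
    try:
        return call_api(skill)
    except FakeRateLimitError:
        if _retry_count >= max_retries:
            return skill  # graceful fallback to the original skill
        return enrich_with_bounded_retry(call_api, skill, max_retries, _retry_count + 1)

def test_max_retries_falls_back_to_original_skill():
    calls = []

    def always_rate_limited(skill):
        calls.append(skill)
        raise FakeRateLimitError()

    skill = object()
    result = enrich_with_bounded_retry(always_rate_limited, skill, max_retries=2)

    assert result is skill   # fallback returns the unmodified skill
    assert len(calls) == 3   # initial attempt + 2 retries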
Acceptance Criteria
- max_retries parameter added to function signature
- Retry counter checked before recursive call
- Graceful fallback to heuristic skill on max retries
- Jitter added to prevent thundering herd
- CLI option --llm-max-retries added (see the sketch after this list)
- Unit tests for retry limit added
- Unit tests for successful retry added
- Documentation updated with retry behavior
- Error messages include helpful context (API quota link)
- All existing tests pass
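A possible shape for the --llm-max-retries wiring, assuming the learn command is built with click; the decorator stack, parameter names, and callback below are illustrative, not the project's actual CLI code:
import click

@click.command()
@click.argument("path", type=click.Path(exists=True), default=".")
@click.option("--enable-llm", is_flag=True, help="Enable LLM enrichment.")
@click.option(
    "--llm-max-retries",
    type=int,
    default=3,
    show_default=True,
    help="Maximum retry attempts when the LLM API returns rate-limit errors.",
)
def learn(path: str, enable_llm: bool, llm_max_retries: int) -> None:
    """Learn skills from the repository at PATH."""
    # Thread the value through to LLMEnricher.enrich_skill(..., max_retries=llm_max_retries).
    ...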
Best Practices Applied
- Exponential backoff with jitter: Prevents thundering herd (see the formula sketch after this list)
- Bounded retries: Prevents infinite loops
- Graceful degradation: Falls back to heuristic on failure
- User control: CLI option for retry limit
- Helpful errors: Links to API quota page
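Strictly speaking, the remediation above waits the server-suggested retry_after plus jitter; if a true exponential schedule is preferred, the common "full jitter" formula is wait = random(0, min(cap, base * 2**attempt)). A small sketch (the base and cap constants are illustrative):
import random

def backoff_with_jitter(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    # Full-jitter exponential backoff: draw the wait from a window that doubles per attempt.
    return random.uniform(0, min(cap, base * (2 ** attempt)))

for attempt in range(4):
    print(f"attempt {attempt}: wait {backoff_with_jitter(attempt):.2f}s")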
References
- Full remediation plan: .plans/code-review-remediation-plan.md
Labels: security, bug, P0, llm
Milestone: v1.24.0