[P0] LLM Retry Logic Infinite Loop Risk

## Summary

The rate limit retry logic in `LLMEnricher.enrich_skill()` calls itself recursively with no retry counter or limit, so retries are unbounded. If the Anthropic API keeps returning rate limit errors (e.g., account suspended, quota exhausted), the method retries indefinitely: the process hangs in repeated sleeps, and each retry deepens the call stack until it is exhausted (in CPython, a `RecursionError`).

## Impact

- Real scenario: API key revoked → retry forever → production system hangs or overflows the stack
- Callers cannot cap the retries (no max retry parameter)
- Each retry consumes stack space (recursive calls)
- Unnecessary API costs if the quota is not completely exhausted

## Location

- **File**: `src/agentready/learners/llm_enricher.py`
- **Lines**: 93-99
- **Function**: `LLMEnricher.enrich_skill()`

## Current Code

```python
except RateLimitError as e:
    logger.warning(f"Rate limit hit for {skill.skill_id}: {e}")
    # Exponential backoff
    retry_after = int(getattr(e, "retry_after", 60))
    logger.info(f"Retrying after {retry_after} seconds...")
    sleep(retry_after)
    return self.enrich_skill(skill, repository, finding, use_cache)
```

## Solution

Add bounded retry with graceful fallback:

```python
import random  # new module-level import, needed for the jitter below

def enrich_skill(
    self,
    skill: DiscoveredSkill,
    repository: Repository,
    finding: Finding,
    use_cache: bool = True,
    max_retries: int = 3,
    _retry_count: int = 0,
) -> DiscoveredSkill:
    """Enrich skill with LLM-generated content.

    Args:
        skill: Skill to enrich
        repository: Repository context
        finding: Assessment finding
        use_cache: Use cached responses if available (default: True)
        max_retries: Maximum retry attempts for rate limits (default: 3)
        _retry_count: Internal retry counter (do not set manually)

    Returns:
        Enriched skill with LLM content, or original skill if enrichment fails
    """
    # ... existing code ...

    except RateLimitError as e:
        # Check if max retries exceeded
        if _retry_count >= max_retries:
            logger.error(
                f"Max retries ({max_retries}) exceeded for {skill.skill_id}. "
                f"Falling back to heuristic skill. "
                f"Check API quota: https://console.anthropic.com/settings/limits"
            )
            return skill  # Graceful fallback

        # Calculate backoff with jitter
        retry_after = int(getattr(e, "retry_after", 60))
        jitter = random.uniform(0, min(retry_after * 0.1, 5))
        total_wait = retry_after + jitter

        logger.warning(
            f"Rate limit hit for {skill.skill_id} "
            f"(retry {_retry_count + 1}/{max_retries}): {e}"
        )
        logger.info(f"Retrying after {total_wait:.1f} seconds...")

        sleep(total_wait)

        return self.enrich_skill(
            skill, repository, finding, use_cache, max_retries, _retry_count + 1
        )
```
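
The recursive fix above bounds the depth, but each retry still adds a stack frame. As a possible alternative, the same policy can be written as a loop. The following is only a sketch: `call_with_bounded_retries`, the injectable `sleep` argument, and the local `RateLimitError` stand-in are hypothetical names introduced for illustration and are not part of the existing code.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Stand-in for the SDK's rate limit exception (assumption for this sketch)."""

    def __init__(self, message: str, retry_after: float = 60.0) -> None:
        super().__init__(message)
        self.retry_after = retry_after


def call_with_bounded_retries(
    call: Callable[[], T],
    fallback: T,
    max_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on RateLimitError at most `max_retries` times."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except RateLimitError as e:
            if attempt >= max_retries:
                # Retry budget spent: degrade gracefully instead of raising.
                return fallback
            retry_after = float(getattr(e, "retry_after", 60))
            # Same jitter rule as the proposed fix: up to 10% of the wait, capped at 5 s.
            jitter = random.uniform(0, min(retry_after * 0.1, 5))
            sleep(retry_after + jitter)
    # Only reached when max_retries is negative (the loop body never runs).
    return fallback
```

With `max_retries=2`, a call that is always rate limited is attempted three times (one initial attempt plus two retries) before the fallback is returned, and the stack depth stays constant.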

## Testing

```bash
# 1. Unit tests for retry behavior
pytest tests/unit/test_llm_enricher.py::test_llm_enricher_max_retries -v
pytest tests/unit/test_llm_enricher.py::test_llm_enricher_successful_retry -v

# 2. Manual test with invalid API key (should fail gracefully)
export ANTHROPIC_API_KEY="invalid-key"
agentready learn . --enable-llm --llm-max-retries 2

# Expected: Retries 2 times, then falls back to heuristic
```
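
The two unit tests named above do not exist yet. A minimal sketch of what they might check is shown below. To stay self-contained it exercises a hypothetical stand-in, `enrich_with_retries`, rather than the real `LLMEnricher`; the real tests would presumably patch the Anthropic client and `sleep` instead. Plain strings stand in for `DiscoveredSkill` objects.

```python
from unittest.mock import Mock


class RateLimitError(Exception):
    """Stand-in for the SDK's rate limit exception (assumption for this sketch)."""


def enrich_with_retries(client_call, skill, max_retries=3, sleep=lambda seconds: None):
    """Hypothetical stand-in for the bounded retry path of enrich_skill()."""
    for attempt in range(max_retries + 1):
        try:
            return client_call(skill)
        except RateLimitError:
            if attempt == max_retries:
                return skill  # Graceful fallback to the heuristic skill
            sleep(0)  # The real code would wait retry_after plus jitter here
    return skill


def test_llm_enricher_max_retries():
    # Every call is rate limited, so the original skill must come back.
    client_call = Mock(side_effect=RateLimitError("rate limited"))
    result = enrich_with_retries(client_call, "heuristic-skill", max_retries=2)
    assert result == "heuristic-skill"
    assert client_call.call_count == 3  # 1 initial attempt + 2 retries


def test_llm_enricher_successful_retry():
    # First call is rate limited, second call succeeds.
    client_call = Mock(side_effect=[RateLimitError("rate limited"), "enriched-skill"])
    result = enrich_with_retries(client_call, "heuristic-skill", max_retries=2)
    assert result == "enriched-skill"
    assert client_call.call_count == 2
```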

## Acceptance Criteria

- [ ] max_retries parameter added to function signature
- [ ] Retry counter checked before recursive call
- [ ] Graceful fallback to heuristic skill on max retries
- [ ] Jitter added to prevent thundering herd
- [ ] CLI option `--llm-max-retries` added
- [ ] Unit tests for retry limit added
- [ ] Unit tests for successful retry added
- [ ] Documentation updated with retry behavior
- [ ] Error messages include helpful context (API quota link)
- [ ] All existing tests pass
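
For the `--llm-max-retries` criterion, one way the option could be parsed and validated is sketched below. This uses `argparse` only because it is in the standard library; the project's actual CLI framework is not shown in this issue, and `build_parser` and `non_negative_int` are hypothetical names.

```python
import argparse


def non_negative_int(value: str) -> int:
    """Reject negative retry counts at parse time."""
    number = int(value)
    if number < 0:
        raise argparse.ArgumentTypeError("must be >= 0")
    return number


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="agentready")
    subcommands = parser.add_subparsers(dest="command", required=True)
    learn = subcommands.add_parser("learn")
    learn.add_argument("path")
    learn.add_argument("--enable-llm", action="store_true")
    # Default mirrors the max_retries default in the proposed signature.
    learn.add_argument("--llm-max-retries", type=non_negative_int, default=3)
    return parser


args = build_parser().parse_args(["learn", ".", "--enable-llm", "--llm-max-retries", "2"])
```

The parsed value (`args.llm_max_retries`) would then be passed through to `enrich_skill(..., max_retries=...)`.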

## Best Practices Applied

1. **Backoff with jitter**: Waits the `retry_after` value (default 60 seconds) plus random jitter, which prevents thundering herd
2. **Bounded retries**: Prevents infinite loops
3. **Graceful degradation**: Falls back to heuristic on failure
4. **User control**: CLI option for retry limit
5. **Helpful errors**: Links to API quota page
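
The proposed fix waits a fixed `retry_after` plus a small jitter on every attempt. If a delay that grows per attempt is wanted, it could be computed as in the sketch below, which uses the "full jitter" variant of exponential backoff. `backoff_delay`, `base`, and `cap` are hypothetical names and values, not part of the existing code.

```python
import random


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Full-jitter exponential backoff.

    The ceiling doubles with each attempt (base, 2*base, 4*base, ...) up to
    `cap`, and the actual delay is drawn uniformly from [0, ceiling].
    """
    ceiling = min(cap, base * (2 ** attempt))
    return random.uniform(0, ceiling)


# Ceilings for attempts 0..7 with the defaults: 1, 2, 4, 8, 16, 32, 60, 60 seconds.
delays = [backoff_delay(attempt) for attempt in range(8)]
```

When the server supplies a retry hint, taking the larger of that hint and the computed delay would likely be the safer choice.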

## References

- Full remediation plan: `.plans/code-review-remediation-plan.md`

**Labels**: security, bug, P0, llm
**Milestone**: v1.24.0
