Skip to content

Conversation

@striver-24
Copy link

@striver-24 striver-24 commented Sep 29, 2025

Closes #89

  • Added a 30-second rate limit per user and better checks for API keys and user input (blocking bad/harmful requests).
  • Implemented retry logic and exponential backoff to handle Discord's errors, plus a 5-minute timeout to prevent the bot from hanging.
  • Users get clearer error messages, and now have detailed logs for troubleshooting.

- Add comprehensive rate limiting for users (30-second cooldown)
- Implement robust Discord API error handling with retry logic
- Enhance API key validation with length and character checks
- Add input validation for user requests (length, content, harmful keywords)
- Improve user-facing error messages with emojis and clear descriptions
- Add timeout handling for agent processing (5-minute limit)
- Implement graceful error recovery and rollback mechanisms
- Add detailed structured logging for debugging and monitoring
- Handle Discord connection/disconnection events with user feedback
- Add exponential backoff for server errors and rate limit compliance

This significantly improves the bot's production readiness and user experience.
@striver-24 striver-24 requested a review from a team as a code owner September 29, 2025 20:18
@striver-24 striver-24 changed the title Enhance: Improve error handling and robustness in Discord agent (Issue No. #89) Enhance: Improve error handling in Discord agent Oct 2, 2025
@rishikesh-jentic
Copy link
Collaborator

Thanks so much for this PR and for taking the time to contribute to Standard Agent — welcome aboard 🙌
We’ll review and share feedback soon! 🚀

@striver-24
Copy link
Author

Thank you for the warm welcome! I’m thrilled to contribute to Standard Agent and eagerly await your feedback. 😊

if attempt < max_retries - 1:
await asyncio.sleep(retry_after)
continue
elif exc.status == 403: # Forbidden
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The elif exc.status == 403 and elif exc.status >= 500 are paired with the wrong if statement - they're currently at the same level as if attempt < max_retries - 1, should they be at the same level as if exc.status == 429 (one indent back) ?

if attempt == max_retries - 1:
return False
await asyncio.sleep(1)
return False
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The return False at the end is inside the for loop - this means it will exit after the first iteration that doesn't explicitly return or continue, instead of trying the remaining retries. Is that intentional ?

logger = get_logger(__name__)

#Rate Limit Tracking
user_last_request = {}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love the addition of rate limiting! 🎯

One thing to consider: the user_last_request dictionary will grow indefinitely as new users interact with the bot. In a long-running bot, this could accumulate thousands of entries over time and cause memory issues.

You might want to add some cleanup logic - would a Deque be worth considering here ?

Just something to think about for production environments where the bot might run for weeks/months!

return

# Check for potentially harmful content
if any(word in goal.lower() for word in ['hack', 'exploit', 'malware', 'virus']):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I appreciate the thought behind adding content filtering! However, simple keyword-based filtering tends to have some challenges:

  • False positives: Legitimate questions like "How do I protect against hackers?" would be blocked
  • Easy to bypass: Simple character substitutions (h4ck, expl0it) make it ineffective
  • Blocks valid use cases: Security research, educational questions, etc.

I'd suggest either removing this check or considering a more robust solution.

I would like to know your inputs here.

if reasoning_strategy not in valid:
await interaction.response.send_message("Usage: /standard_agent reasoner reasoning_strategy", ephemeral=True)

if len(key) < 10:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick question about the API key validation checks (length, character restrictions, etc.) - I'm wondering if these might cause issues down the line?

These checks assume specific format requirements that we can't really guarantee:

  • API key formats can change over time
  • Different environments might use different formats
  • Valid keys might be rejected, or invalid ones might pass these checks

The only way to truly validate an API key is to try using it. I'd suggest either removing these format checks or replacing them with an actual API test call during configuration.

What do you think? Is there a specific key format you're trying to enforce here?

)
logger.info("reasoner_changed_no_reload", old=old_profile, new=runtime.chosen_profile)

except discord.InteractionResponded:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I notice there's a recurring pattern for handling discord.InteractionResponded exceptions throughout the code (in configure, kill, reasoner, etc.). A couple of thoughts:

  1. InteractionResponded usually indicates a logic bug - if this exception is being raised, it means we tried to respond to an interaction twice, which shouldn't happen in normal flow. Rather than catching it everywhere, it might be worth investigating why it could occur.

  2. The pattern is repeated ~10 times - this makes the code harder to maintain. If we need to change the error handling logic, we'd have to update it in many places.

Consider either:

  • Simplifying to just handle the expected exceptions (the InteractionResponded catches might be masking bugs)
  • Or creating a decorator/wrapper if this error handling is truly necessary

This would make the code cleaner and easier to reason about. What do you think?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Good First Issue] [Enhancement] Improve Discord bot error handling

2 participants