PR #1002: Changes from Claude #1005
Conversation
- Added comprehensive telemetry cleanup calls in agent.py at all return points
- Ensures proper cleanup after guardrail validation failures
- Prevents hanging during agent termination by forcing telemetry shutdown
- Added test files to validate telemetry cleanup functionality
- Addresses hanging issues in async agent execution flows

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Mervin Praison <mervin@praison.ai>
Walkthrough

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant Agent
    participant LLM/CustomLLM
    participant Guardrail
    participant Telemetry
    User->>Agent: chat(prompt, ...)
    activate Agent
    Agent->>Telemetry: Start telemetry
    Note right of Agent: try...finally ensures cleanup
    Agent->>LLM/CustomLLM: get_response / _chat_completion
    LLM/CustomLLM-->>Agent: response
    Agent->>Guardrail: validate response
    Guardrail-->>Agent: validated/failed
    alt Validation failed or error
        Agent->>Agent: Rollback chat history
        Agent-->>User: None
    else Success
        Agent-->>User: Validated response
    end
    Agent->>Telemetry: _cleanup_telemetry()
    deactivate Agent
```
@claude review this pull request and do a detailed analysis, and fix this if the existing code doesn't have the solution implemented. Make sure it keeps backward compatibility and removes no existing features. After making those changes, review the applied changes again. Use @web to search if you don't know any information, or to find the latest documentation or the latest version. Run the code if you think you need to run it to test it. Start with minimal code changes, if any changes are required.
Claude encountered an error. View job: PR Analysis and Review
Summary of Changes
Hello @MervinPraison, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request focuses on resolving potential hanging issues within the agent by ensuring robust cleanup of telemetry resources. It introduces specific _cleanup_telemetry() calls in critical execution flows and exception handling, complemented by a suite of new tests designed to verify the effectiveness of these cleanup measures.
Highlights
- Telemetry Cleanup Fixes: Implemented explicit calls to `_cleanup_telemetry()` across various agent execution paths and error-handling blocks within `agent.py`. This is intended to ensure that telemetry resources are properly released, preventing potential hanging issues.
- New Test Coverage for Cleanup: Introduced three new dedicated test files (`test_telemetry_cleanup_fix.py`, `test_telemetry_cleanup_simple.py`, and `test_thread_cleanup.py`) to thoroughly validate the telemetry cleanup mechanism. These tests cover integrated agent execution, direct cleanup function calls, and specific thread-management scenarios to confirm that no background threads are left hanging.
@claude review this pull request and do a detailed analysis, and fix this if the existing code doesn't have the solution implemented. Make sure it keeps backward compatibility and removes no existing features. After making those changes, review the applied changes again. Use @web to search if you don't know any information, or to find the latest documentation or the latest version. Run the code if you think you need to run it to test it. Start with minimal code changes, if any changes are required.
Code Review
This PR introduces a fix for a potential hanging issue by ensuring telemetry threads are cleaned up properly. The core change adds `_cleanup_telemetry()` calls at various exit points in the `Agent` class. However, calling `_cleanup_telemetry()` before every `return` statement introduces code duplication; a `try...finally` block would ensure the cleanup logic always runs, regardless of the exit path, reducing redundancy and improving maintainability. The new test files check for remaining threads after cleanup, but the error messages only report that threads remain without naming them; logging the names of the remaining threads would provide more actionable information for debugging.
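A minimal sketch of the pattern the review is suggesting; `_run_chat` is a hypothetical helper standing in for the existing chat logic, not part of the actual codebase:

```python
class Agent:
    def chat(self, prompt, temperature=0.2, tools=None, **kwargs):
        """Illustrative only: run the chat flow, then always clean up telemetry."""
        try:
            # _run_chat is a hypothetical stand-in for the existing chat logic.
            return self._run_chat(prompt, temperature=temperature, tools=tools, **kwargs)
        finally:
            # Runs on normal returns, early returns, and exceptions alike.
            self._cleanup_telemetry()
```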
Several review comments were left on these added lines:

```python
# Ensure proper cleanup of telemetry system to prevent hanging
self._cleanup_telemetry()
```
test_telemetry_cleanup_fix.py
Outdated
```python
if final_threads <= initial_threads + 5:  # Allow for some background threads
    print("✅ Telemetry cleanup appears to be working correctly")
    return True
else:
    print(f"❌ Possible thread leak detected: {final_threads - initial_threads} extra threads")
    return False
```
The thread count check allows for some background threads, but the message suggests a possible thread leak if the count exceeds the initial count by more than 5. This could be misleading if legitimate background threads are started by other parts of the system. Consider logging the specific threads that are still running to provide more actionable information.
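A hedged sketch of what that logging could look like in one of these test scripts (the variable names are illustrative assumptions):

```python
import threading

# Capture the Thread objects themselves (not just a count) before the test runs.
initial_threads = set(threading.enumerate())

# ... run the agent chat / cleanup under test here ...

remaining = set(threading.enumerate()) - initial_threads
if remaining:
    for t in sorted(remaining, key=lambda t: t.name):
        print(f"❌ Leftover thread: {t.name} (daemon={t.daemon}, alive={t.is_alive()})")
else:
    print("✅ No leftover threads detected")
```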
test_telemetry_cleanup_simple.py
Outdated
```python
if final_threads <= initial_threads + 2:  # Allow for some background threads
    print("✅ Telemetry cleanup appears to be working correctly")
    return True
else:
    print(f"❌ Possible thread leak detected: {final_threads - initial_threads} extra threads")
    return False
```
The thread count check allows for some background threads, but the message suggests a possible thread leak if the count exceeds the initial count by more than 2. This could be misleading if legitimate background threads are started by other parts of the system. Consider logging the specific threads that are still running to provide more actionable information.
```python
if remaining_new_threads:
    print(f"❌ Threads still remaining after cleanup: {[t.name for t in remaining_new_threads]}")
    return False
```
test_thread_cleanup.py
Outdated
```python
if final_threads <= initial_threads + 1:  # Allow for some variance
    print("✅ Agent cleanup successful")
    return True
else:
    print(f"❌ Agent cleanup may have left threads: {final_threads - initial_threads} extra")
    return False
```
@claude review this pull request and do a detailed analysis, and fix this if the existing code doesn't have the solution implemented. Make sure it keeps backward compatibility and removes no existing features. After making those changes, review the applied changes again. Use @web to search if you don't know any information, or to find the latest documentation or the latest version. Run the code if you think you need to run it to test it. Start with minimal code changes, if any changes are required.
Claude finished @MervinPraison's task. View job: PR Analysis and Review
@claude review whether the changes made in the above branch meet the requirements, and make further changes to the branch if issues still exist.
Claude finished @MervinPraison's task. View job: PR Analysis and Review Complete ✅

I've successfully reviewed the PR and implemented critical improvements to fix the telemetry cleanup issues.

Todo List:

🔍 Review Summary

Original Issues Fixed:

✅ Critical Improvements Made:

1. Agent.py Refactoring:
2. Test Enhancements:
3. Backward Compatibility:

🔧 Technical Details:

The refactored chat method:

```python
def chat(self, ...):
    try:
        # All existing chat logic here
        return result
    finally:
        # Always cleanup telemetry regardless of exit path
        self._cleanup_telemetry()
```

This ensures:

📋 Final Verdict:

Before: ❌ High Risk - Code duplication, missing cleanup on some paths

The telemetry cleanup fix is now ready for production with proper resource management and no hanging threads.

🚀 Deployment Recommendations:
- Replace 13 scattered _cleanup_telemetry() calls with single try...finally block
- Ensure cleanup happens on ALL exit paths regardless of success/failure
- Standardize test tolerances to 2 threads across all test files
- Improve test error messages to show specific thread details
- Maintain backward compatibility and all existing functionality
- Reduce code duplication by 95% in chat method

🤖 Generated with Claude Code

Co-authored-by: Mervin Praison <MervinPraison@users.noreply.github.com>
Actionable comments posted: 2
♻️ Duplicate comments (3)
test_telemetry_cleanup_simple.py (1)

56-64: Fix the logical error in extra thread listing.
The code has the same logical error as in test_telemetry_cleanup_fix.py, where it tries to list extra threads using threading.enumerate()[:initial_threads], but initial_threads is an integer count, not a list of the original threads. Apply this diff to fix the extra thread listing:

```diff
- print(f"❌ Extra threads: {[t.name for t in remaining_threads if t not in threading.enumerate()[:initial_threads]]}")
+ print(f"❌ All remaining threads: {[t.name for t in remaining_threads]}")
```

src/praisonai-agents/praisonaiagents/agent/agent.py (2)

1227-1533: Excellent refactoring with try...finally for telemetry cleanup!
The implementation properly addresses the code duplication issue by wrapping the entire chat logic in a try...finally block, ensuring _cleanup_telemetry() is always called regardless of the exit path. This is a significant improvement over the previous approach.

1637-1638: Consider using try...finally in achat for consistency.
While adding cleanup calls at exit points addresses the immediate need, consider wrapping the achat logic in try...finally blocks, similar to the chat method, for better maintainability and consistency. This would prevent cleanup from being missed if new return paths are added in the future.

Also applies to: 1721-1722, 1737-1738
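A minimal sketch of what that could look like on the async path; `_arun_chat` is a hypothetical helper standing in for the existing `achat` body:

```python
class Agent:
    async def achat(self, prompt, temperature=0.2, tools=None, **kwargs):
        """Illustrative only: mirror chat()'s try...finally structure for async."""
        try:
            return await self._arun_chat(prompt, temperature=temperature, tools=tools, **kwargs)
        finally:
            # Runs whether the coroutine returns, raises, or is cancelled.
            self._cleanup_telemetry()
```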
🧹 Nitpick comments (1)
src/praisonai-agents/praisonaiagents/agent/agent.py (1)
1343-1530: Consider extracting self-reflection logic into a separate method.
While the implementation is functionally correct, with proper error handling and rollback mechanisms, the self-reflection logic (lines 1420-1521) adds significant complexity to the chat method. Consider extracting it into a dedicated _handle_self_reflection method to improve readability and maintainability.

Example refactoring approach:

```diff
+ def _handle_self_reflection(self, response_text, messages, temperature, tools, original_prompt, chat_history_length, task_name, task_description, task_id):
+     """Handle self-reflection logic for chat responses."""
+     reflection_count = 0
+     current_response = response_text
+
+     while reflection_count < self.max_reflect:
+         # Reflection logic here...
+         pass
+
+     return current_response, reflection_count

  def chat(self, prompt, temperature=0.2, tools=None, ...):
      # ... existing code ...
      if self.self_reflect:
-         # Lines 1420-1521: Entire reflection logic
+         response_text, reflection_count = self._handle_self_reflection(
+             response_text, messages, temperature, tools,
+             original_prompt, chat_history_length,
+             task_name, task_description, task_id
+         )
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- src/praisonai-agents/praisonaiagents/agent/agent.py (4 hunks)
- test_refactor_validation.py (1 hunks)
- test_telemetry_cleanup_fix.py (1 hunks)
- test_telemetry_cleanup_simple.py (1 hunks)
- test_thread_cleanup.py (1 hunks)
🔇 Additional comments (16)
test_refactor_validation.py (4)
6-8: LGTM: Path manipulation is appropriate for testing.
The path manipulation correctly adds the src directory to enable importing the praisonaiagents package during testing.

11-22: LGTM: Method validation is comprehensive.
The test correctly validates the presence of both the chat method and the new _cleanup_telemetry method, which aligns with the refactoring objectives.

24-36: LGTM: Agent creation and cleanup testing is well-structured.
The agent creation follows the correct pattern with appropriate parameters, and testing the _cleanup_telemetry method call validates that the refactored cleanup functionality is accessible and functional.

39-48: LGTM: Exception handling and execution structure are well-implemented.
The exception handling provides clear error reporting, and the main execution block follows the learned pattern of being runnable as a standalone script with intuitive success/failure indicators.
test_telemetry_cleanup_fix.py (3)
7-24: LGTM: Test setup is comprehensive and follows best practices.
The imports are appropriate, and the Agent creation follows the correct pattern with suitable parameters. Recording the initial thread count is essential for detecting potential thread leaks.

26-38: LGTM: Chat testing covers critical code paths.
The test appropriately exercises both regular chat completion and self-reflection paths, which aligns with the refactoring objectives to ensure telemetry cleanup works in all scenarios.

60-65: LGTM: Main execution follows established patterns.
The main execution block correctly implements the standalone script pattern with clear success/failure indicators.
test_thread_cleanup.py (4)
7-14: LGTM: Environment setup is appropriate for testing.
The fake API credentials and localhost URL configuration properly isolate the tests from external dependencies while allowing the telemetry system to initialize.

16-62: LGTM: Comprehensive thread cleanup testing with excellent diagnostics.
The function uses elegant set operations to track thread differences and provides detailed diagnostic information including thread names, daemon status, and liveness. The approach of testing telemetry functions directly is thorough.

64-102: LGTM: Agent cleanup testing is thorough and well-structured.
The function correctly tests the agent's _cleanup_telemetry method and follows the established pattern for Agent creation. The detailed error reporting and standardized tolerance are consistent with best practices.

104-116: LGTM: Main execution is well-structured with appropriate exit codes.
The sequential execution of both tests with clear progress indicators and proper exit codes makes this suitable for both manual testing and CI/CD integration.
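A rough sketch of the standalone-script pattern being described; the function bodies are placeholders, not the actual test code:

```python
import sys

def test_telemetry_cleanup():
    # ... exercise the agent and compare thread counts ...
    return True

def test_agent_cleanup_method():
    # ... call agent._cleanup_telemetry() directly and check for errors ...
    return True

if __name__ == "__main__":
    print("=== Thread cleanup tests ===")
    ok = test_telemetry_cleanup() and test_agent_cleanup_method()
    # A non-zero exit code signals failure to CI/CD.
    sys.exit(0 if ok else 1)
```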
test_telemetry_cleanup_simple.py (3)
7-18: LGTM: Environment setup and logging configuration are appropriate.
The debug logging configuration will help diagnose telemetry issues, and the fake API credentials properly isolate the test from external dependencies.

70-93: LGTM: Agent cleanup method testing is well-implemented.
The function correctly tests the agent's _cleanup_telemetry method with appropriate exception handling and follows the established pattern for Agent creation.

95-112: LGTM: Main execution is well-organized with clear section headers.
The sequential execution with clear section headers and comprehensive output makes the test results easy to interpret. The proper exit codes are suitable for CI/CD integration.
src/praisonai-agents/praisonaiagents/agent/agent.py (2)
1230-1319: Well-structured error handling and state management in the custom LLM branch.
The implementation demonstrates excellent practices:
- Proper chat history rollback on failures
- Consistent prompt normalization for multimodal inputs
- Duplicate message prevention
- Comprehensive exception handling with appropriate logging
1967-1977: Robust telemetry cleanup implementation.
The _cleanup_telemetry method is well-designed with:
- Lazy imports to avoid circular dependencies
- Proper exception handling that logs but doesn't fail execution
- Clear documentation of its purpose
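A hedged sketch of a cleanup method with that shape; the telemetry module path and `force_shutdown_telemetry` function are assumptions for illustration, not the project's actual API:

```python
import logging

class Agent:
    def _cleanup_telemetry(self) -> None:
        """Best-effort telemetry shutdown; cleanup must never break the caller."""
        try:
            # Lazy import to avoid circular imports at module load time
            # (module path and function name are assumed for illustration).
            from praisonaiagents.telemetry import force_shutdown_telemetry
            force_shutdown_telemetry()
        except Exception as exc:
            # Log and continue: a failed cleanup should not fail agent execution.
            logging.debug("Telemetry cleanup skipped or failed: %s", exc)
```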
```python
# Check if thread count is reasonable (standardized tolerance)
thread_difference = final_threads - initial_threads
if thread_difference <= 2:  # Standardized tolerance
    print("✅ Telemetry cleanup appears to be working correctly")
    return True
else:
    print(f"❌ Possible thread leak detected: {thread_difference} extra threads")
    print(f"❌ Extra threads: {[t.name for t in remaining_threads if t not in threading.enumerate()[:initial_threads]]}")
    return False
```
Fix the logical error in extra thread listing.
The code has a logical error in line 57 where it tries to list extra threads. The slice threading.enumerate()[:initial_threads] is syntactically valid but misleading: initial_threads is an integer count, not the set of original threads, and the ordering and membership of threading.enumerate() can change between calls, so the filter does not reliably identify which threads are actually new.
Apply this diff to fix the extra thread listing:

```diff
- print(f"❌ Extra threads: {[t.name for t in remaining_threads if t not in threading.enumerate()[:initial_threads]]}")
+ print(f"❌ Extra threads: {[t.name for t in remaining_threads]}")
```

Or, if you want to be more precise about which threads are truly "extra":

```diff
- print(f"❌ Extra threads: {[t.name for t in remaining_threads if t not in threading.enumerate()[:initial_threads]]}")
+ print(f"❌ All remaining threads: {[t.name for t in remaining_threads]}")
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
  # Check if thread count is reasonable (standardized tolerance)
  thread_difference = final_threads - initial_threads
  if thread_difference <= 2:  # Standardized tolerance
      print("✅ Telemetry cleanup appears to be working correctly")
      return True
  else:
      print(f"❌ Possible thread leak detected: {thread_difference} extra threads")
-     print(f"❌ Extra threads: {[t.name for t in remaining_threads if t not in threading.enumerate()[:initial_threads]]}")
+     print(f"❌ Extra threads: {[t.name for t in remaining_threads]}")
      return False
```
🤖 Prompt for AI Agents
In test_telemetry_cleanup_fix.py around lines 50 to 58, the code slices the list returned by threading.enumerate() with the integer initial_threads, which does not reliably identify the threads that existed before the test. To fix this, capture the initial Thread objects in a list or set before the test runs, then compare them against the current threads with a set difference to accurately list the extra threads, instead of slicing.
(Diff hunk for the refactored chat method in src/praisonai-agents/praisonaiagents/agent/agent.py, lines 1227-1533, condensed to its structure:)

```python
def chat(self, prompt, temperature=0.2, tools=None, ...):
    # Use try...finally to ensure telemetry cleanup regardless of exit path
    try:
        if self._using_custom_llm:
            # Record chat_history_length for rollback, normalize the prompt,
            # call self.llm_instance.get_response(...), apply guardrail
            # validation, and roll back self.chat_history on LLM or
            # guardrail failure.
            ...
        else:
            # Build messages via self._build_messages(...), call
            # self._chat_completion(...), handle output_json/output_pydantic,
            # run the self-reflection loop (min_reflect/max_reflect), apply
            # guardrail validation, and roll back chat history with
            # display_error(...) on any unexpected failure.
            ...
    finally:
        # Always cleanup telemetry regardless of how the method exits
        self._cleanup_telemetry()
```
💡 Verification agent
🧩 Analysis chain
Ensure comprehensive test coverage for error handling paths.
The refactoring introduces multiple error handling and rollback scenarios. Please ensure test coverage includes:
- Chat history rollback on guardrail failures
- Telemetry cleanup on various exception types
- Self-reflection parsing errors and retries
- Concurrent access scenarios (given the noted thread-safety concern with chat_history)
🏁 Script executed:

```bash
#!/bin/bash
# Check for test files covering the refactored error handling
echo "Searching for test files related to Agent class..."
fd -e py test | xargs rg -l "test.*chat.*error|test.*telemetry.*cleanup|test.*guardrail.*fail|test.*reflection.*error" -A 3
```

Length of output: 299
🏁 Script executed:

```bash
#!/bin/bash
# Verify test coverage for additional error handling scenarios
echo "Guardrail failure tests:"
rg -l "guardrail" -g "*test*.py" || echo "No guardrail tests found"
echo "Reflection parsing error tests:"
rg -l "reflection" -g "*test*.py" || echo "No reflection tests found"
echo "Chat history rollback tests:"
rg -l "rollback" -g "*test*.py" || echo "No rollback tests found"
echo "Concurrency/thread-safety tests:"
rg -l "thread" -g "*test*.py" || echo "No concurrency tests found"
```

Length of output: 1538
Add focused tests for chat history rollback on error paths.
Existing tests cover telemetry cleanup, guardrail failures, self-reflection parsing, and concurrency, but we didn’t find any tests that verify the chat history is correctly rolled back when:
- The LLM call itself throws an exception
- Guardrail validation fails
- Any unexpected error bubbles up before completion
Please add unit tests that simulate these failure scenarios and assert that agent.chat_history remains unchanged after each error.
🤖 Prompt for AI Agents
In src/praisonai-agents/praisonaiagents/agent/agent.py around lines 1227 to
1533, add unit tests to verify chat history rollback on error scenarios.
Specifically, create tests that simulate exceptions thrown by the LLM call,
failures in guardrail validation, and unexpected errors during the chat method
execution. For each test, ensure the agent's chat_history is unchanged after the
error occurs, confirming proper rollback behavior. Use mocking or stubbing to
trigger these errors and assert chat_history consistency post-failure.
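A hedged sketch of one such rollback test, pytest-style; the patch target (`_chat_completion`), the model id, and the fake-credential setup are assumptions based on the files discussed above, not verified project details:

```python
import os
from unittest.mock import patch

from praisonaiagents import Agent

os.environ.setdefault("OPENAI_API_KEY", "test-key")  # fake credentials, mirroring the existing tests


def test_chat_history_rolls_back_when_llm_call_fails():
    agent = Agent(
        name="RollbackTester",
        role="Assistant",
        goal="Verify rollback",
        backstory="Test agent",
        llm="gpt-4o-mini",  # assumed model id; the call below is mocked, so it is never reached
    )
    history_before = list(agent.chat_history)

    # Force the underlying completion call to fail; the patch target is an assumption.
    with patch.object(agent, "_chat_completion", side_effect=RuntimeError("boom")):
        result = agent.chat("Hello")

    assert result is None
    assert agent.chat_history == history_before  # rollback left the history unchanged


if __name__ == "__main__":
    test_chat_history_rolls_back_when_llm_call_fails()
    print("✅ Rollback test passed")
```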

This PR addresses PR #1002
Generated with Claude Code
Summary by CodeRabbit
Bug Fixes
Tests