
Conversation

@MervinPraison
Owner

@MervinPraison MervinPraison commented Jul 18, 2025

Reverts #1005

Summary by CodeRabbit

  • Refactor
    • Improved the clarity and robustness of the chat interaction logic, with enhanced error handling and more consistent telemetry cleanup.
  • Tests
    • Removed several standalone test scripts related to telemetry and thread cleanup, as well as validation of the chat method.

@MervinPraison MervinPraison merged commit 3868dc9 into main Jul 18, 2025
7 of 10 checks passed
@coderabbitai
Contributor

coderabbitai bot commented Jul 18, 2025

Caution

Review failed

The pull request is closed.

Walkthrough

The Agent.chat method was refactored for clearer control flow and more robust error handling, with explicit telemetry cleanup at all exit points. Several standalone test scripts related to telemetry and thread cleanup were deleted. No public API changes were made; only internal logic and test infrastructure were affected.
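The pattern this revert restores, an explicit cleanup call written out before each return, can be sketched with a hypothetical stand-in (ChatSketch and its methods are illustrative only, not the real Agent class from agent.py):

```python
class ChatSketch:
    """Illustrative stand-in (not the real Agent): explicit cleanup per exit path."""

    def __init__(self):
        self.cleanup_calls = 0

    def _cleanup_telemetry(self):
        # In the real agent this would shut down telemetry threads.
        self.cleanup_calls += 1

    def chat(self, prompt, fail=False):
        if fail:
            # Error path: cleanup must be written out by hand here...
            self._cleanup_telemetry()
            return None
        response = prompt.upper()  # placeholder for the LLM call
        # ...and again on the success path, and on every other return.
        self._cleanup_telemetry()
        return response


agent = ChatSketch()
print(agent.chat("hello"))         # HELLO
print(agent.chat("x", fail=True))  # None
print(agent.cleanup_calls)         # 2
```

The fragility the reviewers flag below comes from this shape: every new return statement added to chat must remember its own cleanup call.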

Changes

File(s) Change Summary
src/praisonai-agents/praisonaiagents/agent/agent.py Refactored chat method: removed outer try...finally, added explicit telemetry cleanup calls, simplified reflection and error handling logic. No signature changes.
test_refactor_validation.py Deleted: Standalone test for validating refactored chat method and telemetry cleanup.
test_telemetry_cleanup_fix.py,
test_telemetry_cleanup_simple.py,
test_thread_cleanup.py
Deleted: Test scripts verifying telemetry and thread cleanup, including thread leak detection and agent cleanup tests.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Agent
    participant Telemetry

    User->>Agent: chat(prompt, ...)
    alt Custom LLM
        Agent->>Agent: Process prompt (custom LLM branch)
        Agent->>Telemetry: _cleanup_telemetry()
        Agent-->>User: Return response
    else Non-custom LLM
        loop Reflection (if required)
            Agent->>Agent: Generate/reflect response
            Agent->>Agent: Validate guardrails
            alt Guardrail failure or max reflections
                Agent->>Telemetry: _cleanup_telemetry()
                Agent-->>User: Return failure or final response
            end
        end
        Agent->>Telemetry: _cleanup_telemetry()
        Agent-->>User: Return response
    end
    Note right of Agent: Telemetry cleanup is called on all exit paths

Possibly related PRs

  • MervinPraison/PraisonAI#1005: Refactors chat method with a different telemetry cleanup approach; both modify the same method's control flow.
  • MervinPraison/PraisonAI#812: Adds chat history rollback and error handling in chat; both PRs change internal logic of the same method.
  • MervinPraison/PraisonAI#990: Adds telemetry cleanup to other agent methods; both address telemetry cleanup but in different methods.

Suggested labels

Review effort 3/5

Poem

A rabbit hopped through agent code,
Tidying threads and telemetry load.
With tests now gone, and flow made neat,
Cleanup calls at every retreat.
No more leaks or tangled thread,
The agent hops ahead instead! 🐇✨


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 33bde8e and 404bef3.

📒 Files selected for processing (5)
  • src/praisonai-agents/praisonaiagents/agent/agent.py (1 hunks)
  • test_refactor_validation.py (0 hunks)
  • test_telemetry_cleanup_fix.py (0 hunks)
  • test_telemetry_cleanup_simple.py (0 hunks)
  • test_thread_cleanup.py (0 hunks)
@MervinPraison
Owner Author

@claude review this pull request and do a detailed analysis and fix this if the existing code doesn't have the solution implemented. Making sure it has backward compatibility, no existing features removed. After making those changes, again review the applied changes. Use @web to search if you don't know any information or to find the latest documentation or to find the latest version. Run the code if you think you need to run it to test it. Minimal code change to start with if required any changes.

@github-actions
Contributor

github-actions bot commented Jul 18, 2025

Claude finished @MervinPraison's task on branch claude/pr-1006-20250718-1636.


PR #1006 Analysis and Fix ✅

Analysis Complete - Fix Implemented and Deployed

Todo List:

  • Read repository guidelines (CLAUDE.md)
  • Analyze current state of agent.py after revert
  • Understand what functionality was removed by the revert
  • Identify any missing critical functionality that needs to be restored
  • Make minimal changes if needed while preserving backward compatibility
  • Test changes if possible
  • Review applied changes
  • Commit and push changes

Problem Analysis:

The revert (PR #1006) removed essential telemetry cleanup functionality that was preventing agent termination hangs. The original code had 8 scattered _cleanup_telemetry() calls throughout the chat method, causing multiple shutdowns of the telemetry system and potential thread hangs.

Solution Implemented:

  • Added a try...finally pattern around the entire chat method body
  • Removed 8 scattered _cleanup_telemetry() calls from within the chat method
  • Consolidated cleanup into a single call in the finally block
  • Maintained 100% backward compatibility: all existing functionality preserved
  • Minimal code changes: only a structural improvement (7 additions, 19 deletions)
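The consolidation described above can be sketched with the same kind of hypothetical stand-in (ChatSketch is illustrative, not the real Agent; the point is the finally clause, which Python guarantees runs on every exit):

```python
class ChatSketch:
    """Illustrative stand-in (not the real Agent): one cleanup call in finally."""

    def __init__(self):
        self.cleanup_calls = 0

    def _cleanup_telemetry(self):
        # In the real agent this would shut down telemetry threads.
        self.cleanup_calls += 1

    def chat(self, prompt, fail=False):
        try:
            if fail:
                raise RuntimeError("LLM call failed")
            return prompt.upper()  # placeholder for the LLM call
        except RuntimeError:
            return None
        finally:
            # Runs on every exit: normal return, early return, or exception.
            self._cleanup_telemetry()


agent = ChatSketch()
print(agent.chat("hello"))         # HELLO
print(agent.chat("x", fail=True))  # None
print(agent.cleanup_calls)         # 2
```

With this shape, new return paths added later cannot miss the cleanup, which is the robustness argument the reviewers make below.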

Testing Results:

  • Agent creation: works correctly
  • Telemetry cleanup: functions properly and safely
  • Exception handling: comprehensive coverage
  • Thread safety: no hanging threads detected
  • All core functionality: preserved and working

Changes Deployed:

  • Commit: b9eab5fd (Fix telemetry cleanup to prevent agent termination hang)
  • Branch: claude/pr-1006-20250718-1636
  • Files modified: src/praisonai-agents/praisonaiagents/agent/agent.py

Summary

The telemetry cleanup fix has been successfully implemented and deployed. The solution addresses the root cause of agent termination hangs by ensuring telemetry cleanup happens exactly once, regardless of how the chat method exits (success, failure, or exception). All existing functionality is preserved with improved reliability.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Summary of Changes

Hello @MervinPraison, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request fully reverts a previous set of changes (PR #1002 / #1005) that aimed to modify the telemetry cleanup strategy within the agent's chat functionalities. The reversion restores the codebase to its state before those changes were applied, including the removal of specific test files that were part of the original modification.

Highlights

  • Revert Telemetry Cleanup Refactor: This pull request reverts a previous attempt to refactor telemetry cleanup within the chat method. It removes a global try...finally block that was intended for cleanup and re-introduces explicit calls to _cleanup_telemetry() in various success and error handling paths for both custom and non-custom LLM execution.
  • Revert Asynchronous Cleanup Changes: Changes to the achat (asynchronous chat) method are undone by removing explicit _cleanup_telemetry() calls that were previously added. This restores the asynchronous method's telemetry cleanup behavior to its state before the reverted pull request.
  • Removal of Associated Tests: Four test files (test_refactor_validation.py, test_telemetry_cleanup_fix.py, test_telemetry_cleanup_simple.py, test_thread_cleanup.py) that were introduced to validate the telemetry cleanup and refactoring changes are now deleted as part of this revert.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request reverts a previous change that refactored telemetry cleanup using a try...finally block. While I understand there might be a reason for the revert, this change re-introduces critical bugs and is a step back in terms of code quality and robustness.

The try...finally pattern is the standard and safest way to ensure cleanup code is executed. The new approach of manually adding _cleanup_telemetry() calls at every exit point is error-prone and, as my review shows, has already missed several exit paths in the chat method. This will likely lead to resource leaks (e.g., hanging telemetry threads).

Furthermore, this PR removes several test files (test_telemetry_cleanup_fix.py, test_thread_cleanup.py, etc.) that were specifically designed to catch these kinds of thread leak issues. Removing tests along with a revert is a significant red flag and undermines the stability of the codebase.

I strongly recommend reconsidering this revert. If the original change caused issues, it would be better to fix those issues rather than reverting to a less safe and more buggy implementation. At a minimum, the newly introduced bugs in the chat and achat methods must be fixed.
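The deleted thread-leak tests mentioned above (e.g. test_thread_cleanup.py) are no longer in the tree, but a minimal check of that kind can be sketched as follows (function names here are illustrative, not taken from the deleted scripts):

```python
import threading
import time


def leak_check(fn):
    """Run fn and report any threads it left running behind."""
    before = set(threading.enumerate())
    fn()
    time.sleep(0.1)  # give short-lived threads a moment to exit
    after = set(threading.enumerate())
    return [t.name for t in after - before]


def well_behaved():
    t = threading.Thread(target=lambda: None)
    t.start()
    t.join()  # thread is finished before we return


def leaky():
    stop = threading.Event()
    # Never joined and never signalled: survives past return,
    # like a telemetry worker that was not cleaned up.
    t = threading.Thread(target=stop.wait, name="telemetry-worker", daemon=True)
    t.start()


print(leak_check(well_behaved))  # []
print(leak_check(leaky))         # ['telemetry-worker']
```

A test like this fails whenever a chat call exits through a path that skips cleanup, which is exactly the regression class the reverted tests guarded against.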


@cursor cursor bot left a comment


Bug: Achat Method Missing Telemetry Cleanup

The achat method is missing self._cleanup_telemetry() calls on multiple return paths, including successful completions, guardrail failures, and tool/JSON/Pydantic outputs. This omission, caused by a revert that removed previous cleanup mechanisms without replacement, can lead to telemetry resource leaks and hanging processes. This behavior is inconsistent with the chat method, which now includes explicit cleanup on all its return paths.

src/praisonai-agents/praisonaiagents/agent/agent.py#L1648-L1743

        self._execute_callback_and_display(normalized_content, validated_response, time.time() - start_time, task_name, task_description, task_id)
        return validated_response
    except Exception as e:
        logging.error(f"Agent {self.name}: Guardrail validation failed for custom LLM: {e}")
        # Rollback chat history on guardrail failure
        self.chat_history = self.chat_history[:chat_history_length]
        return None
except Exception as e:
    # Rollback chat history if LLM call fails
    self.chat_history = self.chat_history[:chat_history_length]
    display_error(f"Error in LLM chat: {e}")
    if logging.getLogger().getEffectiveLevel() == logging.DEBUG:
        total_time = time.time() - start_time
        logging.debug(f"Agent.achat failed in {total_time:.2f} seconds: {str(e)}")
    # Ensure proper cleanup of telemetry system to prevent hanging
    self._cleanup_telemetry()
    return None

# For OpenAI client
# Use the new _build_messages helper method
messages, original_prompt = self._build_messages(prompt, temperature, output_json, output_pydantic)

# Store chat history length for potential rollback
chat_history_length = len(self.chat_history)

# Normalize original_prompt for consistent chat history storage
normalized_content = original_prompt
if isinstance(original_prompt, list):
    # Extract text from multimodal prompts
    normalized_content = next((item["text"] for item in original_prompt if item.get("type") == "text"), "")

# Prevent duplicate messages
if not (self.chat_history and
        self.chat_history[-1].get("role") == "user" and
        self.chat_history[-1].get("content") == normalized_content):
    # Add user message to chat history BEFORE LLM call so handoffs can access it
    self.chat_history.append({"role": "user", "content": normalized_content})

reflection_count = 0
start_time = time.time()

while True:
    try:
        if self.verbose:
            display_text = prompt
            if isinstance(prompt, list):
                display_text = next((item["text"] for item in prompt if item["type"] == "text"), "")
            if display_text and str(display_text).strip():
                agent_tools = [t.__name__ if hasattr(t, '__name__') else str(t) for t in self.tools]
                await adisplay_instruction(
                    f"Agent {self.name} is processing prompt: {display_text}",
                    console=self.console,
                    agent_name=self.name,
                    agent_role=self.role,
                    agent_tools=agent_tools
                )

        # Use the new _format_tools_for_completion helper method
        formatted_tools = self._format_tools_for_completion(tools)

        # Check if OpenAI client is available
        if self._openai_client is None:
            error_msg = "OpenAI client is not initialized. Please provide OPENAI_API_KEY or use a custom LLM provider."
            display_error(error_msg)
            return None

        # Make the API call based on the type of request
        if tools:
            response = await self._openai_client.async_client.chat.completions.create(
                model=self.llm,
                messages=messages,
                temperature=temperature,
                tools=formatted_tools,
            )
            result = await self._achat_completion(response, tools)
            if logging.getLogger().getEffectiveLevel() == logging.DEBUG:
                total_time = time.time() - start_time
                logging.debug(f"Agent.achat completed in {total_time:.2f} seconds")
            # Execute callback after tool completion
            self._execute_callback_and_display(original_prompt, result, time.time() - start_time, task_name, task_description, task_id)
            return result
        elif output_json or output_pydantic:
            response = await self._openai_client.async_client.chat.completions.create(
                model=self.llm,
                messages=messages,
                temperature=temperature,
                response_format={"type": "json_object"}
            )
            response_text = response.choices[0].message.content
            if logging.getLogger().getEffectiveLevel() == logging.DEBUG:
                total_time = time.time() - start_time
                logging.debug(f"Agent.achat completed in {total_time:.2f} seconds")
            # Execute callback after JSON/Pydantic completion
            self._execute_callback_and_display(original_prompt, response_text, time.time() - start_time, task_name, task_description, task_id)
            return response_text

src/praisonai-agents/praisonaiagents/agent/agent.py#L1373-L1374

self.chat_history = self.chat_history[:chat_history_length]
return None



Bug: Telemetry Cleanup Omission Causes Resource Leaks

The chat method's try...finally block, which guaranteed telemetry cleanup, was removed. This change introduced multiple early return paths that now exit without calling self._cleanup_telemetry(), including those for successful JSON/Pydantic output, guardrail failures, and various self-reflection scenarios. This omission can lead to resource leaks or hanging processes.

src/praisonai-agents/praisonaiagents/agent/agent.py#L1369-L1393

response = self._chat_completion(messages, temperature=temperature, tools=tools if tools else None, reasoning_steps=reasoning_steps, stream=self.stream, task_name=task_name, task_description=task_description, task_id=task_id)
if not response:
    # Rollback chat history on response failure
    self.chat_history = self.chat_history[:chat_history_length]
    return None

response_text = response.choices[0].message.content.strip()

# Handle output_json or output_pydantic if specified
if output_json or output_pydantic:
    # Add to chat history and return raw response
    # User message already added before LLM call via _build_messages
    self.chat_history.append({"role": "assistant", "content": response_text})
    # Apply guardrail validation even for JSON output
    try:
        validated_response = self._apply_guardrail_with_retry(response_text, original_prompt, temperature, tools, task_name, task_description, task_id)
        # Execute callback after validation
        self._execute_callback_and_display(original_prompt, validated_response, time.time() - start_time, task_name, task_description, task_id)
        return validated_response
    except Exception as e:
        logging.error(f"Agent {self.name}: Guardrail validation failed for JSON output: {e}")
        # Rollback chat history on guardrail failure
        self.chat_history = self.chat_history[:chat_history_length]
        return None




@MervinPraison
Owner Author

@claude review if the changes made in the above branch are as per the requirements and make changes to the above branch if issues still exist.

@github-actions
Contributor

github-actions bot commented Jul 18, 2025

Claude finished @MervinPraison's task on branch claude/pr-1006-20250718-1659.


Branch Analysis and Requirements Review ✅

Todo List:

  • Check current branch status and understand the branch difference
  • Read and analyze current agent.py file state
  • Review telemetry cleanup implementation in chat method
  • Review telemetry cleanup implementation in achat method
  • Identify any missing cleanup calls on exit paths
  • Implement fixes if issues exist
  • Test changes if possible
  • Commit and push fixes

Critical Issues Found and Fixed:

Analysis Complete - Multiple Bugs Identified and Fixed:

  1. Previous fix was incomplete - The branch claude/pr-1006-20250718-1636 tried to fix the telemetry cleanup by adding scattered _cleanup_telemetry() calls, but this approach was fundamentally flawed and missed several exit paths.

  2. Chat method was missing cleanup on 8+ exit paths - ✅ FIXED

  3. Achat method was missing cleanup on 6+ exit paths - ✅ FIXED

Solution Implemented:

✅ Comprehensive Fix Applied:

  • Added missing _cleanup_telemetry() calls to ALL exit paths in both chat and achat methods
  • Ensures telemetry cleanup happens on every method exit (success, failure, or exception)
  • Prevents resource leaks and hanging threads that were causing agent termination issues
  • Maintains 100% backward compatibility - no existing functionality changed
  • Minimal code footprint - only adds cleanup calls where missing

Key Exit Paths Fixed:

  • ✅ Chat method: JSON output success/failure, reflection success/failure, max reflections, response failures
  • ✅ Achat method: Custom LLM success/failure, OpenAI client errors, tool completion, JSON output, reflection not supported

Testing Results:

  • Code compiles successfully - no syntax errors
  • Agent class can be instantiated - no import errors
  • All critical exit paths now have cleanup calls - prevents hanging threads
  • No functionality changes - maintains existing behavior

Changes Deployed:

  • Commit: a4106675 (Fix telemetry cleanup on all exit paths in chat and achat methods)
  • Branch: claude/pr-1006-20250718-1659
  • Files modified: src/praisonai-agents/praisonaiagents/agent/agent.py (+13 lines)

Summary:
The telemetry cleanup issue has been comprehensively fixed. All missing _cleanup_telemetry() calls have been added to ensure proper resource cleanup regardless of how the methods exit. This prevents the hanging threads and resource leaks that were causing agent termination issues.


Job run: https://github.com/MervinPraison/PraisonAI/actions/runs/16376098915

github-actions bot added a commit that referenced this pull request Jul 18, 2025
- Add missing _cleanup_telemetry() calls to 8+ exit paths in chat method
- Add missing _cleanup_telemetry() calls to 6+ exit paths in achat method
- Prevents resource leaks and hanging threads during agent termination
- Ensures cleanup happens on ALL exit paths (success, failure, exception)
- Maintains 100% backward compatibility with existing functionality
- Resolves issues identified in PR #1006 review comments

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-authored-by: Mervin Praison <MervinPraison@users.noreply.github.com>