feat: implement knowledge retrieval events in Agent #2727

Open · wants to merge 1 commit into base: main

Conversation

@lorenzejay (Collaborator) commented Apr 30, 2025

This commit introduces a series of knowledge retrieval events in the Agent class, enhancing its ability to handle knowledge queries. New events include KnowledgeRetrievalStartedEvent, KnowledgeRetrievalCompletedEvent, KnowledgeQueryGeneratedEvent, KnowledgeQueryFailedEvent, and KnowledgeSearchQueryCompletedEvent. The Agent now emits these events during knowledge retrieval processes, allowing for better tracking and handling of knowledge queries. Additionally, the console formatter has been updated to handle these new events, providing visual feedback during knowledge retrieval operations.

Remaining TODOs:

  • docs
  • ensure logging happens during events
  • more tests when knowledge is not set

@joaomdmoura (Collaborator)

Disclaimer: This review was made by a crew of AI Agents.

Code Review Comment: Knowledge Retrieval Events Implementation

Overview

The changes introduced in this pull request enhance the knowledge retrieval event handling within the Agent class, allowing improved tracking of knowledge query operations through new event types and related functionality.

Positive Aspects

  • The implementation demonstrates a well-organized structure, maintaining a clear separation of concerns.
  • It successfully covers various stages of knowledge retrieval, including error handling for specific cases.
  • Documentation is clear, with beneficial type hints enhancing code readability.

Issues and Suggested Improvements

1. Code Duplication in Event Handling

Issue: The execute_task() method shows repeated error handling patterns.
Suggestion: Refactor this pattern into a reusable helper method to improve maintainability:

def _emit_knowledge_error(self, event_type, query: str, error: str) -> None:
    # Centralize emission of knowledge failure events to avoid repeating this boilerplate.
    crewai_event_bus.emit(
        self,
        event=event_type(
            query=query,
            agent=self,
            error=str(error)
        )
    )

2. Complex Knowledge Query Logic

Issue: The current logic in execute_task() contains nested conditionals that hinder readability.
Suggestion: Extract this logic to a method dedicated solely to processing knowledge queries:

def _process_knowledge_query(self, task_prompt: str) -> str:
    if not self.knowledge:
        return task_prompt
        
    crewai_event_bus.emit(self, event=KnowledgeRetrievalStartedEvent(agent=self))
    try:
        search_query = self._get_knowledge_search_query(task_prompt)
        if not search_query:
            return task_prompt
            
        task_prompt = self._query_knowledge_sources(search_query, task_prompt)
        crewai_event_bus.emit(
            self,
            event=KnowledgeRetrievalCompletedEvent(
                query=search_query,
                agent=self
            )
        )
    except Exception as e:
        self._emit_knowledge_error(
            KnowledgeSearchQueryFailedEvent,
            self.knowledge_search_query or "",
            str(e),
        )
    return task_prompt

3. Long Method in _get_knowledge_search_query

Issue: The method is overly long and contains repeated logic.
Suggestion: Modularize LLM calls by creating a dedicated method:

def _make_llm_query(self, system_prompt: str, query: str) -> Optional[str]:
    if not isinstance(self.llm, BaseLLM):
        return None
        
    try:
        return self.llm.call([
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": query}
        ])
    except Exception as e:
        self._emit_knowledge_error(KnowledgeQueryFailedEvent, query, str(e))
        return None

4. Event Class Structure

Issue: Duplicate attributes exist across knowledge event classes.
Suggestion: Introduce a base event class to streamline attribute declarations:

class BaseKnowledgeEvent(BaseEvent):
    agent: BaseAgent
    type: str

class KnowledgeQueryEvent(BaseKnowledgeEvent):
    query: str
    
class KnowledgeErrorEvent(KnowledgeQueryEvent):
    error: str

5. Console Formatter Methods

Issue: Duplicate logic exists in console formatter methods for tree handling.
Suggestion: Create a single method to handle tree context retrieval:

def _get_tree_context(self, agent_branch, crew_tree):
    branch_to_use = self.current_lite_agent_branch or agent_branch
    tree_to_use = branch_to_use or crew_tree
    
    if branch_to_use is None or tree_to_use is None:
        return None, None
        
    return branch_to_use, tree_to_use

Security Considerations

  • Validate LLM responses to ensure safety before utilizing them in queries.
  • Consider implementing rate limiting for knowledge queries.
  • Add logging mechanisms for failed knowledge operations (see the sketch after this list).
  • Validate knowledge source configurations to avoid errors.
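On the logging point above, a minimal sketch, assuming the _emit_knowledge_error helper suggested earlier in this review; the logger wiring is illustrative rather than the project's actual logging setup:

import logging

logger = logging.getLogger(__name__)

def _emit_knowledge_error(self, event_type, query: str, error: str) -> None:
    # Record the failure even when no listener is subscribed to this event type.
    logger.warning("Knowledge operation failed (%s): %s", event_type.__name__, error)
    crewai_event_bus.emit(
        self,
        event=event_type(query=query, agent=self, error=str(error)),
    )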

Performance Recommendations

  • Implement caching for frequently executed knowledge query results (a combined caching and timeout sketch follows this list).
  • Introduce timeout settings for knowledge operations to handle sluggish responses.
  • Explore batch processing for multiple knowledge queries to improve efficiency.
  • Consider metrics collection for monitoring knowledge retrieval performance.
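A rough sketch of the caching and timeout ideas above, assuming the _query_knowledge_sources helper named earlier in this review; the module-level cache, TTL, and 30-second timeout are illustrative choices, not the project's API:

import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeoutError

_KNOWLEDGE_CACHE: dict[str, tuple[float, str]] = {}
_CACHE_TTL_SECONDS = 3600  # illustrative: reuse results for an hour

def _cached_knowledge_lookup(self, search_query: str, task_prompt: str) -> str:
    cached = _KNOWLEDGE_CACHE.get(search_query)
    if cached and time.time() - cached[0] < _CACHE_TTL_SECONDS:
        return cached[1]

    # Bound the retrieval so a slow knowledge backend cannot stall task execution.
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(self._query_knowledge_sources, search_query, task_prompt)
    try:
        result = future.result(timeout=30)  # illustrative 30-second ceiling
    except FutureTimeoutError:
        executor.shutdown(wait=False)
        return task_prompt  # fall back to the unmodified prompt on timeout
    executor.shutdown(wait=False)

    _KNOWLEDGE_CACHE[search_query] = (time.time(), result)
    return result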

Testing Recommendations

  • Ensure comprehensive unit tests for error cases in knowledge queries.
  • Add integration tests for the full knowledge retrieval workflow.
  • Include performance benchmarks for various knowledge operations.
  • Use mock tests to simulate LLM interactions during testing (see the sketch after this list).
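For the LLM-mocking point above, a small unit test could patch the agent's llm.call directly; the agent fixture and the expected rewritten query below are hypothetical:

from unittest.mock import patch

def test_knowledge_search_query_uses_llm(agent):
    # Simulate the LLM rewriting the task prompt into a vector search query.
    with patch.object(agent.llm, "call", return_value="Brandon favorite color") as mock_call:
        query = agent._get_knowledge_search_query("What is Brandon's favorite color?")

    mock_call.assert_called_once()
    assert query == "Brandon favorite color", "Expected the mocked rewritten query"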

The proposed changes create an excellent foundation for enhanced knowledge retrieval event handling. Additional refactoring focusing on separation of concerns, robust error handling, and performance optimizations will further elevate the quality and maintainability of the codebase.

Let's continue to build on this strong foundation for better scalability and performance in handling knowledge retrieval processes.

@mplachta (Contributor)

Disclaimer: This review was made by a crew of AI Agents.

Code Review for PR #2727: Implement Knowledge Retrieval Events in Agent

Summary of Key Changes

  • Introduces a rich set of knowledge retrieval lifecycle events (Started, Completed, QueryGenerated, QueryFailed, SearchQueryFailed) for the Agent class.
  • Augments the Agent's execute_task workflow to emit these events for improved observability and tracking of knowledge source queries.
  • Adds a dedicated method _get_knowledge_search_query that leverages the LLM to rewrite the user's task prompt into an optimized vector search query, with corresponding events emitted.
  • Updates event listener registration to handle the new knowledge events and pipe them into the console formatter.
  • Extends the console formatter with robust handlers displaying knowledge retrieval progress and errors in the CLI using clear symbols and colors.
  • Adds i18n prompt strings associated with knowledge search query rewriting.
  • Implements a new integration test with patched knowledge source and recorded HTTP interactions using VCR cassettes, validating query rewriting and knowledge-aware agent execution.

Detailed Findings & Suggestions

1. src/crewai/agent.py

  • Event-driven knowledge retrieval integration is well architected, clearly separating concerns by emitting events during distinct phases.
  • Exception handling on knowledge retrieval is too broad (except Exception) which risks masking critical errors.
    • Suggestion: Narrow the exception scope to target only knowledge backend or LLM-related exceptions, and re-raise or log unexpected errors explicitly.
  • Nested conditionals and multiple fetches from self.knowledge and self.crew create complexity that hinders readability.
    • Suggestion: Extract knowledge retrieval logic into a separate method like retrieve_and_apply_knowledge to flatten and clarify flow.
  • Method signature for _get_knowledge_search_query uses mixed typing styles (str | None) whereas other code prefers Optional[str].
    • Suggestion: Use consistent type annotation style project-wide, preferably Optional[str] for compatibility.
  • Docstrings are present but large complex blocks (e.g., execute_task) would benefit from inline comments summarizing major logical steps.
  • Timeout handling using thread pool executor is clear and robust.
  • Overall, the enhancement significantly improves modularity and observability of knowledge querying in Agent task execution.

2. src/crewai/translations/en.json

  • New system and user prompts for knowledge query rewriting were added cleanly.
  • Minor formatting issue: file lacks trailing newline.
    • Suggestion: Add newline at end of file to conform with POSIX and modern tooling conventions.

3. src/crewai/utilities/events/event_listener.py

  • Newly registered event handlers follow existing conventions and integrate knowledge events smoothly.
  • Observed some inconsistency in event handler parameter naming (source, event vs just event).
    • Suggestion: Standardize to (source, event) for all handlers for predictability and easier maintenance.
  • Null safety: If self.formatter.current_agent_branch and current_crew_tree are None, formatter calls may fail silently or incorrectly.
    • Suggestion: Add null checks and warnings when event context branches are missing to assist debugging (see the sketch after this list).
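A minimal sketch of that null check, assuming the decorator-based handler registration used by the event bus and a hypothetical formatter method name; in the real listener these handlers live inside setup_listeners(), where self is in scope:

import logging

logger = logging.getLogger(__name__)

# Inside setup_listeners(), alongside the other handlers:
@crewai_event_bus.on(KnowledgeRetrievalStartedEvent)
def on_knowledge_retrieval_started(source, event):
    agent_branch = self.formatter.current_agent_branch
    crew_tree = self.formatter.current_crew_tree
    if agent_branch is None and crew_tree is None:
        # Warn instead of failing silently so missing tree context is easy to debug.
        logger.warning("Knowledge retrieval event received without tree context; skipping console update.")
        return
    self.formatter.handle_knowledge_retrieval_started(agent_branch, crew_tree)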

4. src/crewai/utilities/events/knowledge_events.py

  • Event classes are concise and follow base event structure.
  • Duplicate imports of BaseAgent between normal and TYPE_CHECKING blocks can be cleaned up.
    • Suggestion: Import BaseAgent only under if TYPE_CHECKING: since it is used solely for type annotations (see the sketch after this list).
  • Recommend documenting field usage rationale across events for clarity (e.g., why some events have query vs task_prompt).
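A short sketch of that cleanup; the import paths are assumptions, and if these events are pydantic models the string annotation may additionally require a model_rebuild() once BaseAgent is importable at runtime:

from typing import TYPE_CHECKING

from crewai.utilities.events.base_events import BaseEvent  # assumed location of BaseEvent

if TYPE_CHECKING:
    # Needed only for type annotations, so it stays out of the runtime import graph.
    from crewai.agents.agent_builder.base_agent import BaseAgent


class KnowledgeRetrievalStartedEvent(BaseEvent):
    type: str = "knowledge_retrieval_started"
    agent: "BaseAgent"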

5. src/crewai/utilities/events/utils/console_formatter.py

  • Event handler implementations provide clear terminal feedback with consistent use of icons ("🔍", "✅", "❌", "🔎") and color coding.
  • Repeated logic to determine branch_to_use and tree_to_use across the handlers leads to code duplication.
    • Suggestion: Extract this to a _resolve_tree_and_branch(agent_branch, crew_tree) helper to DRY up code.
  • Query truncation for display employs naive slicing that could mislead or awkwardly truncate short queries.
    • Suggestion: Improve truncation with bounds check:
      display_query = (query[:47] + "...") if len(query) > 50 else query
  • Error handling gracefully shows error panels with descriptive messages, improving user experience.

6. tests/agent_test.py

  • Comprehensive new integration test covers knowledge query generation and retrieval with LLM mocking and VCR HTTP recording.
  • Test setup with patching KnowledgeStorage is verbose and repeated.
    • Suggestion: Refactor the patch setup into a pytest fixture to reduce boilerplate and clarify intent (see the sketch after this list).
  • Assertions would benefit from custom failure messages to ease debugging on test failures.
    • Example: assert "Brandon" in str(agent.knowledge_search_query), "Expected 'Brandon' in knowledge_search_query"
  • Usage of VCR ensures deterministic and isolated tests from external dependencies.
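A sketch of that fixture; the patch target and the fake search result are assumptions made for illustration:

import pytest
from unittest.mock import patch

@pytest.fixture
def patched_knowledge_storage():
    # Patch KnowledgeStorage once so individual tests do not repeat the setup boilerplate.
    with patch("crewai.knowledge.storage.knowledge_storage.KnowledgeStorage") as mock_storage:
        instance = mock_storage.return_value
        instance.search.return_value = [
            {"context": "Brandon's favorite color is red.", "score": 0.9}
        ]
        yield instance

def test_agent_with_knowledge_sources(patched_knowledge_storage):
    ...  # build the agent here and assert on agent.knowledge_search_query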

7. tests/cassettes/test_agent_with_knowledge_sources_generate_search_query.yaml

  • Auto-generated cassette appears thorough capturing all external HTTP calls needed for test reproducibility.
  • No issues detected with this file.

Historical Context and Learning from Related PRs

  • This event-driven approach aligns well with previous extensions in the event bus architecture.
  • Similar patterns for error event emission and console formatting used in prior LLM-related PRs can be observed here, demonstrating consistent design choices.
  • Usage of dedicated system and user prompts for LLM-driven query rewriting follows best practices established in text transformation subprocesses seen earlier.

Specific Improvement Suggestions with Examples

a) Narrow Exception Handling in agent.py

try:
    self.knowledge_search_query = self._get_knowledge_search_query(task_prompt)
    # knowledge query logic...
except KnowledgeQueryServiceError as e:
    crewai_event_bus.emit(self, KnowledgeSearchQueryFailedEvent(
        query=self.knowledge_search_query or "",
        agent=self,
        error=str(e),
    ))
except Exception as e:
    # Log and re-raise unexpected exceptions
    logger.error(f"Unexpected error in knowledge retrieval: {e}")
    raise

b) Refactor Tree Resolution in Console Formatter

def _resolve_tree_and_branch(self, agent_branch, crew_tree):
    branch_to_use = self.current_lite_agent_branch or agent_branch
    tree_to_use = branch_to_use or crew_tree
    if branch_to_use is None or tree_to_use is None:
        return None, None
    return branch_to_use, tree_to_use

c) Robust Query Display Truncation in Console Formatter

display_query = (query[:47] + "...") if len(query) > 50 else query
self.update_tree_label(query_branch, "🔎", f"Query: {display_query}", "yellow")

d) Better Assertion Messages in Tests

assert "Brandon" in str(agent.knowledge_search_query), "Expected 'Brandon' in generated knowledge_search_query"
assert "favorite_color" in str(agent.knowledge_search_query), "Expected 'favorite_color' key in generated query"

e) Add Missing Trailing Newline in en.json

Simply ensure the file ends with a newline character to comply with POSIX conventions and avoid tooling warnings.

Overall Evaluation

  • Code Quality: High-quality, well-structured additions leveraging event-driven architecture.
  • Readability: Good, but can be improved with modularization and clearer exception handling.
  • Maintainability: Positive direction with events; DRY and typing consistency improvements recommended.
  • Testing: Excellent coverage including integration with knowledge sources; can be refined for clarity and reusability.

This PR appreciably enhances the Agent’s knowledge retrieval mechanism by embedding a rich event lifecycle, increasing observability and control. Addressing the above improvement points, especially around exception scoping, code duplication, and test refinement, will strengthen long-term maintainability and robustness.

Great work on this substantial feature addition!

@@ -185,7 +197,7 @@ def execute_task(
         self,
         task: Task,
         context: Optional[str] = None,
-        tools: Optional[List[BaseTool]] = None
+        tools: Optional[List[BaseTool]] = None,
Contributor

We definitely need some rough standard roles for that haha each contribution has its own ‘preferences’
