An implementation of a tool-using AI assistant for precise calculations. This project, part of a technical assessment, equips an LLM with a calculator to ensure mathematical accuracy.


AI Assistant with Tools - Technical Challenge

1. Overview

This project implements an AI assistant built in Python with the LangChain framework. The main objective is to demonstrate an intelligent agent's ability to decide autonomously when to rely on its internal knowledge and when to invoke an external tool. To that end, the assistant is equipped with a calculator tool and uses a ReAct (Reasoning and Acting) agent: it analyzes the user's query and either answers general-knowledge questions directly or invokes the calculator to solve arithmetic operations accurately.


2. Tech Stack

  • Language: Python 3.x
  • Core Framework: LangChain
  • Language Model (LLM): OpenAI GPT-3.5-Turbo
  • Dependency Management: pip & requirements.txt
  • API Key Management: python-dotenv

3. How to Run

  1. Clone the Repository:

    git clone https://github.com/PryskaS/artefact-ai-assistant-challenge.git
    cd artefact-ai-assistant-challenge
  2. Set up a Virtual Environment (Recommended):

    # On Windows
    python -m venv venv
    .\venv\Scripts\activate
    
    # On macOS/Linux
    python3 -m venv venv
    source venv/bin/activate
  3. Install Dependencies:

    pip install -r requirements.txt
  4. Configure the API Key:

    • Create a file named .env in the root directory of the project.
    • Inside this file, add your OpenAI API key:
      OPENAI_API_KEY="sk-..."
      
  5. Run the Assistant:

    python main.py

4. Implementation Logic

The agent's logic is built upon three core components: the agent architecture, the tool definition, and the decision-making mechanism.

4.1. Agent Architecture: ReAct (Reasoning and Acting)

The system leverages a ReAct agent. This architecture was chosen for its transparent, step-by-step reasoning process, which is well suited to tasks involving tool use.

  • Core Loop: The agent operates on a "Thought → Action → Observation" cycle.
  • Traceability: The verbose=True flag exposes this entire reasoning chain in the terminal, making the agent's behavior easy to debug and understand. The agent effectively "shows its work" before producing a final answer.
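As a concrete illustration of this cycle (a toy sketch in plain Python, not the project's actual code), the loop below runs Thought → Action → Observation steps until the model emits a final answer; `react_loop` and `fake_llm` are hypothetical names, with `fake_llm` standing in for the real LLM call:

```python
def react_loop(query, llm_step, tools, max_steps=5):
    """Minimal Thought -> Action -> Observation loop.

    llm_step(query, scratchpad) returns (thought, action, action_input);
    action is either a tool name or "Final Answer".
    """
    scratchpad = []
    for _ in range(max_steps):
        thought, action, action_input = llm_step(query, scratchpad)
        scratchpad.append(f"Thought: {thought}")
        if action == "Final Answer":
            return action_input
        observation = tools[action](action_input)  # run the chosen tool
        scratchpad.append(f"Action: {action}[{action_input}]")
        scratchpad.append(f"Observation: {observation}")
    return "Stopped: step limit reached."

def fake_llm(query, scratchpad):
    # Stand-in for the real model: call the Calculator once,
    # then return its observation as the final answer.
    for line in reversed(scratchpad):
        if line.startswith("Observation: "):
            return ("I have the result.", "Final Answer", line[len("Observation: "):])
    return ("This needs arithmetic.", "Calculator", "128 * 46")

tools = {"Calculator": lambda expr: str(eval(expr, {"__builtins__": {}}, {}))}
print(react_loop("What is 128 * 46?", fake_llm, tools))  # 5888
```

Printing the scratchpad after the run reproduces the same trace that `verbose=True` streams to the terminal.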

4.2. Tool Definition: Sandboxed Calculator

A single Calculator tool was implemented to handle arithmetic operations. Key implementation details include:

  • Restricted Evaluation: The tool uses Python's eval() function to compute mathematical results. To reduce risk, eval() is executed with __builtins__ disabled, which blocks the most obvious injection vectors (e.g. __import__ or open). Note that an eval()-based sandbox is a mitigation rather than a hard security boundary, so tool input should still be treated as untrusted.
  • Error Handling: The function is wrapped in a try...except block. This ensures that invalid mathematical expressions or operations (like division by zero) do not crash the agent. Instead, a descriptive error string is returned, which the agent receives as an "Observation" to inform its next step.
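A minimal sketch of a tool with both properties (the function name `calculator` is illustrative; the repository's exact implementation may differ):

```python
def calculator(expression: str) -> str:
    """Evaluate an arithmetic expression in a restricted namespace."""
    try:
        # Empty __builtins__ removes names like __import__, open, and exec.
        # This is a mitigation, not a hard security boundary.
        result = eval(expression, {"__builtins__": {}}, {})
        return str(result)
    except Exception as exc:
        # Returned as a string so the agent receives it as an Observation
        # instead of the run crashing.
        return f"Error: {exc}"

calculator("128 * 46")  # "5888"
calculator("1/0")       # "Error: division by zero"
```

An input like `"__import__('os')"` fails with a NameError inside the sandbox and is likewise surfaced as an `"Error: ..."` observation.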

4.3. Decision-Making Mechanism

The agent's decision to use the Calculator is not hard-coded. Instead, the decision is delegated to the LLM, governed by the quality of the tool's description.

  • Prompt-Driven Decision: The core of the routing logic lies in the description string provided to the Tool object:

    description="Use this tool to evaluate simple arithmetic expressions. The input must be a valid mathematical expression string."
  • Mechanism: When presented with a user query, the LLM determines if the query matches the tool's described capability. For a query like "What is 128 * 46?", the LLM identifies a direct match and formats a call to the Calculator. For knowledge-based questions, it infers a mismatch and proceeds to formulate a Final Answer directly.
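One way to picture this description-driven routing is the toy classifier below. In the real system the decision is made by the LLM reading the description string; here a regex crudely stands in for "matches the Calculator's described capability" (`route` and `ARITHMETIC` are hypothetical names):

```python
import re

# Crude proxy for the Calculator's described capability:
# digits joined by an arithmetic operator.
ARITHMETIC = re.compile(r"\d+(\.\d+)?\s*[-+*/]\s*\d+(\.\d+)?")

def route(query: str) -> str:
    """Return the action the agent would take for this query."""
    if ARITHMETIC.search(query):
        return "Calculator"
    return "Final Answer"

route("What is 128 * 46?")           # "Calculator"
route("Who painted the Mona Lisa?")  # "Final Answer"
```

The LLM's version of this decision is far more flexible (it handles "what's twelve times nine?" too), which is exactly why the description's wording matters so much.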


5. Learnings & Next Steps

5.1. Key Learnings

  • Open-Source Model Integration: Initial integration attempts with open-source models (Mistral-7B, Zephyr-7B) highlighted significant API compatibility issues within the LangChain framework, specifically concerning the task parameter (text-generation vs. conversational). This underscores the practical challenges of working with a rapidly evolving open-source LLM ecosystem.
  • Strategic Pivot as a Pragmatic Solution: Faced with integration blockers, a pivot to the OpenAI API (gpt-3.5-turbo) was a pragmatic decision to ensure project stability and focus on the core agent logic. The OpenAI integration with the ReAct agent proved to be robust and functional out-of-the-box. This journey is a key takeaway on the trade-offs between using cutting-edge open-source models and relying on more mature, stable APIs.
  • Tool Description Is Paramount: The agent's routing accuracy is almost entirely dependent on the tool's description string. It functions as a micro-prompt that is critical for the LLM's ability to select the correct tool. Small changes in the description's wording can significantly alter the agent's behavior.

5.2. Next Steps

The current implementation is a solid proof-of-concept. The most logical path for evolution would be:

  • Stateful Agent: Refactor the agent to include memory (e.g., ConversationBufferMemory) to support multi-turn, contextual conversations. The current implementation is stateless, treating each query independently.

  • Tool Expansion & Routing:

    • With multiple tools, the basic ReAct agent can become inefficient. A potential improvement would be to implement a Router Agent, a preliminary LLM call that efficiently selects the most appropriate tool(s) before invoking the main agent chain.
    • A concrete example of a new tool that could be added is one for providing the current date and time.
      • Python Function:
        from datetime import datetime
        
        def get_current_datetime(tool_input: str = "") -> str:
            """Returns the current date and time in a readable format.
        
            LangChain passes the tool's input string as an argument,
            so it is accepted here and ignored.
            """
            # Uses the server's local time. For specific timezones, the
            # stdlib `zoneinfo` module (Python >= 3.9) can be used.
            return datetime.now().strftime("%A, %B %d, %Y %I:%M %p")
      • Tool Definition:
        date_tool = Tool(
            name="Current Date and Time",
            func=get_current_datetime,
            description="Use this to get the current date and time. Use it for any questions about today's date or the current time."
        )
  • UI Implementation: Develop a simple web interface using Streamlit or Gradio to replace the current command-line interface, making the assistant more interactive and accessible.
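The stateful-agent refactor mentioned above can be illustrated with a minimal history buffer. This is a sketch of what ConversationBufferMemory provides, not LangChain's actual class (`ConversationBuffer` is a hypothetical name):

```python
class ConversationBuffer:
    """Accumulates turns so each new prompt carries prior context."""

    def __init__(self):
        self.turns = []  # list of (role, text) pairs

    def add(self, role: str, text: str) -> None:
        self.turns.append((role, text))

    def as_prompt(self, new_query: str) -> str:
        # Prepend the full transcript to the incoming question.
        history = "\n".join(f"{role}: {text}" for role, text in self.turns)
        return f"{history}\nHuman: {new_query}" if history else f"Human: {new_query}"

mem = ConversationBuffer()
mem.add("Human", "What is 128 * 46?")
mem.add("AI", "5888")
prompt = mem.as_prompt("Now divide that by 2.")
# The agent now sees "5888" in context and can resolve "that".
```

With memory wired in, follow-up queries like "now divide that by 2" become answerable, which the current stateless implementation cannot do.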
