An implementation of a tool-using AI assistant for precise calculations. This project, part of a technical assessment, equips an LLM with a calculator to ensure mathematical accuracy.


AI Assistant with Tools - Technical Challenge

1. Overview

This project implements an AI assistant built in Python with the LangChain framework. The main objective is to demonstrate an intelligent agent's ability to decide autonomously when to rely on its internal knowledge and when to invoke an external tool. To that end, the assistant is equipped with a calculator tool and uses a ReAct (Reasoning and Acting) agent: it analyzes the user's query and either answers general-knowledge questions directly or invokes the calculator to solve arithmetic operations accurately.


2. Tech Stack

  • Language: Python 3.x
  • Core Framework: LangChain
  • Language Model (LLM): OpenAI GPT-3.5-Turbo
  • Dependency Management: pip & requirements.txt
  • API Key Management: python-dotenv

3. How to Run

  1. Clone the Repository:

    git clone https://github.com/PryskaS/artefact-ai-assistant-challenge.git
    cd artefact-ai-assistant-challenge
  2. Set up a Virtual Environment (Recommended):

    # On Windows
    python -m venv venv
    .\venv\Scripts\activate
    
    # On macOS/Linux
    python3 -m venv venv
    source venv/bin/activate
  3. Install Dependencies:

    pip install -r requirements.txt
  4. Configure the API Key:

    • Create a file named .env in the root directory of the project.
    • Inside this file, add your OpenAI API key:
      OPENAI_API_KEY="sk-..."
      
  5. Run the Assistant:

    python main.py

4. Implementation Logic

The agent's logic is built upon three core components: the agent architecture, the tool definition, and the decision-making mechanism.

4.1. Agent Architecture: ReAct (Reasoning and Acting)

The system leverages a ReAct agent. This architecture was chosen for its transparent, step-by-step reasoning process, which is well suited to tasks involving tool use.

  • Core Loop: The agent operates on a "Thought → Action → Observation" cycle.
  • Traceability: The verbose=True flag exposes this entire reasoning chain in the terminal, making the agent's behavior easy to debug and understand. The agent effectively "shows its work" before producing a final answer.
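As a concrete illustration of this cycle (a toy sketch in plain Python, not the project's actual code), the loop below runs Thought → Action → Observation steps until the model emits a final answer; `react_loop` and `fake_llm` are hypothetical names, with `fake_llm` standing in for the real LLM call:

```python
def react_loop(query, llm_step, tools, max_steps=5):
    """Minimal Thought -> Action -> Observation loop.

    llm_step(query, scratchpad) returns (thought, action, action_input);
    action is either a tool name or "Final Answer".
    """
    scratchpad = []
    for _ in range(max_steps):
        thought, action, action_input = llm_step(query, scratchpad)
        scratchpad.append(f"Thought: {thought}")
        if action == "Final Answer":
            return action_input
        observation = tools[action](action_input)  # run the chosen tool
        scratchpad.append(f"Action: {action}[{action_input}]")
        scratchpad.append(f"Observation: {observation}")
    return "Stopped: step limit reached."

def fake_llm(query, scratchpad):
    # Stand-in for the real model: call the Calculator once,
    # then return its observation as the final answer.
    for line in reversed(scratchpad):
        if line.startswith("Observation: "):
            return ("I have the result.", "Final Answer", line[len("Observation: "):])
    return ("This needs arithmetic.", "Calculator", "128 * 46")

tools = {"Calculator": lambda expr: str(eval(expr, {"__builtins__": {}}, {}))}
print(react_loop("What is 128 * 46?", fake_llm, tools))  # 5888
```

Printing the scratchpad after the run reproduces the same trace that `verbose=True` streams to the terminal.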

4.2. Tool Definition: Sandboxed Calculator

A single Calculator tool was implemented to handle arithmetic operations. Key implementation details include:

  • Restricted Evaluation: The tool uses Python's eval() function to compute mathematical results. To reduce risk, eval() is executed with __builtins__ disabled, which blocks the most obvious injection vectors (e.g. __import__ or open). Note that an eval()-based sandbox is a mitigation rather than a hard security boundary, so tool input should still be treated as untrusted.
  • Error Handling: The function is wrapped in a try...except block. This ensures that invalid mathematical expressions or operations (like division by zero) do not crash the agent. Instead, a descriptive error string is returned, which the agent receives as an "Observation" to inform its next step.
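A minimal sketch of a tool with both properties (the function name `calculator` is illustrative; the repository's exact implementation may differ):

```python
def calculator(expression: str) -> str:
    """Evaluate an arithmetic expression in a restricted namespace."""
    try:
        # Empty __builtins__ removes names like __import__, open, and exec.
        # This is a mitigation, not a hard security boundary.
        result = eval(expression, {"__builtins__": {}}, {})
        return str(result)
    except Exception as exc:
        # Returned as a string so the agent receives it as an Observation
        # instead of the run crashing.
        return f"Error: {exc}"

calculator("128 * 46")  # "5888"
calculator("1/0")       # "Error: division by zero"
```

An input like `"__import__('os')"` fails with a NameError inside the sandbox and is likewise surfaced as an `"Error: ..."` observation.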

4.3. Decision-Making Mechanism

The agent's decision to use the Calculator is not hard-coded. Instead, the decision is delegated to the LLM, governed by the quality of the tool's description.

  • Prompt-Driven Decision: The core of the routing logic lies in the description string provided to the Tool object:

    description="Use this tool to evaluate simple arithmetic expressions. The input must be a valid mathematical expression string."
  • Mechanism: When presented with a user query, the LLM determines if the query matches the tool's described capability. For a query like "What is 128 * 46?", the LLM identifies a direct match and formats a call to the Calculator. For knowledge-based questions, it infers a mismatch and proceeds to formulate a Final Answer directly.
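One way to picture this description-driven routing is the toy classifier below. In the real system the decision is made by the LLM reading the description string; here a regex crudely stands in for "matches the Calculator's described capability" (`route` and `ARITHMETIC` are hypothetical names):

```python
import re

# Crude proxy for the Calculator's described capability:
# digits joined by an arithmetic operator.
ARITHMETIC = re.compile(r"\d+(\.\d+)?\s*[-+*/]\s*\d+(\.\d+)?")

def route(query: str) -> str:
    """Return the action the agent would take for this query."""
    if ARITHMETIC.search(query):
        return "Calculator"
    return "Final Answer"

route("What is 128 * 46?")           # "Calculator"
route("Who painted the Mona Lisa?")  # "Final Answer"
```

The LLM's version of this decision is far more flexible (it handles "what's twelve times nine?" too), which is exactly why the description's wording matters so much.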


5. Learnings & Next Steps

5.1. Key Learnings

  • Open-Source Model Integration: Initial integration attempts with open-source models (Mistral-7B, Zephyr-7B) highlighted significant API compatibility issues within the LangChain framework, specifically concerning the task parameter (text-generation vs. conversational). This underscores the practical challenges of working with a rapidly evolving open-source LLM ecosystem.
  • Strategic Pivot as a Pragmatic Solution: Faced with integration blockers, a pivot to the OpenAI API (gpt-3.5-turbo) was a pragmatic decision to ensure project stability and focus on the core agent logic. The OpenAI integration with the ReAct agent proved to be robust and functional out-of-the-box. This journey is a key takeaway on the trade-offs between using cutting-edge open-source models and relying on more mature, stable APIs.
  • Tool Description Is Paramount: The agent's routing accuracy is almost entirely dependent on the tool's description string. It functions as a micro-prompt that is critical for the LLM's ability to select the correct tool. Small changes in the description's wording can significantly alter the agent's behavior.

5.2. Next Steps

The current implementation is a solid proof-of-concept. The most logical path for evolution would be:

  • Stateful Agent: Refactor the agent to include memory (e.g., ConversationBufferMemory) to support multi-turn, contextual conversations. The current implementation is stateless, treating each query independently.

  • Tool Expansion & Routing:

    • With multiple tools, the basic ReAct agent can become inefficient. A potential improvement would be to implement a Router Agent, a preliminary LLM call that efficiently selects the most appropriate tool(s) before invoking the main agent chain.
    • A concrete example of a new tool that could be added is one for providing the current date and time.
      • Python Function:
        from datetime import datetime
        
        def get_current_datetime(tool_input: str = "") -> str:
            """Returns the current date and time in a readable format.
        
            LangChain passes the tool's input string as an argument,
            so it is accepted here and ignored.
            """
            # Uses the server's local time. For specific timezones, the
            # stdlib `zoneinfo` module (Python >= 3.9) can be used.
            return datetime.now().strftime("%A, %B %d, %Y %I:%M %p")
      • Tool Definition:
        date_tool = Tool(
            name="Current Date and Time",
            func=get_current_datetime,
            description="Use this to get the current date and time. Use it for any questions about today's date or the current time."
        )
  • UI Implementation: Develop a simple web interface using Streamlit or Gradio to replace the current command-line interface, making the assistant more interactive and accessible.
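The stateful-agent refactor mentioned above can be illustrated with a minimal history buffer. This is a sketch of what ConversationBufferMemory provides, not LangChain's actual class (`ConversationBuffer` is a hypothetical name):

```python
class ConversationBuffer:
    """Accumulates turns so each new prompt carries prior context."""

    def __init__(self):
        self.turns = []  # list of (role, text) pairs

    def add(self, role: str, text: str) -> None:
        self.turns.append((role, text))

    def as_prompt(self, new_query: str) -> str:
        # Prepend the full transcript to the incoming question.
        history = "\n".join(f"{role}: {text}" for role, text in self.turns)
        return f"{history}\nHuman: {new_query}" if history else f"Human: {new_query}"

mem = ConversationBuffer()
mem.add("Human", "What is 128 * 46?")
mem.add("AI", "5888")
prompt = mem.as_prompt("Now divide that by 2.")
# The agent now sees "5888" in context and can resolve "that".
```

With memory wired in, follow-up queries like "now divide that by 2" become answerable, which the current stateless implementation cannot do.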
