This project provides a web-based UI built with Chainlit to serve as a testbed for the OpenAI Agents SDK.

mjunaidca/agents_sdk_ui_testbed

OpenAI Agents SDK Testbed

This project provides a web-based UI built with Chainlit to serve as a testbed for the OpenAI Agents SDK. It lets you interactively test agents and view their detailed execution stream, including tool calls and handoffs.

Setup

  1. Clone the repository and navigate to the project directory.

  2. Install Dependencies: This project uses uv for package management. Ensure it's installed (pip install uv), then run:

    cd chat_interface
    uv sync
  3. Set Up Environment Variables: Copy the .env_example to a new .env file and add your GEMINI_API_KEY.

    cp .env_example .env
    # Now, edit the .env file with your key
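
Before starting the app, it can help to confirm that the key actually made it into the .env file. The following is a minimal sketch using only the standard library; the helper names `parse_env_text` and `missing_keys` are hypothetical and not part of this project, and the parser only handles plain KEY=VALUE lines.

```python
from typing import Dict, Iterable, List


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(
    values: Dict[str, str],
    required: Iterable[str] = ("GEMINI_API_KEY",),
) -> List[str]:
    """Return the required keys that are absent or empty."""
    return [name for name in required if not values.get(name)]
```

To use it, read the file with `pathlib.Path(".env").read_text()` and pass the result to `parse_env_text`; an empty list from `missing_keys` means the key is present.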

Run the Application

To start the Chainlit interface, run the following command from the chat_interface directory:

uv run python -m chainlit run main.py -w --port 8000

Now, open your browser to http://localhost:8000 to start experimenting with the agent.

Test Prompts with Sample Agent:

The following test prompts exercise the capabilities of the testbed's sample agent:

1. Simple Conversation (No tools, no handoffs)

"Hello! Can you introduce yourself?"

Expected: Direct response from Assistant without any handoffs or tool usage.

2. Calculator Tool (Handoff + Tool)

"What is 15 * 8 + 42?"

Expected: Handoff to Coder Agent → calculate tool call → result display.
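
For reference, a calculator tool of this kind could look roughly like the sketch below. This is an illustration, not the repository's implementation: it assumes the tool takes an expression string and returns a string, and it walks the parsed expression tree instead of calling `eval` so that only basic arithmetic is accepted. The helper `_evaluate` is a hypothetical name.

```python
import ast
import operator

# Arithmetic operators the sketch is willing to evaluate.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def _evaluate(node: ast.AST):
    """Recursively evaluate numeric literals and whitelisted operators."""
    if isinstance(node, ast.Constant):
        if isinstance(node.value, (int, float)) and not isinstance(node.value, bool):
            return node.value
        raise ValueError("only numeric literals are supported")
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        left = _evaluate(node.left)
        right = _evaluate(node.right)
        return _BINARY_OPS[type(node.op)](left, right)
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_evaluate(node.operand))
    raise ValueError("unsupported expression")


def calculate(expression: str) -> str:
    """Evaluate an arithmetic expression and return the result as text.

    Errors come back as a message string so the agent can relay them.
    """
    try:
        tree = ast.parse(expression, mode="eval")
        return str(_evaluate(tree.body))
    except (SyntaxError, ValueError, ZeroDivisionError) as exc:
        return f"Error: {exc}"
```

With this sketch, `calculate("15 * 8 + 42")` returns `"162"`, and anything other than plain arithmetic (a function call, for example) returns an error string instead of being executed.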

3. File Reading Tool (Handoff + Tool)

"Please read test.txt for me"

Expected: Handoff to Coder Agent → read_file tool call → file content display.

4. Weather Tool (Handoff + Tool)

"What's the weather like in San Francisco?"

Expected: Handoff to Weather Forecaster → get_weather tool call → weather info.

Advanced Validation Tests

5. Multiple Calculations in One Request

"Calculate both (10 + 5) * 3 and 100 / 4 for me"

Expected: Multiple tool calls to calculate, or agent breaking down the request.

6. Non-existent File

"Can you read the file 'nonexistent.txt'?"

Expected: Tool call with error response showing file not found.
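
The behavior described here, a tool that reports a missing file rather than raising, can be sketched as follows. This is a hypothetical version of a `read_file` tool written for illustration; the actual tool in the repository may differ in signature and error wording.

```python
from pathlib import Path


def read_file(filename: str) -> str:
    """Return the file's text, or an error message the agent can display."""
    try:
        return Path(filename).read_text(encoding="utf-8")
    except FileNotFoundError:
        return f"Error: file not found: {filename}"
    except (OSError, UnicodeDecodeError) as exc:
        # Covers directories, permission problems, and non-UTF-8 content.
        return f"Error: could not read {filename}: {exc}"
```

Returning the error as a string keeps the failure visible in the tool step output, which is what this test checks for in the UI.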

7. Weather for Different Cities

"Compare the weather in Lahore and San Francisco"

Expected: Multiple weather tool calls or agent explaining the differences.

8. Complex Mathematical Expression

"What is the result of (25 * 4) + (100 / 5) - 30?"

Expected: Single calculate tool call with complex expression.

Edge Cases & Error Handling

9. Invalid Mathematical Expression

"Calculate 10 + + 5"

Expected: Tool call with error handling showing syntax error.
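
One caveat worth checking: if the calculate tool relies on Python's own parser (an assumption, since the tool's implementation is not shown here), then "10 + + 5" is not a syntax error. Python reads the second plus as a unary operator, so the expression is valid and evaluates to 15. The sketch below uses the hypothetical helper `is_valid_expression` to show which inputs Python's parser actually rejects.

```python
import ast


def is_valid_expression(expression: str) -> bool:
    """Report whether Python can parse the text as a single expression."""
    try:
        ast.parse(expression, mode="eval")
        return True
    except SyntaxError:
        return False


print(is_valid_expression("10 + + 5"))  # True: parsed as 10 + (+5)
print(is_valid_expression("10 + / 5"))  # False: "/" cannot start an operand
print(is_valid_expression("10 +"))      # False: missing right-hand operand
```

Under that assumption, a prompt such as "Calculate 10 + / 5" is a more reliable way to trigger the error path.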

10. Mixed Requests (Multiple agents needed)

"Calculate 50 * 2 and also tell me the weather in Lahore"

Expected: Multiple handoffs between Coder Agent and Weather Forecaster.

Conversation Flow Tests

11. Follow-up Questions

First: "What is 10 * 10?"
Then: "Now add 50 to that result"

Expected: Context preservation across messages.
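
Context preservation generally comes down to sending the earlier turns along with each new message. The sketch below shows that idea in isolation; the `ConversationHistory` class is hypothetical, and how the testbed actually stores history is not shown here (in a Chainlit app, a per-user store such as `cl.user_session` would be a natural place for it).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ConversationHistory:
    """Accumulates role/content messages across turns."""

    messages: List[Dict[str, str]] = field(default_factory=list)

    def add(self, role: str, content: str) -> None:
        """Append one turn to the history."""
        self.messages.append({"role": role, "content": content})

    def as_input(self) -> List[Dict[str, str]]:
        """Return a copy of the list to pass to the agent on the next turn."""
        return list(self.messages)


history = ConversationHistory()
history.add("user", "What is 10 * 10?")
history.add("assistant", "100")
history.add("user", "Now add 50 to that result")
# The agent now receives all three turns, so "that result" can be resolved.
print(len(history.as_input()))
```

If the follow-up in this test fails, the first thing to check is whether the earlier turns are being passed along with the second message.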

12. Ambiguous Request

"Can you help me with some numbers?"

Expected: Assistant asking for clarification without unnecessary handoffs.

Streaming & UI Validation

13. Long Response Generation

"Explain how you work and what tools you have available"

Expected: Streaming text response, no tool calls, good explanation of capabilities.

14. Quick Successive Requests

Test sending multiple messages quickly to validate session handling.

What to Look For:

Handoff Messages: "Handoff to: [Agent Name]" appears
Tool Steps: Tool name, input, and output clearly displayed
Streaming: Text appears token by token
Error Handling: Graceful error messages for invalid inputs
Context: Conversation history maintained across requests
UI Responsiveness: No freezing or hanging

Use it to create agents interactively :D
