This project provides a web-based UI built with Chainlit to serve as a testbed for the OpenAI Agents SDK. It lets you test agents interactively and view their detailed execution stream, including tool calls and handoffs.
- Clone the repository and navigate to the project directory.
- Install Dependencies: This project uses uv for package management. Ensure it's installed (pip install uv), then run:
    cd chat_interface
    uv sync
- Set Up Environment Variables: Copy .env_example to a new .env file and add your GEMINI_API_KEY.
    cp .env_example .env
    # Now, edit the .env file with your key
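As a rough sketch of how the key from the .env file might be picked up at startup, the snippet below parses simple KEY=VALUE lines using only the standard library. The helper names `load_env_file` and `require_api_key` are hypothetical, and the project itself may load the file differently (for example through a dotenv library); that is an assumption, not something this README states.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=VALUE lines from a .env-style file (hypothetical helper)."""
    values: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Trim whitespace and optional surrounding quotes from the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def require_api_key(env: dict[str, str], name: str = "GEMINI_API_KEY") -> str:
    """Return the key from the parsed file or the process environment, else fail early."""
    key = env.get(name) or os.environ.get(name)
    if not key:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return key
```

Failing early with a clear message tends to be easier to debug than an authentication error surfacing in the middle of a chat session.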
To start the Chainlit interface, run the following command from the chat_interface directory:
    uv run python -m chainlit run main.py -w --port 8000
Now, open your browser to http://localhost:8000 to start experimenting with the agent.
The following test questions validate the capabilities of the OpenAI Agents SDK testbed:
"Hello! Can you introduce yourself?"
Expected: Direct response from Assistant without any handoffs or tool usage.
"What is 15 * 8 + 42?"
Expected: Handoff to Coder Agent → calculate tool call → result display.
"Please read test.txt for me"
Expected: Handoff to Coder Agent → read_file tool call → file content display.
"What's the weather like in San Francisco?"
Expected: Handoff to Weather Forecaster → get_weather tool call → weather info.
"Calculate both (10 + 5) * 3 and 100 / 4 for me"
Expected: Multiple tool calls to calculate, or agent breaking down the request.
"Can you read the file 'nonexistent.txt'?"
Expected: Tool call with error response showing file not found.
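The file-not-found behavior described above can be illustrated with a minimal sketch of a read_file-style function that returns an error string instead of raising, so the agent can relay the problem to the user. The actual tool's implementation is not shown in this README; the `base_dir` parameter and the exact error wording are assumptions, and registration with the SDK is omitted.

```python
from pathlib import Path


def read_file(filename: str, base_dir: str = ".") -> str:
    """Return the file's text, or a readable error string instead of raising."""
    path = Path(base_dir) / filename
    try:
        return path.read_text(encoding="utf-8")
    except FileNotFoundError:
        return f"Error: file '{filename}' not found."
    except OSError as exc:
        # Covers other I/O problems such as permissions or a directory path.
        return f"Error: could not read '{filename}': {exc}"
```

Returning the error as the tool's output keeps the failure visible in the tool step rather than crashing the run.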
"Compare the weather in Lahore and San Francisco"
Expected: Multiple weather tool calls or agent explaining the differences.
"What is the result of (25 * 4) + (100 / 5) - 30?"
Expected: Single calculate tool call with complex expression.
"Calculate 10 + + 5"
Expected: Tool call with error handling showing syntax error.
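For the calculation tests, here is one possible shape for a calculate-style function that evaluates arithmetic without calling `eval`, by walking Python's `ast` and allowing only a small whitelist of operators. This is a sketch under assumptions: the real tool's parser and error messages are not documented here, and the helper `_evaluate` is hypothetical.

```python
import ast
import operator

# Whitelisted operators; any other syntax node is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def _evaluate(node: ast.AST) -> float:
    """Recursively evaluate a parsed arithmetic expression."""
    if (
        isinstance(node, ast.Constant)
        and isinstance(node.value, (int, float))
        and not isinstance(node.value, bool)
    ):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        return _BINARY_OPS[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_evaluate(node.operand))
    raise ValueError("unsupported expression")


def calculate(expression: str) -> str:
    """Evaluate the expression and return the result or an error message as text."""
    try:
        tree = ast.parse(expression, mode="eval")
        return str(_evaluate(tree.body))
    except SyntaxError:
        return f"Error: invalid syntax in '{expression}'"
    except ZeroDivisionError:
        return "Error: division by zero"
    except ValueError as exc:
        return f"Error: {exc}"


print(calculate("15 * 8 + 42"))                # 162
print(calculate("(25 * 4) + (100 / 5) - 30"))  # 90.0
print(calculate("10 * / 5"))                   # Error: invalid syntax in '10 * / 5'
```

One caveat: under Python's grammar, `10 + + 5` is a valid expression (the second `+` is a unary plus), so this sketch returns 15 for it rather than an error. Whether the testbed's real tool reports a syntax error for that input depends on how it parses expressions; an input such as `10 * / 5` exercises the error path in either case.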
"Calculate 50 * 2 and also tell me the weather in Lahore"
Expected: Multiple handoffs between Coder Agent and Weather Forecaster.
First: "What is 10 * 10?"
Then: "Now add 50 to that result"
Expected: Context preservation across messages.
"Can you help me with some numbers?"
Expected: Assistant asking for clarification without unnecessary handoffs.
"Explain how you work and what tools you have available"
Expected: Streaming text response, no tool calls, good explanation of capabilities.
Test sending multiple messages quickly to validate session handling.
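The expectations listed above could also be captured as data, which makes it easier to tick them off consistently after each run. The sketch below encodes a few of them and compares them with the handoffs and tool calls observed in the UI. The `Expectation` class and `check` function are hypothetical, and how the observed lists get collected (by hand or from the run's event stream) is left open.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Expectation:
    prompt: str
    agent: Optional[str] = None   # agent expected to receive a handoff, if any
    tools: tuple = ()             # tool names expected to be called


EXPECTATIONS = [
    Expectation("Hello! Can you introduce yourself?"),
    Expectation("What is 15 * 8 + 42?", "Coder Agent", ("calculate",)),
    Expectation("Please read test.txt for me", "Coder Agent", ("read_file",)),
    Expectation(
        "What's the weather like in San Francisco?",
        "Weather Forecaster",
        ("get_weather",),
    ),
]


def check(expected: Expectation, handoffs: list, tool_calls: list) -> list:
    """Compare one observed run against its expectation; return a list of problems."""
    problems = []
    if expected.agent is None and handoffs:
        problems.append(f"unexpected handoff to {handoffs[0]}")
    if expected.agent is not None and expected.agent not in handoffs:
        problems.append(f"missing handoff to {expected.agent}")
    for tool in expected.tools:
        if tool not in tool_calls:
            problems.append(f"missing tool call: {tool}")
    if not expected.tools and tool_calls:
        problems.append(f"unexpected tool call: {tool_calls[0]}")
    return problems
```

An empty list from `check` means the run matched; for example, `check(EXPECTATIONS[1], ["Coder Agent"], ["calculate"])` returns `[]`.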
✅ Handoff Messages: "Handoff to: [Agent Name]" appears
✅ Tool Steps: Tool name, input, and output clearly displayed
✅ Streaming: Text appears token by token
✅ Error Handling: Graceful error messages for invalid inputs
✅ Context: Conversation history maintained across requests
✅ UI Responsiveness: No freezing or hanging
Use it to create agents interactively :D