
Initial Proposal Draft for AI Agent for API Testing & Tool Generation #629

Merged 2 commits on Mar 3, 2025

Conversation

akshayw1
Contributor

@akshayw1 akshayw1 commented Mar 2, 2025

PR Description

This PR adds an initial proposal draft for the AI Agent for API Testing & Tool Generation (#620). The project aims to automate API testing using Large Language Models (LLMs), enabling intelligent test case generation, response validation, and seamless integration with AI agent frameworks like crewAI, smolagents, and pydantic-ai.

Related Issues

Feedback

Any insights or suggestions on the architecture, integration strategy, or additional features would be greatly appreciated to help refine the proposal further. Looking forward to your feedback.

@akshayw1
Contributor Author

akshayw1 commented Mar 2, 2025

@ashitaprasad, looking for feedback and answers to a few clarification questions.

@ashitaprasad
Member

Sure @akshayw1

6. **Tool Definition Generator**: This component converts API specifications into properly structured tool definitions for various AI frameworks, handling the specific requirements and patterns of each target framework.

7. **Benchmark Framework**: The evaluation system that assesses LLM performance on standardized tasks with detailed metrics for accuracy, coverage, relevance, and efficiency.

Member

The design is for a standalone service, and currently it is not aligned with API Dash.
You will also have to think about the UI/UX for this feature, not just the backend.

Contributor Author

I have the UI/UX for these in mind; I will provide a Figma design soon.



All components will be implemented in Python with comprehensive test coverage and documentation. The architecture will be modular, allowing for component reuse and independent scaling as needs evolve.
Member

API Dash is a Flutter project.
Everything has to be implemented in Flutter. LLM interactions will happen via Ollama (local)/ChatGPT/Claude APIs.


I have some questions to improve my understanding:

1. Which AI frameworks are highest priority for tool definition generation? Is there a specific order of importance for crewAI, langchain, pydantic-ai, and langgraph?
Member

Tool definition generation is not a complex task. Once it is done for a framework, it will be easy to replicate for others.



2. Do you have preferred LLM providers that should be prioritized for integration, or should the system be designed to work with any provider through a common interface?
Member

Ollama API (local)
ChatGPT, Anthropic, Gemini APIs (if the user wants to connect to cloud API providers)



3. Are there specific types of APIs that should be given special focus in the benchmark dataset (e.g., e-commerce, financial, IoT)?
Member

No. The benchmark must be a good mix.



4. How will the frontend be planned? Will it be a standalone interface, an extension of an existing dashboard, or fully integrated into the API Dash client?
Member

It will be fully integrated into API Dash.

@ashitaprasad ashitaprasad merged commit 1618b6c into foss42:main Mar 3, 2025
@akshayw1 akshayw1 deleted the idea-draft-gsoc branch March 3, 2025 07:57
@akshayw1
Contributor Author

akshayw1 commented Mar 3, 2025

@ashitaprasad Based on your feedback on the proposal:

AI Agent for API Testing and Automated Tool Integration Idea (Refined Proposal)

Changes and Refinements

Architecture and Implementation

  • Primary Implementation: Completely in Flutter for full integration with API Dash
  • LLM Integration:
    • Primary LLM Providers:
      1. Ollama (Local)
      2. ChatGPT
      3. Anthropic Claude
      4. Google Gemini
  • Modular Design: Maintain a flexible, extensible architecture that allows easy integration of additional LLM providers

System Architecture Updates

Architectural Considerations

  1. Flutter Approach

    • Entire solution will be implemented as a Flutter module
    • Seamless integration with existing API Dash infrastructure
    • Leverage Flutter's cross-platform capabilities for consistent UI/UX
  2. LLM Provider Abstraction

    • Create a provider-agnostic interface for LLM interactions
    • Support multiple providers with a unified integration approach
    • Allow easy switching between local (Ollama) and cloud-based LLM services
  3. Tool Definition Generation

    • Design a flexible template system that can be easily adapted across different AI frameworks
    • Initial focus on creating a generic template that can be quickly customized for:
      • crewAI
      • langchain
      • pydantic-ai
      • langgraph
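
As a rough illustration of what the generic template could emit, the widely used OpenAI function-calling JSON schema is a natural common denominator, since crewAI, langchain, and pydantic-ai tools all carry a name, a description, and a typed parameter schema. The operation name and fields below are hypothetical, not taken from any real API spec:

```json
{
  "name": "get_user_by_id",
  "description": "Fetch a single user record by its numeric ID.",
  "parameters": {
    "type": "object",
    "properties": {
      "user_id": {
        "type": "integer",
        "description": "Unique identifier of the user"
      }
    },
    "required": ["user_id"]
  }
}
```

Framework-specific adapters would then wrap this common representation in each framework's native tool class.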

Benchmark Framework Refinements

  • Create a diverse benchmark dataset covering multiple API types
  • Include a mix of API domains:
    • REST APIs
    • GraphQL APIs
    • gRPC APIs
    • Microservice APIs
  • Evaluation metrics:
    • Test case generation accuracy
    • API coverage
    • Edge case detection
    • Performance efficiency
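
As a sketch of how one of these metrics could be computed, API coverage might be defined as the fraction of spec operations exercised by the generated tests. This is a hypothetical helper (not part of API Dash, and shown in Python purely for brevity, since the final implementation would be in Dart):

```python
def endpoint_coverage(spec_ops, tested_ops):
    """Fraction of (method, path) operations from the API spec
    that the generated test suite actually exercises."""
    spec = set(spec_ops)
    if not spec:
        return 0.0
    # Only count tested operations that actually exist in the spec.
    return len(set(tested_ops) & spec) / len(spec)

# Example: the spec defines 3 operations, the generated tests hit 2 of them.
spec_ops = [("GET", "/users"), ("POST", "/users"), ("GET", "/users/{id}")]
tested_ops = [("GET", "/users"), ("POST", "/users")]
print(round(endpoint_coverage(spec_ops, tested_ops), 2))  # prints 0.67
```

The other metrics (accuracy, edge-case detection, efficiency) would need task-specific scoring, but could share the same simple ratio-style reporting.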

UI/UX Considerations

  1. Integration with API Dash

    • Fully integrated Flutter module
    • Consistent design language with existing API Dash UI
    • Seamless user experience for API testing and tool generation
  2. Key UI Features

    • API specification upload interface
    • LLM provider configuration
    • Test generation and execution dashboard
    • Tool definition export functionality
    • Detailed reporting and visualization

Please review these changes so I can update my idea document accordingly. Let me know if I'm missing anything.

@ashitaprasad
Member

@akshayw1 any updates you make have to be sent as a PR.

@akshayw1
Contributor Author

akshayw1 commented Mar 3, 2025

Okay sure @ashitaprasad

@akshayw1 akshayw1 restored the idea-draft-gsoc branch March 3, 2025 11:08