Skip to content

Khavanw/agentic-data

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

2 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

πŸ€– Agentic Data - AI-Powered SQL Query Generator

Python FastAPI LangGraph License

Agentic Data is an intelligent AI agent system using LangGraph to automatically generate SQL queries from Vietnamese natural language questions. The system is specifically designed to handle Vietnamese business data with advanced entity recognition and processing capabilities for Vietnamese entities like provinces, customers, and brands.

✨ Key Features

  • πŸ” Semantic Search: Intelligent database schema search using vector search
  • πŸ‡»πŸ‡³ Vietnamese Processing: Support for accent-insensitive matching and entity resolution
  • πŸ€– Multi-Agent Workflow: 6 specialized AI agents working together
  • πŸ“Š SQL Generation: Automatic generation of optimized SQL queries for SQL Server
  • πŸ”„ Iterative Improvement: System automatically improves SQL through feedback loops
  • πŸ“ˆ Data Visualization: Built-in charting tools with Plotly integration

πŸ—οΈ System Architecture

alt text

πŸ€– AI Agents Workflow

The system uses 6 specialized AI agents:

  1. πŸ” Search Engineer: Finds relevant database schemas
  2. 🏷️ Entity Resolver: Processes and normalizes Vietnamese entities
  3. πŸ“‹ Query Planner: Plans and creates constraints for SQL
  4. ✍️ SQL Writer: Generates optimized SQL queries
  5. βœ… QA Engineer: Validates SQL quality
  6. πŸ‘¨β€πŸ’Ό Chief DBA: Optimizes performance and provides recommendations

πŸš€ Installation

System Requirements

  • Python 3.13+
  • SQL Server (with ODBC Driver 18)
  • Qdrant Vector Database
  • Azure OpenAI API

1. Clone Repository

git clone https://github.com/Khavanw/agentic-data.git
cd agentic-data

2. Create Virtual Environment

πŸ“¦ How to Install uv

πŸ‘‰ See the official guide: uv installation docs

pip install uv

Initialize uv

uv init

3. Install Dependencies

uv sync

4. Configure Environment

Create .env file from .env.example:

cp .env.example .env

Update environment variables in .env:

# Azure OpenAI
AZURE_OPENAI_API_KEY=your_api_key
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/
AZURE_OPENAI_API_VERSION=2024-02-01
AZURE_DEPLOYMENT_NAME=gpt-4.1

# Qdrant
QDRANT_URL=https://your-cluster.qdrant.vector
QDRANT_API_KEY=your_qdrant_key
QDRANT_COLLECTION_NAME=data_assistant

# SQL Server
SQL_SERVER=your-sql-server.database.windows.net
SQL_DATABASE=your_database
SQL_USERNAME=your_username
SQL_PASSWORD=your_password

# Embedding
AZURE_EMBEDDING_DEPLOYMENT_NAME=text-embedding-3-small
AZURE_EMBEDDING_ENDPOIND=https://your-resource.openai.azure.com/
AZURE_EMBEDDING_API_VERSION=2024-02-01

# Logging
LOG_LEVEL=INFO

5. Initialize Database

# Create schema collection in Qdrant
python -m app.core.ingest_data.sql_schema

6. Run Application

# Development
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

# Production
uvicorn app.main:app --host 0.0.0.0 --port 8000

πŸ“– Usage

API Endpoints

1. Health Check

GET /v1/rag/health

Response:

{
  "status": "OK"
}

2. Query Processing

POST /v1/rag/query

Request Body:

{
  "request_id": "unique-request-id",
  "query": "Show revenue by province in Q1 2024",
  "session_id": "optional-session-id"
}

Response:

{
  "request_id": "unique-request-id",
  "response_id": "generated-response-id",
  "results": {
    "database": "your_database",
    "sql": "SELECT PROVINCE_NAME, SUM(REVENUE) FROM sales WHERE QUARTER = 1 AND YEAR = 2024 GROUP BY PROVINCE_NAME",
    "value": [
      {"PROVINCE_NAME": "Ho Chi Minh", "REVENUE": 1500000000},
      {"PROVINCE_NAME": "Hanoi", "REVENUE": 1200000000}
    ]
  },
  "session_id": "session-id"
}

Python Client Example

import requests
import json

# Initialize session
session_id = None

# First query
query1 = {
    "request_id": "req-001",
    "query": "Total revenue by month in 2024",
    "session_id": session_id
}

response1 = requests.post(
    "http://localhost:8000/v1/rag/query",
    json=query1
)

result1 = response1.json()
session_id = result1["session_id"]

# Follow-up query in same session
query2 = {
    "request_id": "req-002", 
    "query": "Compare with 2023",
    "session_id": session_id
}

response2 = requests.post(
    "http://localhost:8000/v1/rag/query",
    json=query2
)

πŸ”§ Agent Configuration

AI agents are configured in app/core/agents/agents.json:

{
  "search_engineer": {
    "system": "You are an Assistant Search Engineer...",
    "expected_output": "Return a strict JSON object..."
  },
  "entity_resolver": {
    "system": "You are an Entity & Alias Resolver for Vietnamese business data...",
    "expected_output": "Return a strict JSON object..."
  }
  // ... other agents
}

πŸ“Š Data Visualization

The system includes built-in charting tools:

  • Table: Display data in table format
  • Bar Chart: Column charts (grouped/stacked)
  • Line Chart: Line graphs
  • Scatter Plot: Scatter plots
  • Heatmap: Correlation heatmaps

πŸ› οΈ Development

Project Structure

agentic-data/
β”œβ”€β”€ app/
β”‚   β”œβ”€β”€ core/                 # Core business logic
β”‚   β”‚   β”œβ”€β”€ agents/          # AI agents configuration
β”‚   β”‚   β”œβ”€β”€ configs/         # Settings and configuration
β”‚   β”‚   β”œβ”€β”€ ingest_data/     # Data ingestion utilities
β”‚   β”‚   β”œβ”€β”€ models/          # ML models (embeddings)
β”‚   β”‚   β”œβ”€β”€ pipeline/        # LangGraph workflow
β”‚   β”‚   └── tools/           # Agent tools and utilities
β”‚   β”œβ”€β”€ rag/                 # RAG API endpoints
β”‚   β”œβ”€β”€ utils/               # Helper functions
β”‚   └── main.py              # FastAPI application
β”œβ”€β”€ docs/                    # Documentation
β”œβ”€β”€ research/                # Jupyter notebooks
└── tests/                   # Unit tests

Running Tests

pytest tests/

Code Formatting

black app/
isort app/

Type Checking

mypy app/

πŸ”’ Security

⚠️ Important Note: The current project has some security issues that need to be addressed before production deployment:

  • SQL injection protection
  • API authentication
  • Connection encryption
  • Input validation

See Security Guidelines for more details.

🀝 Contributing

  1. Fork the project
  2. Create feature branch (git checkout -b feature/AmazingFeature)
  3. Commit changes (git commit -m 'Add some AmazingFeature')
  4. Push to branch (git push origin feature/AmazingFeature)
  5. Open Pull Request

πŸ“ License

This project is distributed under the MIT License. See LICENSE for more information.

πŸ“ž Support

πŸ™ Acknowledgments


Made with ❀️ for Vietnamese Data Analytics

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages