Agentic Data is an AI multi-agent system built on LangGraph that automatically generates SQL queries from natural-language questions in Vietnamese. It is designed specifically for Vietnamese business data, with entity recognition and normalization for Vietnamese entities such as provinces, customers, and brands.
Key features:
- Semantic Search: Finds relevant database schemas using vector search
- Vietnamese Processing: Accent-insensitive matching and entity resolution
- Multi-Agent Workflow: 6 specialized AI agents working together
- SQL Generation: Automatic generation of optimized SQL queries for SQL Server
- Iterative Improvement: The system refines generated SQL through feedback loops
- Data Visualization: Built-in charting tools with Plotly integration
The system uses 6 specialized AI agents:
- Search Engineer: Finds relevant database schemas
- Entity Resolver: Processes and normalizes Vietnamese entities
- Query Planner: Plans the query and derives constraints for the SQL
- SQL Writer: Generates optimized SQL queries
- QA Engineer: Validates SQL quality
- Chief DBA: Optimizes performance and provides recommendations
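The README does not show how the six agents are wired together. The following is a plain-Python sketch of one plausible hand-off: the agents run in the order listed above, and the feedback loop from the features list is modeled as the QA Engineer sending the SQL back to the SQL Writer. It does not use LangGraph or call an LLM. The agent order, the state keys, and the stub logic inside each function are all assumptions. In LangGraph, such a loop would typically be expressed as a conditional edge from the QA node back to the writer node.

```python
from typing import Any

# Shared state passed from agent to agent (hypothetical shape).
State = dict[str, Any]


def search_engineer(state: State) -> State:
    # Stub: a real agent would query the vector store for relevant tables.
    return {**state, "tables": ["sales"]}


def entity_resolver(state: State) -> State:
    # Stub: a real agent would map Vietnamese mentions to canonical values.
    return {**state, "entities": {}}


def query_planner(state: State) -> State:
    # Stub: a real agent would derive filters from the question.
    return {**state, "constraints": ["YEAR = 2024"]}


def sql_writer(state: State) -> State:
    attempts = state.get("attempts", 0) + 1
    sql = "SELECT PROVINCE_NAME, SUM(REVENUE) AS REVENUE FROM sales WHERE YEAR = 2024"
    # Stub: the first draft omits GROUP BY; QA feedback makes the rewrite add it.
    if state.get("feedback"):
        sql += " GROUP BY PROVINCE_NAME"
    return {**state, "sql": sql, "attempts": attempts}


def qa_engineer(state: State) -> State:
    # Stub check: aggregates must be grouped.
    passed = "GROUP BY" in state["sql"]
    feedback = None if passed else "Aggregate without GROUP BY"
    return {**state, "qa_passed": passed, "feedback": feedback}


def chief_dba(state: State) -> State:
    # Stub: a real agent would inspect the SQL for performance issues.
    return {**state, "recommendations": ["Consider an index on sales(YEAR)"]}


def run_pipeline(question: str, max_attempts: int = 3) -> State:
    state: State = {"question": question}
    for step in (search_engineer, entity_resolver, query_planner):
        state = step(state)
    # Feedback loop: rewrite the SQL until QA passes or attempts run out.
    while True:
        state = qa_engineer(sql_writer(state))
        if state["qa_passed"] or state["attempts"] >= max_attempts:
            break
    return chief_dba(state)
```

With these stubs, `run_pipeline(...)` needs two SQL Writer attempts: the first draft fails QA, and the rewrite passes. The `max_attempts` cap keeps a persistently failing query from looping forever.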
Requirements:
- Python 3.13+
- SQL Server (with ODBC Driver 18)
- Qdrant Vector Database
- Azure OpenAI API
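SQL Server access goes through ODBC Driver 18, configured by the `SQL_SERVER`, `SQL_DATABASE`, `SQL_USERNAME`, and `SQL_PASSWORD` variables in `.env`. The following is a minimal sketch of assembling and validating a connection string from those variables. The helper name `build_connection_string` is hypothetical, and the project's own connection code may differ.

```python
import os
from collections.abc import Mapping

REQUIRED_VARS = ("SQL_SERVER", "SQL_DATABASE", "SQL_USERNAME", "SQL_PASSWORD")


def build_connection_string(env: Mapping[str, str] = os.environ) -> str:
    """Build an ODBC connection string for SQL Server using ODBC Driver 18."""
    # Fail early with a clear message instead of an obscure driver error.
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise ValueError(f"Missing environment variables: {', '.join(missing)}")
    parts = {
        "DRIVER": "{ODBC Driver 18 for SQL Server}",
        "SERVER": env["SQL_SERVER"],
        "DATABASE": env["SQL_DATABASE"],
        "UID": env["SQL_USERNAME"],
        "PWD": env["SQL_PASSWORD"],
        # Driver 18 encrypts by default; stated explicitly for clarity.
        "Encrypt": "yes",
        "TrustServerCertificate": "no",
    }
    return ";".join(f"{key}={value}" for key, value in parts.items()) + ";"
```

The resulting string can be passed to `pyodbc.connect(...)`. Values containing `;` or `{` (for example, some passwords) need ODBC brace-quoting, which this sketch does not handle.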
Clone the repository:
git clone https://github.com/Khavanw/agentic-data.git
cd agentic-data
Install uv (see the official guide: uv installation docs), then set up the environment:
pip install uv
uv init
uv sync
Create a .env file from .env.example:
cp .env.example .env
Update the environment variables in .env:
# Azure OpenAI
AZURE_OPENAI_API_KEY=your_api_key
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/
AZURE_OPENAI_API_VERSION=2024-02-01
AZURE_DEPLOYMENT_NAME=gpt-4.1
# Qdrant
QDRANT_URL=https://your-cluster.qdrant.vector
QDRANT_API_KEY=your_qdrant_key
QDRANT_COLLECTION_NAME=data_assistant
# SQL Server
SQL_SERVER=your-sql-server.database.windows.net
SQL_DATABASE=your_database
SQL_USERNAME=your_username
SQL_PASSWORD=your_password
# Embedding
AZURE_EMBEDDING_DEPLOYMENT_NAME=text-embedding-3-small
AZURE_EMBEDDING_ENDPOIND=https://your-resource.openai.azure.com/
AZURE_EMBEDDING_API_VERSION=2024-02-01
# Logging
LOG_LEVEL=INFO
# Create schema collection in Qdrant
python -m app.core.ingest_data.sql_schema
# Development
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
# Production
uvicorn app.main:app --host 0.0.0.0 --port 8000
GET /v1/rag/health
Response:
{
"status": "OK"
}
POST /v1/rag/query
Request body:
{
"request_id": "unique-request-id",
"query": "Show revenue by province in Q1 2024",
"session_id": "optional-session-id"
}
Response:
{
"request_id": "unique-request-id",
"response_id": "generated-response-id",
"results": {
"database": "your_database",
"sql": "SELECT PROVINCE_NAME, SUM(REVENUE) AS REVENUE FROM sales WHERE QUARTER = 1 AND YEAR = 2024 GROUP BY PROVINCE_NAME",
"value": [
{"PROVINCE_NAME": "Ho Chi Minh", "REVENUE": 1500000000},
{"PROVINCE_NAME": "Hanoi", "REVENUE": 1200000000}
]
},
"session_id": "session-id"
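}
Python client example (multi-turn session):
import requests

API_URL = "http://localhost:8000/v1/rag/query"

# First query: no session yet, so session_id is None
query1 = {
    "request_id": "req-001",
    "query": "Total revenue by month in 2024",
    "session_id": None,
}
# Generous timeout, since the agent pipeline may take a while (adjust as needed)
response1 = requests.post(API_URL, json=query1, timeout=120)
response1.raise_for_status()
result1 = response1.json()

# Reuse the session_id returned by the server for follow-up questions
session_id = result1["session_id"]

# Follow-up query in the same session
query2 = {
    "request_id": "req-002",
    "query": "Compare with 2023",
    "session_id": session_id,
}
response2 = requests.post(API_URL, json=query2, timeout=120)
response2.raise_for_status()
result2 = response2.json()
print(result2["results"]["sql"])
AI agents are configured in app/core/agents/agents.json: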
{
"search_engineer": {
"system": "You are an Assistant Search Engineer...",
"expected_output": "Return a strict JSON object..."
},
"entity_resolver": {
"system": "You are an Entity & Alias Resolver for Vietnamese business data...",
"expected_output": "Return a strict JSON object..."
}
// ... other agents
}
The system includes built-in charting tools:
- Table: Display data in table format
- Bar Chart: Column charts (grouped/stacked)
- Line Chart: Line graphs
- Scatter Plot: Scatter plots
- Heatmap: Correlation heatmaps
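The charting tools are not shown in the README. One common preprocessing step before plotting is to pivot row records into per-column lists. Row records have the same shape as `results.value` in the query response, while Plotly trace constructors take column lists. The following is a minimal sketch under that assumption. The helper name `rows_to_columns` is hypothetical.

```python
from typing import Any


def rows_to_columns(rows: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Pivot a list of row dicts into a dict of equal-length column lists."""
    columns: dict[str, list[Any]] = {}
    # First pass: collect every column name, in order of first appearance.
    for row in rows:
        for key in row:
            columns.setdefault(key, [])
    # Second pass: fill each column, using None where a row lacks the key.
    for row in rows:
        for key, values in columns.items():
            values.append(row.get(key))
    return columns
```

With the example response above, `cols = rows_to_columns(result["results"]["value"])` yields lists that can feed a bar chart, e.g. `go.Figure(data=[go.Bar(x=cols["PROVINCE_NAME"], y=cols["REVENUE"])])` with `import plotly.graph_objects as go`.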
agentic-data/
├── app/
│   ├── core/ # Core business logic
│   │   ├── agents/ # AI agents configuration
│   │   ├── configs/ # Settings and configuration
│   │   ├── ingest_data/ # Data ingestion utilities
│   │   ├── models/ # ML models (embeddings)
│   │   ├── pipeline/ # LangGraph workflow
│   │   └── tools/ # Agent tools and utilities
│   ├── rag/ # RAG API endpoints
│   ├── utils/ # Helper functions
│   └── main.py # FastAPI application
├── docs/ # Documentation
├── research/ # Jupyter notebooks
└── tests/ # Unit tests
Run the tests:
pytest tests/
Format and type-check the code:
black app/
isort app/
mypy app/
Security considerations:
- SQL injection protection
- API authentication
- Connection encryption
- Input validation
See Security Guidelines for more details.
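Because the SQL is generated by an LLM, one useful layer of SQL injection protection is to refuse anything other than a single read-only query before execution. The project's actual safeguards are not described here. The following is a minimal, deliberately conservative sketch of such a check. The function name and the keyword blocklist are hypothetical.

```python
import re

# Keywords a read-only analytics query should never contain (hypothetical policy).
# INTO also blocks SQL Server's SELECT ... INTO, which creates a table.
FORBIDDEN = re.compile(
    r"\b(INSERT|UPDATE|DELETE|MERGE|DROP|ALTER|CREATE|TRUNCATE|EXEC|EXECUTE"
    r"|GRANT|REVOKE|INTO)\b",
    re.IGNORECASE,
)


def is_safe_select(sql: str) -> bool:
    """Return True only for a single SELECT (or WITH ... SELECT) statement."""
    # Allow one trailing semicolon, then reject any remaining statement separator.
    stripped = sql.strip().rstrip(";").strip()
    # Comments can hide extra commands, so reject them outright.
    if ";" in stripped or "--" in stripped or "/*" in stripped:
        return False
    if not re.match(r"(SELECT|WITH)\b", stripped, re.IGNORECASE):
        return False
    return FORBIDDEN.search(stripped) is None
```

A keyword check is a coarse filter and can reject harmless queries, for example a string literal containing "update". It works best alongside a database login that has read-only permissions.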
To contribute:
- Fork the project
- Create a feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
This project is distributed under the MIT License. See LICENSE for more information.
- Email: vankha.comtact@gmail.com
- Issues: GitHub Issues
- Documentation: Wiki
- LangChain - Framework for LLM applications
- LangGraph - Multi-agent workflow orchestration
- FastAPI - Modern web framework
- Qdrant - Vector database
- Plotly - Data visualization
Made with ❤️ for Vietnamese Data Analytics