Build AI-powered data assistants using LlamaIndex and CData Connect AI.
This package provides a simple interface for creating conversational AI agents that can query and analyze data from 350+ data sources connected through CData Connect AI's Model Context Protocol (MCP) server.
- Natural Language Data Access: Query databases, SaaS applications, and files using plain English
- 350+ Data Sources: Connect to Salesforce, Google Sheets, Snowflake, PostgreSQL, and more
- LlamaIndex Integration: Built on LlamaIndex's ReAct agent framework
- Multiple LLM Support: Works with OpenAI (GPT-4) and Anthropic (Claude) models
- Streaming Responses: Real-time token streaming for interactive applications
- Multi-turn Conversations: Maintains context across conversation turns
- Low-level Access: Direct MCP client for programmatic tool usage
pip install connectai-llamaindex-agent

Or install from source:
git clone https://github.com/CDataSoftware/connectai-llamaindex-agent.git
cd connectai-llamaindex-agent
pip install -e .

To follow along with the examples, you can use our sample Google Sheet:
- Open the sample customer health spreadsheet
- Click File > Make a copy to save it to your Google Drive
- Name it "demo_organization" (or any name you prefer)
- Connect it to CData Connect AI as a Google Sheets data source
Create a .env file with your credentials:
# CData Connect AI credentials (required)
CDATA_EMAIL=your_email@example.com
CDATA_PAT=your_personal_access_token
# OpenAI credentials (required for OpenAI)
OPENAI_API_KEY=your_openai_api_key
# Or use Anthropic instead
# LLM_PROVIDER=anthropic
# ANTHROPIC_API_KEY=your_anthropic_api_key

Get your CData credentials:
- Log in to CData Connect AI
- Go to Settings > Access Tokens
- Create a new Personal Access Token
from dotenv import load_dotenv
from connectai_llamaindex import MCPAgent, Config
load_dotenv()
config = Config.from_env()
with MCPAgent(config) as agent:
    # Ask about available data
    response = agent.chat("What data sources are available?")
    print(response)

    # Query your data
    response = agent.chat("Show me the first 10 rows from the account table")
    print(response)

Run the interactive chat example to explore your data conversationally:

python examples/basic_chat.py

Example session:
============================================================
CData Connect AI - LlamaIndex Chat Assistant
============================================================
Connected! Available tools: getCatalogs, getSchemas, getTables, getColumns, queryData, getProcedures, getProcedureParameters, executeProcedure, getInstructions
You can now ask questions about your connected data sources.
Type 'quit' to exit, 'clear' to reset history, 'tools' to list tools.
You: What data sources do I have?
Try these example queries:
- "What data sources do I have connected?"
- "Show me all the tables in demo_organization"
- "What columns are in the account table?"
- "Query the top 5 accounts by annual_revenue"
- "How many support tickets are there by priority?"
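The example queries above can also be sent from a script rather than typed into the interactive session. The following is a minimal sketch, assuming only that the agent exposes the `chat(message)` method shown in the quick start. `ChatAgent`, `ask_all`, and `EchoAgent` are hypothetical names introduced here for illustration; `EchoAgent` is a stand-in so the snippet runs without credentials.

```python
from typing import Iterable, List, Protocol, Tuple


class ChatAgent(Protocol):
    """Anything with a chat(message) method, such as the agent in the quick start."""

    def chat(self, message: str) -> object: ...


def ask_all(agent: ChatAgent, questions: Iterable[str]) -> List[Tuple[str, str]]:
    """Send each question in order and collect (question, answer) pairs."""
    transcript = []
    for question in questions:
        answer = agent.chat(question)
        # Convert to str so the transcript does not depend on the response type.
        transcript.append((question, str(answer)))
    return transcript


class EchoAgent:
    """Hypothetical stand-in used only to run this sketch offline."""

    def chat(self, message: str) -> str:
        return f"(answer to: {message})"


questions = [
    "What data sources do I have connected?",
    "Show me all the tables in demo_organization",
    "What columns are in the account table?",
]

for question, answer in ask_all(EchoAgent(), questions):
    print(f"Q: {question}\nA: {answer}\n")
```

With real credentials, the agent obtained from `with MCPAgent(config) as agent:` would take the place of `EchoAgent()`. Since the agent maintains context across turns, the order of the questions can affect the answers.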
The agent automatically has access to these tools for exploring and querying your data:
| Tool | Description |
|---|---|
| `getCatalogs` | List all connected data sources (returns catalog name, data source, and driver) |
| `getSchemas` | Get schemas within a catalog |
| `getTables` | List tables within a schema |
| `getColumns` | Get column metadata for a table |
| `queryData` | Execute SQL queries |
| `getProcedures` | List stored procedures |
| `getProcedureParameters` | Get procedure parameters |
| `executeProcedure` | Execute stored procedures |
| `getInstructions` | Get driver-specific guidance (use driver name from getCatalogs) |
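One way to use the tool list is as a startup sanity check: compare the tool names reported on connection against the nine names in the table. The sketch below assumes the names are already available as plain strings (for example, copied from the "Available tools" line of the example session). How to extract names from the return value of `client.list_tools()` is not specified here, so that step is left out. `EXPECTED_TOOLS` and `missing_tools` are hypothetical helpers, not part of the package.

```python
from typing import Iterable, List

# Tool names taken from the table above.
EXPECTED_TOOLS = (
    "getCatalogs",
    "getSchemas",
    "getTables",
    "getColumns",
    "queryData",
    "getProcedures",
    "getProcedureParameters",
    "executeProcedure",
    "getInstructions",
)


def missing_tools(available_names: Iterable[str]) -> List[str]:
    """Return the expected tool names that are absent, in table order."""
    available = set(available_names)
    return [name for name in EXPECTED_TOOLS if name not in available]


# Example: a connection that only reported two tools.
reported = ["getCatalogs", "queryData"]
absent = missing_tools(reported)
if absent:
    print("Missing tools:", ", ".join(absent))
else:
    print("All expected tools are available.")
```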
| Variable | Required | Default | Description |
|---|---|---|---|
| `CDATA_EMAIL` | Yes | - | Your CData Connect AI email |
| `CDATA_PAT` | Yes | - | Your CData Personal Access Token |
| `LLM_PROVIDER` | No | `openai` | LLM provider (openai or anthropic) |
| `OPENAI_API_KEY` | If OpenAI | - | OpenAI API key |
| `OPENAI_MODEL` | No | `gpt-4o` | OpenAI model to use |
| `ANTHROPIC_API_KEY` | If Anthropic | - | Anthropic API key |
| `ANTHROPIC_MODEL` | No | `claude-sonnet-4-20250514` | Anthropic model to use |
| `MCP_SERVER_URL` | No | `https://mcp.cloud.cdata.com/mcp` | MCP server URL |
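The rules in the table (two credentials always required, one API key required depending on the provider, defaults for the rest) can be checked before starting the agent. The sketch below is an illustration of those rules only; it is not the package's `Config` implementation. `resolve_settings` and the constants are hypothetical names, and lowercasing the provider value is a choice made for this sketch.

```python
from typing import Dict, Mapping

# Defaults copied from the table above.
DEFAULTS = {
    "LLM_PROVIDER": "openai",
    "OPENAI_MODEL": "gpt-4o",
    "ANTHROPIC_MODEL": "claude-sonnet-4-20250514",
    "MCP_SERVER_URL": "https://mcp.cloud.cdata.com/mcp",
}
ALWAYS_REQUIRED = ("CDATA_EMAIL", "CDATA_PAT")
PROVIDER_KEY = {"openai": "OPENAI_API_KEY", "anthropic": "ANTHROPIC_API_KEY"}


def resolve_settings(env: Mapping[str, str]) -> Dict[str, str]:
    """Apply defaults and verify the variables the table marks as required."""
    settings = {name: env.get(name) or default for name, default in DEFAULTS.items()}

    # Assumption of this sketch: provider names are compared case-insensitively.
    provider = settings["LLM_PROVIDER"].lower()
    if provider not in PROVIDER_KEY:
        raise ValueError(
            f"LLM_PROVIDER must be one of {sorted(PROVIDER_KEY)}, got {provider!r}"
        )
    settings["LLM_PROVIDER"] = provider

    # The API key requirement depends on the selected provider.
    required = list(ALWAYS_REQUIRED) + [PROVIDER_KEY[provider]]
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise ValueError("Missing required variables: " + ", ".join(missing))

    for name in required:
        settings[name] = env[name]
    return settings


example_env = {
    "CDATA_EMAIL": "your_email@example.com",
    "CDATA_PAT": "your_personal_access_token",
    "OPENAI_API_KEY": "your_openai_api_key",
}
settings = resolve_settings(example_env)
print(settings["LLM_PROVIDER"], settings["OPENAI_MODEL"])
```

Passing `os.environ` after `load_dotenv()` would run the same check against a real `.env` file.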
from connectai_llamaindex import MCPAgent, Config
# Configure in code instead of through environment variables
config = Config(
    cdata_email="your_email@example.com",
    cdata_pat="your_pat",
    llm_provider="openai",
    openai_api_key="your_openai_key",
    openai_model="gpt-4o",
)

# Customize the system prompt, iteration limit, and verbosity
agent = MCPAgent(
    config,
    system_prompt="You are a helpful data analyst...",
    max_iterations=15,
    verbose=True,
)

# Stream the response token by token
with MCPAgent(config) as agent:
    tokens = agent.stream_chat("Analyze my sales data")
    for token in tokens:
        print(token, end="", flush=True)

# Low-level access: call the MCP tools directly
from connectai_llamaindex import MCPClient, Config
config = Config.from_env()
with MCPClient(config) as client:
    # List available tools
    tools = client.list_tools()

    # Get catalogs
    catalogs = client.get_catalogs()

    # Execute a query
    results = client.query_data(
        "SELECT * FROM [demo_organization].[GoogleSheets].[account] LIMIT 10"
    )

custom_prompt = """You are a financial analyst assistant.
When analyzing data:
1. Always calculate key financial metrics
2. Identify trends and anomalies
3. Provide actionable insights
"""
agent = MCPAgent(config, system_prompt=custom_prompt)

When querying data, use fully qualified table names:

SELECT * FROM [CatalogName].[SchemaName].[TableName] LIMIT 10

For example:

SELECT * FROM [demo_organization].[GoogleSheets].[account] LIMIT 10
SELECT [name], [annual_revenue] FROM [demo_organization].[GoogleSheets].[account] WHERE [annual_revenue] > 1000000

Use getCatalogs to discover available catalog names, then getSchemas and getTables to explore the structure.
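When building queries in code, the bracketed three-part name can be assembled by a small helper. This is a minimal sketch: `quote_identifier`, `qualified_table`, and `select_preview` are hypothetical helpers, not part of the package. Because the escaping rules for brackets inside identifiers are not covered here, the sketch rejects such names instead of guessing.

```python
def quote_identifier(name: str) -> str:
    """Wrap one identifier in square brackets, rejecting names it cannot quote safely."""
    if not name or "[" in name or "]" in name:
        raise ValueError(f"Unsupported identifier: {name!r}")
    return f"[{name}]"


def qualified_table(catalog: str, schema: str, table: str) -> str:
    """Build [CatalogName].[SchemaName].[TableName]."""
    return ".".join(quote_identifier(part) for part in (catalog, schema, table))


def select_preview(catalog: str, schema: str, table: str, limit: int = 10) -> str:
    """Build a SELECT * query limited to the first `limit` rows."""
    if isinstance(limit, bool) or not isinstance(limit, int) or limit <= 0:
        raise ValueError("limit must be a positive integer")
    return f"SELECT * FROM {qualified_table(catalog, schema, table)} LIMIT {limit}"


print(select_preview("demo_organization", "GoogleSheets", "account"))
```

The resulting string can be passed to `client.query_data(...)` as in the low-level client example.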
connectai-llamaindex-agent/
├── src/
│   └── connectai_llamaindex/
│       ├── __init__.py          # Package exports
│       ├── agent.py             # LlamaIndex ReAct agent
│       ├── client.py            # MCP client implementation
│       └── config.py            # Configuration management
├── examples/
│   ├── basic_chat.py            # Interactive chat example
│   ├── streaming_chat.py        # Streaming responses
│   ├── programmatic_usage.py    # Programmatic API usage
│   ├── query_google_sheets.py
│   └── multi_source_query.py
├── tests/
├── pyproject.toml
└── README.md
- connectai-openai-agent - OpenAI-based agent
- connectai-claude-agent - Claude Agent SDK-based agent
MIT License - see LICENSE for details.