CData Connect AI - LlamaIndex Agent

Build AI-powered data assistants using LlamaIndex and CData Connect AI.

This package provides a simple interface for creating conversational AI agents that can query and analyze data from 350+ data sources connected through CData Connect AI's Model Context Protocol (MCP) server.

Features

Natural Language Data Access: Query databases, SaaS applications, and files using plain English
350+ Data Sources: Connect to Salesforce, Google Sheets, Snowflake, PostgreSQL, and more
LlamaIndex Integration: Built on LlamaIndex's ReAct agent framework
Multiple LLM Support: Works with OpenAI (GPT-4o by default) and Anthropic (Claude) models
Streaming Responses: Real-time token streaming for interactive applications
Multi-turn Conversations: Maintains context across conversation turns
Low-level Access: Direct MCP client for programmatic tool usage

Installation

pip install connectai-llamaindex-agent

Or install from source:

git clone https://github.com/CDataSoftware/connectai-llamaindex-agent.git
cd connectai-llamaindex-agent
pip install -e .

Quick Start

1. Set Up Sample Data (Optional)

To follow along with the examples, you can use our sample Google Sheet:

Open the sample customer health spreadsheet
Click File > Make a copy to save it to your Google Drive
Name it "demo_organization" (or any name you prefer)
Connect it to CData Connect AI as a Google Sheets data source

2. Set Up Credentials

Create a .env file with your credentials:

# CData Connect AI credentials (required)
CDATA_EMAIL=your_email@example.com
CDATA_PAT=your_personal_access_token

# OpenAI credentials (required when LLM_PROVIDER is openai, the default)
OPENAI_API_KEY=your_openai_api_key

# To use Anthropic instead, uncomment these two lines
# LLM_PROVIDER=anthropic
# ANTHROPIC_API_KEY=your_anthropic_api_key

Get your CData credentials:

Log in to CData Connect AI
Go to Settings > Access Tokens
Create a new Personal Access Token

3. Basic Usage

from dotenv import load_dotenv
from connectai_llamaindex import MCPAgent, Config

load_dotenv()

config = Config.from_env()

with MCPAgent(config) as agent:
    # Ask about available data
    response = agent.chat("What data sources are available?")
    print(response)

    # Query your data
    response = agent.chat("Show me the first 10 rows from the account table")
    print(response)

4. Interactive Chat

Run the interactive chat example to explore your data conversationally:

python examples/basic_chat.py

Example session:

============================================================
CData Connect AI - LlamaIndex Chat Assistant
============================================================

Connected! Available tools: getCatalogs, getSchemas, getTables, getColumns, queryData, getProcedures, getProcedureParameters, executeProcedure, getInstructions

You can now ask questions about your connected data sources.
Type 'quit' to exit, 'clear' to reset history, 'tools' to list tools.

You: What data sources do I have?

Try these example queries:

"What data sources do I have connected?"
"Show me all the tables in demo_organization"
"What columns are in the account table?"
"Query the top 5 accounts by annual_revenue"
"How many support tickets are there by priority?"

Available MCP Tools

The agent automatically has access to these tools for exploring and querying your data:

| Tool | Description |
| --- | --- |
| `getCatalogs` | List all connected data sources (returns catalog name, data source, and driver) |
| `getSchemas` | Get schemas within a catalog |
| `getTables` | List tables within a schema |
| `getColumns` | Get column metadata for a table |
| `queryData` | Execute SQL queries |
| `getProcedures` | List stored procedures |
| `getProcedureParameters` | Get procedure parameters |
| `executeProcedure` | Execute stored procedures |
| `getInstructions` | Get driver-specific guidance (use driver name from getCatalogs) |
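
For readers curious about what happens underneath, the Model Context Protocol invokes a tool through a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a request for one of the tools listed above. It is illustrative only: `build_tool_call` is a hypothetical helper, not part of this package, and the empty arguments object for `getCatalogs` is an assumption, since the arguments each tool accepts are not documented here.

```python
import json
from itertools import count

# Monotonic request ids; JSON-RPC needs an id so a response can be matched to its request.
_request_ids = count(1)


def build_tool_call(tool_name, arguments=None):
    """Return a JSON-RPC 2.0 request dict for the MCP `tools/call` method.

    Hypothetical helper for illustration; the package may build requests differently.
    """
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments or {}},
    }


if __name__ == "__main__":
    # Assumption: getCatalogs takes no arguments.
    print(json.dumps(build_tool_call("getCatalogs"), indent=2))
```

In normal use the agent and `MCPClient` handle this exchange for you, so a helper like this is only useful for debugging or for understanding the protocol.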

Configuration Options

Environment Variables

| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| `CDATA_EMAIL` | Yes | - | Your CData Connect AI email |
| `CDATA_PAT` | Yes | - | Your CData Personal Access Token |
| `LLM_PROVIDER` | No | `openai` | LLM provider (`openai` or `anthropic`) |
| `OPENAI_API_KEY` | If OpenAI | - | OpenAI API key |
| `OPENAI_MODEL` | No | `gpt-4o` | OpenAI model to use |
| `ANTHROPIC_API_KEY` | If Anthropic | - | Anthropic API key |
| `ANTHROPIC_MODEL` | No | `claude-sonnet-4-20250514` | Anthropic model to use |
| `MCP_SERVER_URL` | No | `https://mcp.cloud.cdata.com/mcp` | MCP server URL |
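
To catch a missing variable before the agent starts, the rules in the table can be checked up front. The following is a minimal sketch of those rules, not the implementation behind `Config.from_env()`; `resolve_settings` is a hypothetical helper, and treating an empty string as "not set" is an assumption.

```python
import os

# Defaults copied from the environment variable table.
DEFAULTS = {
    "LLM_PROVIDER": "openai",
    "OPENAI_MODEL": "gpt-4o",
    "ANTHROPIC_MODEL": "claude-sonnet-4-20250514",
    "MCP_SERVER_URL": "https://mcp.cloud.cdata.com/mcp",
}

# Which API key each provider needs.
PROVIDER_KEYS = {"openai": "OPENAI_API_KEY", "anthropic": "ANTHROPIC_API_KEY"}

KNOWN_NAMES = set(DEFAULTS) | set(PROVIDER_KEYS.values()) | {"CDATA_EMAIL", "CDATA_PAT"}


def resolve_settings(env):
    """Apply defaults to `env` and return (settings, missing_required_names)."""
    # Assumption: an empty value counts as unset.
    provided = {k: v for k, v in env.items() if k in KNOWN_NAMES and v}
    settings = {**DEFAULTS, **provided}
    provider = settings["LLM_PROVIDER"]
    if provider not in PROVIDER_KEYS:
        raise ValueError(f"Unsupported LLM_PROVIDER: {provider!r}")
    required = ["CDATA_EMAIL", "CDATA_PAT", PROVIDER_KEYS[provider]]
    missing = [name for name in required if name not in settings]
    return settings, missing


if __name__ == "__main__":
    _, missing = resolve_settings(dict(os.environ))
    if missing:
        print("Missing required variables:", ", ".join(missing))
    else:
        print("All required variables are set.")
```

Running this after `load_dotenv()` gives a quick readout of what is still missing from your `.env` file.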

Programmatic Configuration

from connectai_llamaindex import MCPAgent, Config

config = Config(
    cdata_email="your_email@example.com",
    cdata_pat="your_pat",
    llm_provider="openai",
    openai_api_key="your_openai_key",
    openai_model="gpt-4o",
)

agent = MCPAgent(
    config,
    system_prompt="You are a helpful data analyst...",
    max_iterations=15,
    verbose=True,
)

Examples

Streaming Responses

with MCPAgent(config) as agent:
    tokens = agent.stream_chat("Analyze my sales data")
    for token in tokens:
        print(token, end="", flush=True)
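
If you also need the complete response once streaming finishes (for logging or post-processing), you can accumulate the tokens while echoing them. This sketch assumes `stream_chat` yields plain string tokens, as the loop above suggests; `collect_stream` is a hypothetical helper, and a fixed list stands in for the agent so the snippet runs without credentials.

```python
import sys


def collect_stream(tokens, out=sys.stdout):
    """Write each token to `out` as it arrives and return the full text."""
    parts = []
    for token in tokens:
        text = str(token)  # assumption: tokens are strings or convert cleanly to text
        out.write(text)
        out.flush()
        parts.append(text)
    return "".join(parts)


if __name__ == "__main__":
    # Stand-in for agent.stream_chat("Analyze my sales data")
    demo_tokens = iter(["Sales ", "rose ", "in Q2."])
    full_text = collect_stream(demo_tokens)
    print(f"\n[{len(full_text)} characters received]")
```

With a real agent, pass `agent.stream_chat(...)` in place of `demo_tokens`.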

Low-Level MCP Client

from connectai_llamaindex import MCPClient, Config

config = Config.from_env()

with MCPClient(config) as client:
    # List available tools
    tools = client.list_tools()

    # Get catalogs
    catalogs = client.get_catalogs()

    # Execute a query
    results = client.query_data(
        "SELECT * FROM [demo_organization].[GoogleSheets].[account] LIMIT 10"
    )

Custom System Prompt

custom_prompt = """You are a financial analyst assistant.
When analyzing data:
1. Always calculate key financial metrics
2. Identify trends and anomalies
3. Provide actionable insights
"""

agent = MCPAgent(config, system_prompt=custom_prompt)

SQL Query Format

When querying data, use fully qualified table names:

SELECT * FROM [CatalogName].[SchemaName].[TableName] LIMIT 10

For example:

SELECT * FROM [demo_organization].[GoogleSheets].[account] LIMIT 10
SELECT [name], [annual_revenue] FROM [demo_organization].[GoogleSheets].[account] WHERE [annual_revenue] > 1000000

Use getCatalogs to discover available catalog names, then getSchemas and getTables to explore the structure.
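
When building queries in code, a small helper can keep the bracketed three-part names consistent. The sketch below is one possible approach, not part of the package: `quote_identifier`, `qualified_table`, and `select_preview` are hypothetical names. Because the escaping rules for brackets inside identifiers are not covered here, the helper rejects such names instead of guessing.

```python
def quote_identifier(name):
    """Wrap a single identifier in square brackets."""
    if not name or "[" in name or "]" in name:
        # Escaping rules are not documented here, so refuse rather than guess.
        raise ValueError(f"Unsupported identifier: {name!r}")
    return f"[{name}]"


def qualified_table(catalog, schema, table):
    """Return [CatalogName].[SchemaName].[TableName]."""
    return ".".join(quote_identifier(part) for part in (catalog, schema, table))


def select_preview(catalog, schema, table, limit=10):
    """Build a SELECT * query that previews the first `limit` rows."""
    if not isinstance(limit, int) or limit < 1:
        raise ValueError("limit must be a positive integer")
    return f"SELECT * FROM {qualified_table(catalog, schema, table)} LIMIT {limit}"


if __name__ == "__main__":
    print(select_preview("demo_organization", "GoogleSheets", "account"))
```

The resulting string can be passed to `client.query_data(...)` as in the low-level client example.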

Project Structure

connectai-llamaindex-agent/
├── src/
│   └── connectai_llamaindex/
│       ├── __init__.py      # Package exports
│       ├── agent.py         # LlamaIndex ReAct agent
│       ├── client.py        # MCP client implementation
│       └── config.py        # Configuration management
├── examples/
│   ├── basic_chat.py        # Interactive chat example
│   ├── streaming_chat.py    # Streaming responses
│   ├── programmatic_usage.py # Programmatic API usage
│   ├── query_google_sheets.py
│   └── multi_source_query.py
├── tests/
├── pyproject.toml
└── README.md
