CData Connect AI - LlamaIndex Agent

Build AI-powered data assistants using LlamaIndex and CData Connect AI.

This package provides a simple interface for creating conversational AI agents that can query and analyze data from 350+ data sources connected through CData Connect AI's Model Context Protocol (MCP) server.

Features

  • Natural Language Data Access: Query databases, SaaS applications, and files using plain English
  • 350+ Data Sources: Connect to Salesforce, Google Sheets, Snowflake, PostgreSQL, and more
  • LlamaIndex Integration: Built on LlamaIndex's ReAct agent framework
  • Multiple LLM Support: Works with OpenAI (GPT-4) and Anthropic (Claude) models
  • Streaming Responses: Real-time token streaming for interactive applications
  • Multi-turn Conversations: Maintains context across conversation turns
  • Low-level Access: Direct MCP client for programmatic tool usage

Installation

pip install connectai-llamaindex-agent

Or install from source:

git clone https://github.com/CDataSoftware/connectai-llamaindex-agent.git
cd connectai-llamaindex-agent
pip install -e .

Quick Start

1. Set Up Sample Data (Optional)

To follow along with the examples, you can use our sample Google Sheet:

  1. Open the sample customer health spreadsheet
  2. Click File > Make a copy to save it to your Google Drive
  3. Name it "demo_organization" (or any name you prefer)
  4. Connect it to CData Connect AI as a Google Sheets data source

2. Set Up Credentials

Create a .env file with your credentials:

# CData Connect AI credentials (required)
CDATA_EMAIL=your_email@example.com
CDATA_PAT=your_personal_access_token

# OpenAI credentials (required for OpenAI)
OPENAI_API_KEY=your_openai_api_key

# Or use Anthropic instead
# LLM_PROVIDER=anthropic
# ANTHROPIC_API_KEY=your_anthropic_api_key

Get your CData credentials:

  1. Log in to CData Connect AI
  2. Go to Settings > Access Tokens
  3. Create a new Personal Access Token
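An email and Personal Access Token pair like this is commonly sent to a server as an HTTP Basic credential. The sketch below shows how such a header is formed with the standard library. It is an assumption that the MCP server expects Basic authentication in exactly this form, and `basic_auth_header` is a hypothetical helper, not part of this package; the package's own Config and client classes handle authentication for you.

```python
import base64


def basic_auth_header(email: str, pat: str) -> dict[str, str]:
    """Build an HTTP Basic Authorization header from an email and a PAT.

    Assumption: the server accepts "email:PAT" as a Basic credential.
    """
    raw = f"{email}:{pat}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# Placeholder values only; never hard-code real credentials.
headers = basic_auth_header("your_email@example.com", "your_personal_access_token")
```

Keeping the token in a .env file, as shown above, avoids committing it to source control.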

3. Basic Usage

from dotenv import load_dotenv
from connectai_llamaindex import MCPAgent, Config

load_dotenv()

config = Config.from_env()

with MCPAgent(config) as agent:
    # Ask about available data
    response = agent.chat("What data sources are available?")
    print(response)

    # Query your data
    response = agent.chat("Show me the first 10 rows from the account table")
    print(response)

4. Interactive Chat

Run the interactive chat example to explore your data conversationally:

python examples/basic_chat.py

Example session:

============================================================
CData Connect AI - LlamaIndex Chat Assistant
============================================================

Connected! Available tools: getCatalogs, getSchemas, getTables, getColumns, queryData, getProcedures, getProcedureParameters, executeProcedure, getInstructions

You can now ask questions about your connected data sources.
Type 'quit' to exit, 'clear' to reset history, 'tools' to list tools.

You: What data sources do I have?

Try these example queries:

  • "What data sources do I have connected?"
  • "Show me all the tables in demo_organization"
  • "What columns are in the account table?"
  • "Query the top 5 accounts by annual_revenue"
  • "How many support tickets are there by priority?"
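The session banner lists three special inputs ('quit', 'clear', 'tools'); everything else is sent to the agent as a question. A minimal sketch of that dispatch is shown here. How examples/basic_chat.py actually implements it is not shown in this README, so `classify_input` and the action names it returns are hypothetical.

```python
def classify_input(line: str) -> str:
    """Map one line of user input to an action name.

    Returns "quit", "clear", or "tools" for the special commands,
    "skip" for empty input, and "chat" for anything else.
    """
    command = line.strip().lower()
    if command in {"quit", "clear", "tools"}:
        return command
    if not command:
        return "skip"  # nothing typed; prompt again
    return "chat"  # forward the text to the agent as a question
```

A chat loop could call this on each input line and only forward "chat" lines to the agent.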

Available MCP Tools

The agent automatically has access to these tools for exploring and querying your data:

| Tool | Description |
| --- | --- |
| getCatalogs | List all connected data sources (returns catalog name, data source, and driver) |
| getSchemas | Get schemas within a catalog |
| getTables | List tables within a schema |
| getColumns | Get column metadata for a table |
| queryData | Execute SQL queries |
| getProcedures | List stored procedures |
| getProcedureParameters | Get procedure parameters |
| executeProcedure | Execute stored procedures |
| getInstructions | Get driver-specific guidance (use driver name from getCatalogs) |
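Under the Model Context Protocol, a tool is invoked with a JSON-RPC 2.0 request whose method is `tools/call`. The sketch below builds such a message for queryData to show what the agent sends on your behalf. The `query` argument key is an assumption (the real key is defined by the tool's input schema), and `tool_call_request` is a hypothetical helper rather than part of this package.

```python
import json
from itertools import count

# Monotonic request ids, as JSON-RPC expects each request to be identifiable.
_request_ids = count(1)


def tool_call_request(name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(message)


# The "query" key is an assumption; check the tool's input schema.
request = tool_call_request(
    "queryData",
    {"query": "SELECT * FROM [demo_organization].[GoogleSheets].[account] LIMIT 10"},
)
```

In normal use you do not build these messages yourself; the agent and the low-level client do it for you.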

Configuration Options

Environment Variables

| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| CDATA_EMAIL | Yes | - | Your CData Connect AI email |
| CDATA_PAT | Yes | - | Your CData Personal Access Token |
| LLM_PROVIDER | No | openai | LLM provider (openai or anthropic) |
| OPENAI_API_KEY | If OpenAI | - | OpenAI API key |
| OPENAI_MODEL | No | gpt-4o | OpenAI model to use |
| ANTHROPIC_API_KEY | If Anthropic | - | Anthropic API key |
| ANTHROPIC_MODEL | No | claude-sonnet-4-20250514 | Anthropic model to use |
| MCP_SERVER_URL | No | https://mcp.cloud.cdata.com/mcp | MCP server URL |
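The conditional requirements in the table (one API key per provider) can be checked before the agent is created. The following is a minimal sketch of such a check based only on the rules listed above; `missing_settings` is a hypothetical helper, and the package's own Config may validate differently.

```python
from collections.abc import Mapping


def missing_settings(env: Mapping[str, str]) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    required = ["CDATA_EMAIL", "CDATA_PAT"]
    provider = env.get("LLM_PROVIDER", "openai")  # documented default
    if provider == "openai":
        required.append("OPENAI_API_KEY")
    elif provider == "anthropic":
        required.append("ANTHROPIC_API_KEY")
    else:
        raise ValueError(f"Unsupported LLM_PROVIDER: {provider!r}")
    return [name for name in required if not env.get(name)]
```

Calling `missing_settings(os.environ)` after `load_dotenv()` gives a list you can report to the user before any network call is made.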

Programmatic Configuration

from connectai_llamaindex import MCPAgent, Config

config = Config(
    cdata_email="your_email@example.com",
    cdata_pat="your_pat",
    llm_provider="openai",
    openai_api_key="your_openai_key",
    openai_model="gpt-4o",
)

agent = MCPAgent(
    config,
    system_prompt="You are a helpful data analyst...",
    max_iterations=15,
    verbose=True,
)

Examples

Streaming Responses

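from connectai_llamaindex import MCPAgent, Config

config = Config.from_env()

with MCPAgent(config) as agent:
    tokens = agent.stream_chat("Analyze my sales data")
    for token in tokens:
        print(token, end="", flush=True)  # print each token as it arrives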
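When streaming, it is often useful to display tokens as they arrive and still keep the complete response for logging or later processing. The sketch below does both for any iterable of strings. `print_and_collect` is a hypothetical helper, and the token list is a stand-in for the agent's stream so the example runs without credentials.

```python
import io
import sys
from collections.abc import Iterable
from typing import TextIO


def print_and_collect(tokens: Iterable[str], out: TextIO = sys.stdout) -> str:
    """Write each token to `out` immediately and return the full text."""
    parts: list[str] = []
    for token in tokens:
        out.write(token)
        out.flush()  # show the token right away instead of buffering
        parts.append(token)
    out.write("\n")
    return "".join(parts)


# Stand-in token stream; with the real agent, pass its streaming result instead.
buffer = io.StringIO()
full_text = print_and_collect(["Hello", ", ", "world"], out=buffer)
```

Passing the output stream as a parameter keeps the helper easy to test and lets an application redirect tokens to a file or UI widget.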

Low-Level MCP Client

from connectai_llamaindex import MCPClient, Config

config = Config.from_env()

with MCPClient(config) as client:
    # List available tools
    tools = client.list_tools()

    # Get catalogs
    catalogs = client.get_catalogs()

    # Execute a query
    results = client.query_data(
        "SELECT * FROM [demo_organization].[GoogleSheets].[account] LIMIT 10"
    )

Custom System Prompt

custom_prompt = """You are a financial analyst assistant.
When analyzing data:
1. Always calculate key financial metrics
2. Identify trends and anomalies
3. Provide actionable insights
"""

agent = MCPAgent(config, system_prompt=custom_prompt)

SQL Query Format

When querying data, use fully qualified table names:

SELECT * FROM [CatalogName].[SchemaName].[TableName] LIMIT 10

For example:

SELECT * FROM [demo_organization].[GoogleSheets].[account] LIMIT 10
SELECT [name], [annual_revenue] FROM [demo_organization].[GoogleSheets].[account] WHERE [annual_revenue] > 1000000

Use getCatalogs to discover available catalog names, then getSchemas and getTables to explore the structure.
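If you assemble queries in code, a small helper can produce the bracketed three-part name consistently. This is a minimal sketch: `qualified_name` is a hypothetical helper, and because the escaping rules for brackets inside identifiers are not documented here, it rejects such names rather than guessing.

```python
def qualified_name(catalog: str, schema: str, table: str) -> str:
    """Return [catalog].[schema].[table] for use in a FROM clause."""
    parts = (catalog, schema, table)
    for part in parts:
        # Escaping rules are not documented here, so refuse risky names.
        if not part or "[" in part or "]" in part:
            raise ValueError(f"Unsupported identifier: {part!r}")
    return ".".join(f"[{part}]" for part in parts)


table = qualified_name("demo_organization", "GoogleSheets", "account")
query = f"SELECT * FROM {table} LIMIT 10"
```

The catalog, schema, and table values would normally come from getCatalogs, getSchemas, and getTables rather than being typed by hand.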

Project Structure

connectai-llamaindex-agent/
├── src/
│   └── connectai_llamaindex/
│       ├── __init__.py      # Package exports
│       ├── agent.py         # LlamaIndex ReAct agent
│       ├── client.py        # MCP client implementation
│       └── config.py        # Configuration management
├── examples/
│   ├── basic_chat.py        # Interactive chat example
│   ├── streaming_chat.py    # Streaming responses
│   ├── programmatic_usage.py # Programmatic API usage
│   ├── query_google_sheets.py
│   └── multi_source_query.py
├── tests/
├── pyproject.toml
└── README.md

License

MIT License - see LICENSE for details.
