Talk to My Data delivers a seamless talk-to-your-data experience, transforming files, spreadsheets, and cloud data into actionable insights. Simply upload data, connect to Snowflake or BigQuery, or access datasets from DataRobot's Data Registry. Then, ask a question, and the agent recommends business analyses, generating charts, tables, and even code to help you interpret the results.
This intuitive experience is designed for scalability and flexibility, ensuring that whether you're working with a few thousand rows or billions, your data analysis remains fast, efficient, and insightful.
Warning
Application templates are intended to be starting points that provide guidance on how to develop, serve, and maintain AI applications. They require a developer or data scientist to adapt and modify them for their business requirements before being put into production.
- Quick Start
- User's Guide
- Setup
- Architecture overview
- Why build AI Apps with DataRobot app templates?
- Data privacy
- Make changes
- Tools
- Share results
- Delete all provisioned resources
- Setup for advanced users
Please check out this Talk To My Data walkthrough.
Basic usage of the app is straightforward: the user uploads one or more structured files to the application, starts a chat, and asks questions about those files. Behind the scenes, the LLM configured for the application translates the user's question into code, the application runs that code, and the results are sent back to the LLM to generate analysis and visualizations. Because the dataset is loaded into the application itself, this limits the size of the data that can be analyzed. The application can also support larger datasets and connect to remote data stores through the DataRobot platform, as described below.
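Conceptually, the loop looks roughly like the sketch below. The `llm.generate_analysis_code` and `llm.summarize_result` helpers are hypothetical stand-ins for the template's prompt logic, not its real API.

```python
# Hypothetical sketch of the question -> code -> analysis loop (not the app's actual code).
import pandas as pd


def answer_question(llm, question: str, df: pd.DataFrame) -> str:
    # 1. Ask the LLM to translate the question into pandas code against `df`.
    generated_code = llm.generate_analysis_code(question, columns=list(df.columns))

    # 2. Run the generated code; it is expected to leave its output in `result`.
    namespace = {"df": df}
    exec(generated_code, namespace)  # the real app sandboxes and validates this step
    result = namespace.get("result")

    # 3. Send the result back to the LLM to produce a narrative summary and charts.
    return llm.summarize_result(question, result)
```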
Large datasets can be uploaded to the DataRobot platform (see this documentation), and items from the user's Data Registry can be connected to the application. These items are not downloaded into the application; instead, analysis is performed through DataRobot's data wrangling platform, which runs efficient queries that can support larger datasets (validated with at least 5GB). Note that there is a cold start of several minutes while the platform creates the analysis environment and loads the data. These datasets can also be added locally, which avoids the cold start and is suitable for smaller files.
When a user of the application is a DataRobot user (see this documentation for sharing applications) and has data stores of a supported connection (currently Postgres and Redshift) configured in the DataRobot platform (see this page for configuring data stores and this page for details on supported data stores), these appear in the application as a "Remote Data Connection". These data stores are queried via DataRobot's data wrangling platform (see documentation). Unlike the app's bespoke database integration (see Change the database), a data store is not visible to all users of the app, only to those who have access to the data store and its default credentials in the DataRobot platform.
Before proceeding, ensure you have access to the required credentials and services. This template is pre-configured to use an Azure OpenAI endpoint and Snowflake Database credentials. To run the template as-is, you will need access to Azure OpenAI (leverages gpt-4o by default).
Prerequisites: If you are using DataRobot Codespaces, this is already complete for you. If not, install:
- Python 3.10+
- uv (Python package manager)
- Taskfile.dev (task runner)
- Node.js 18+ (for React frontend)
- Pulumi (infrastructure as code)
DataRobot Codespaces users: If you opened this template from the Application Templates gallery, you can skip steps 1 and 2. If you created a fresh codespace, you can skip step 1 but still need to clone the repository (step 2).
For local development, follow all of the following steps:
- If `pulumi` is not already installed, install the CLI following instructions here. After installing for the first time, restart your terminal and run:

  ```bash
  pulumi login --local  # omit --local to use Pulumi Cloud (requires separate account)
  ```

- Clone the template repository:

  ```bash
  git clone https://github.com/datarobot-community/talk-to-my-data-agent.git
  cd talk-to-my-data-agent
  ```

- Rename the file `.env.template` to `.env` in the root directory of the repo and populate your credentials (a quick way to sanity-check the Azure OpenAI credentials is sketched after the resource list below):

  ```
  DATAROBOT_API_TOKEN=...
  DATAROBOT_ENDPOINT=...  # e.g. https://app.datarobot.com/api/v2
  OPENAI_API_KEY=...
  OPENAI_API_VERSION=...  # e.g. 2024-02-01
  OPENAI_API_BASE=...  # e.g. https://your_org.openai.azure.com/
  OPENAI_API_DEPLOYMENT_ID=...  # e.g. gpt-4o
  PULUMI_CONFIG_PASSPHRASE=...  # Required. Choose your own alphanumeric passphrase to be used for encrypting pulumi config
  FRONTEND_TYPE=...  # Optional. Default is "react"; set to "streamlit" to use the Streamlit frontend
  USE_DATAROBOT_LLM_GATEWAY=...  # Optional. Set to "true" to use the DataRobot LLM Gateway with consumption-based pricing instead of your own LLM credentials
  ```
Use the following resources to locate the required credentials:
- DataRobot API Token: Refer to the Create a DataRobot API Key section of the DataRobot API Quickstart docs.
- DataRobot Endpoint: Refer to the Retrieve the API Endpoint section of the same DataRobot API Quickstart docs.
- LLM Endpoint and API Key: Refer to the Azure OpenAI documentation.
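Optionally, before deploying you can sanity-check the Azure OpenAI credentials in your `.env` with a short script like the one below. This is not part of the template; it assumes the `openai` and `python-dotenv` packages are installed.

```python
# Optional sanity check for the Azure OpenAI credentials in .env (not part of the template).
import os

from dotenv import load_dotenv
from openai import AzureOpenAI

load_dotenv()  # reads .env from the current directory

client = AzureOpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
    api_version=os.environ["OPENAI_API_VERSION"],
    azure_endpoint=os.environ["OPENAI_API_BASE"],
)
response = client.chat.completions.create(
    model=os.environ["OPENAI_API_DEPLOYMENT_ID"],  # the Azure deployment name, e.g. gpt-4o
    messages=[{"role": "user", "content": "ping"}],
    max_tokens=5,
)
print(response.choices[0].message.content)
```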
- In a terminal, run:

  ```bash
  python quickstart.py YOUR_PROJECT_NAME  # Windows users may have to use `py` instead of `python`
  ```

  What does `quickstart.py` do? The quickstart script automates the entire setup process for you:

  - Creates and activates a Python virtual environment
  - Installs all required dependencies (using `uv` for faster installation, falling back to `pip`)
  - Loads your `.env` configuration
  - Sets up the Pulumi stack with your project name
  - Runs `pulumi up` to deploy your application
  - Displays your application URL when complete

  This single command replaces all the manual steps described in the advanced setup section.
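For orientation, the sequence the script automates corresponds roughly to the sketch below. This is an illustration only, not the actual contents of `quickstart.py`; the exact manual commands are listed in the advanced setup section.

```python
# Illustrative sketch of what quickstart.py automates (not the actual script).
import shutil
import subprocess
import sys


def install_dependencies() -> None:
    """Install dependencies with uv when available, otherwise fall back to pip."""
    if shutil.which("uv"):
        subprocess.run(["uv", "pip", "install", "-r", "requirements.txt"], check=True)
    else:
        subprocess.run([sys.executable, "-m", "pip", "install", "-r", "requirements.txt"], check=True)


def deploy(project_name: str) -> None:
    """Initialize (or reuse) the Pulumi stack and deploy the application."""
    subprocess.run(["pulumi", "stack", "init", project_name], check=False)  # ignore the error if the stack exists
    subprocess.run(["pulumi", "up", "--yes"], check=True)
```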
Python 3.10 - 3.12 are supported
Advanced users who want control over virtual environment creation, dependency installation, environment variable setup, and Pulumi invocation should see here.
The Talk to My Data agent supports two frontend options:
- React (default): A modern JavaScript-based frontend with enhanced UI features, built on a FastAPI backend. See the React Frontend Development Guide.
- Streamlit: A Python-based frontend with a simple interface. See the Streamlit Frontend Development Guide
To change the frontend:
- In `.env`: Set `FRONTEND_TYPE="streamlit"` to use the Streamlit frontend instead of the default React.
- Run the following to update your stack (or run `python quickstart.py` for easier setup):

  ```bash
  source set_env.sh  # On Windows use `set_env.bat`
  pulumi up
  ```
App templates contain three families of complementary logic:
- AI logic: Necessary to service AI requests and produce predictions and completions.
  ```
  deployment_*/  # Chat agent model
  ```

- App logic: Necessary for user consumption, whether via a hosted front-end or integration into an external consumption layer.

  ```
  frontend/      # Streamlit frontend
  app_frontend/  # React frontend alternative with the API located in app_backend
  utils/         # App business logic & runtime helpers
  ```

- Operational logic: Necessary to activate DataRobot assets.

  ```
  infra/__main__.py  # Pulumi program for configuring DataRobot to serve and monitor AI and app logic
  infra/             # Settings for resources and assets created in DataRobot
  ```
App Templates transform your AI projects from notebooks to production-ready applications. Too often, getting models into production means rewriting code, juggling credentials, and coordinating with multiple tools and teams just to make simple changes. DataRobot's composable AI apps framework eliminates these bottlenecks, letting you spend more time experimenting with your ML and app logic and less time wrestling with plumbing and deployment.
- Start building in minutes: Deploy complete AI applications instantly, then customize the AI logic or the front-end independently (no architectural rewrites needed).
- Keep working your way: Data scientists keep working in notebooks, developers in IDEs, and configs stay isolated. Update any piece without breaking others.
- Iterate with confidence: Make changes locally and deploy with confidence. Spend less time writing and troubleshooting plumbing and more time improving your app.
Each template provides an end-to-end AI architecture, from raw inputs to deployed application, while remaining highly customizable for specific business requirements.
Your data privacy is important to us. Data handling is governed by the DataRobot Privacy Policy; please review it before using your own data with DataRobot.
- Modify the `LLM` setting in `infra/settings_generative.py` by changing `LLM=LLMs.AZURE_OPENAI_GPT_4_O` to any other LLM from the `LLMs` object (see the snippet after this list).
  - Trial users: Please set `LLM=LLMs.AZURE_OPENAI_GPT_4_O_MINI` since GPT-4o is not supported in the trial. Use the `OPENAI_API_DEPLOYMENT_ID` in `.env` to override which model is used in your Azure organization. You'll still see GPT-4o mini in the playground, but the deployed app will use the provided Azure deployment.
- To use an existing TextGen model or deployment:
  - In `infra/settings_generative.py`: Set `LLM=LLMs.DEPLOYED_LLM`.
  - In `.env`: Set either the `TEXTGEN_REGISTERED_MODEL_ID` or the `TEXTGEN_DEPLOYMENT_ID`.
  - In `.env`: Set `CHAT_MODEL_NAME` to the model name expected by the deployment (e.g. "claude-3-7-sonnet-20250219" for an Anthropic deployment, "datarobot-deployed-llm" for NIM models).
  - (Optional) In `utils/api.py`: `ALTERNATIVE_LLM_BIG` and `ALTERNATIVE_LLM_SMALL` can be used for fine-grained control over which LLM is used for different tasks.
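For example, switching the deployed app to GPT-4o mini is a one-line change in `infra/settings_generative.py`. The excerpt below omits the rest of the file and assumes the `LLMs` object is already imported there, as in the template.

```python
# infra/settings_generative.py (excerpt; imports and other settings omitted)

# Default:
# LLM = LLMs.AZURE_OPENAI_GPT_4_O

# Trial accounts, or any other model from the LLMs object:
LLM = LLMs.AZURE_OPENAI_GPT_4_O_MINI
```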
The application supports using the DataRobot LLM Gateway instead of bringing your own LLM credentials.
The application follows this priority order for LLM selection:
- OpenAI Credentials (Highest Priority): If `OPENAI_API_KEY`, `OPENAI_API_BASE`, etc. are provided in `.env`, they will always be used regardless of the `USE_DATAROBOT_LLM_GATEWAY` setting.
- LLM Gateway: Used if `USE_DATAROBOT_LLM_GATEWAY=true` and no OpenAI credentials are provided.

Important: Remove or comment out `OPENAI_*` environment variables to use DataRobot's LLM Gateway.
- In `.env`: Set `USE_DATAROBOT_LLM_GATEWAY=true`.
- Run `pulumi up` to update your stack (or run `python quickstart.py` for easier setup):

  ```bash
  source set_env.sh  # On Windows use `set_env.bat`
  pulumi up
  ```
- No hardcoded LLM credentials (OpenAI keys) are required in your `.env` file.
- The LLM Gateway provides a unified interface to multiple LLM providers through DataRobot in production.
- You can pick from the catalog and change the model `LLM` in `infra/settings_generative.py`.
- It will use a DataRobot Guarded RAG Deployment and LLM Blueprint for the selected model.
Note: LLM Gateway mode requires that consumption-based pricing be enabled for your DataRobot account, as indicated by the `ENABLE_LLM_GATEWAY` feature flag. Contact your administrator if this feature is not available.
- In `.env`: If not using an existing TextGen model or deployment, provide the required credentials dependent on your choice.
- Run `pulumi up` to update your stack (or run `python quickstart.py` for easier setup):

  ```bash
  source set_env.sh  # On Windows use `set_env.bat`
  pulumi up
  ```
⚠️ Availability information: Using a NIM model requires custom model GPU inference, a premium feature. You will experience errors if you use this type of model without the feature enabled. Contact your DataRobot representative or administrator for information on enabling this feature.
To add Snowflake support:
- Modify the `DATABASE_CONNECTION_TYPE` setting in `infra/settings_database.py` by changing `DATABASE_CONNECTION_TYPE = "no_database"` to `DATABASE_CONNECTION_TYPE = "snowflake"`.
- Provide Snowflake credentials in `.env` by either setting `SNOWFLAKE_USER` and `SNOWFLAKE_PASSWORD` or by setting `SNOWFLAKE_KEY_PATH` to a file containing the key. The key file should be a `*.p8` private key file (see the Snowflake documentation); a quick way to check the key file is sketched after this list.
- Fill out the remaining Snowflake connection settings in `.env` (refer to `.env.template` for more details).
- Run `pulumi up` to update your stack (or run `python quickstart.py` for easier setup):

  ```bash
  source set_env.sh  # On Windows use `set_env.bat`
  pulumi up
  ```
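If you use key-pair authentication, you can verify that the `*.p8` private key parses before deploying with a short script like this. It uses the `cryptography` package and is independent of the template.

```python
# Optional check that the Snowflake *.p8 private key loads correctly (independent of the template).
import os

from cryptography.hazmat.primitives import serialization

key_path = os.environ["SNOWFLAKE_KEY_PATH"]  # same path as configured in .env
with open(key_path, "rb") as f:
    private_key = serialization.load_pem_private_key(
        f.read(),
        password=None,  # pass the passphrase as bytes if the key is encrypted
    )
print(f"Loaded {type(private_key).__name__} from {key_path}")
```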
The Talk to my Data Agent supports connecting to BigQuery.
- Modify the `DATABASE_CONNECTION_TYPE` setting in `infra/settings_database.py` by changing `DATABASE_CONNECTION_TYPE = "no_database"` to `DATABASE_CONNECTION_TYPE = "bigquery"`.
- Provide the required Google credentials in `.env` dependent on your choice, and ensure that `GOOGLE_DB_SCHEMA` is also populated in `.env` (a quick connectivity check is sketched after this list).
- Run `pulumi up` to update your stack (or run `python quickstart.py` for easier setup):

  ```bash
  source set_env.sh  # On Windows use `set_env.bat`
  pulumi up
  ```
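To confirm your Google credentials and dataset are reachable before deploying, a check with the `google-cloud-bigquery` client looks roughly like this. It is independent of the template and assumes your Google credentials are already configured (for example via `GOOGLE_APPLICATION_CREDENTIALS`).

```python
# Optional check that the BigQuery dataset configured in .env is reachable (independent of the template).
import os

from google.cloud import bigquery

client = bigquery.Client()  # picks up the configured Google credentials and project
dataset_id = os.environ["GOOGLE_DB_SCHEMA"]  # the dataset (schema) name configured in .env

tables = list(client.list_tables(dataset_id))
print(f"Found {len(tables)} tables in dataset {dataset_id}")
```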
The Talk to my Data Agent supports connecting to SAP Datasphere.
- Modify the `DATABASE_CONNECTION_TYPE` setting in `infra/settings_database.py` by changing `DATABASE_CONNECTION_TYPE = "no_database"` to `DATABASE_CONNECTION_TYPE = "sap"`.
- Provide the required SAP credentials in `.env` (a rough connectivity check is sketched after this list).
- Run `pulumi up` to update your stack (or run `python quickstart.py` for easier setup):

  ```bash
  source set_env.sh  # On Windows use `set_env.bat`
  pulumi up
  ```
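As a rough connectivity check, SAP Datasphere's SQL endpoint can typically be reached with the `hdbcli` HANA client. This sketch is an assumption about your environment, not part of the template; the host, port, and credentials are placeholders for whatever you put in `.env`.

```python
# Optional SAP Datasphere connectivity check using the hdbcli HANA client (an assumption, not part of the template).
from hdbcli import dbapi

conn = dbapi.connect(
    address="your-datasphere-host",  # hypothetical placeholder values; use your own settings
    port=443,
    user="your-user",
    password="your-password",
    encrypt=True,
)
cursor = conn.cursor()
cursor.execute("SELECT CURRENT_TIMESTAMP FROM DUMMY")
print(cursor.fetchone())
cursor.close()
conn.close()
```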
You can help the data analyst Python agent by providing tools that assist with data analysis tasks. To do so, define functions in `utils/tools.py`. Each function is made available inside the agent's code execution environment, and its name, docstring, and signature are provided to the agent inside the prompt. A sketch of what such a tool might look like follows.
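In the sketch below, the function name and behavior are illustrative; only the pattern (a plain function with a descriptive signature and docstring in `utils/tools.py`) comes from the template.

```python
# utils/tools.py (illustrative example; the function itself is hypothetical)
import pandas as pd


def top_n_by_column(df: pd.DataFrame, column: str, n: int = 10) -> pd.DataFrame:
    """Return the n rows of df with the largest values in the given numeric column.

    The agent sees this name, signature, and docstring in its prompt and can call
    the function from its generated analysis code.
    """
    return df.nlargest(n, columns=column)
```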
- Log into the DataRobot application.
- Navigate to Registry > Applications.
- Navigate to the application you want to share, open the actions menu, and select Share from the dropdown.
```bash
pulumi down
```

For manual control over the setup process, adapt the following steps for macOS/Linux to your environment:

```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
source set_env.sh
pulumi stack init YOUR_PROJECT_NAME
pulumi up
```

e.g. for Windows/conda/cmd.exe this would be:
```
conda create --prefix .venv pip
conda activate .\.venv
pip install -r requirements.txt
set_env.bat
pulumi stack init YOUR_PROJECT_NAME
pulumi up
```

For projects that will be maintained, DataRobot recommends forking the repo so upstream fixes and improvements can be merged in the future.