Merged
41 commits
b4016ef
updates to enable smarter data load
Chenglong-MS Sep 24, 2024
b663a50
Merge branch 'main' into dev
Chenglong-MS Sep 24, 2024
24ba6bd
experimental data cleaning on load function
Chenglong-MS Oct 3, 2024
ff6b585
wip
Chenglong-MS Oct 4, 2024
ea0c562
some fixes
Chenglong-MS Oct 7, 2024
5084e3a
supporting image uploads as inputs
Chenglong-MS Oct 8, 2024
d42a91e
preparing for pip release
Chenglong-MS Oct 9, 2024
2b74c7c
pip install from tar
danmarshall Oct 9, 2024
89af5c7
auto-launch
danmarshall Oct 9, 2024
3ab42e3
remove "f5"
danmarshall Oct 9, 2024
38c1a31
static image for codespace
danmarshall Oct 9, 2024
7b19a06
some clean up
Chenglong-MS Oct 9, 2024
4d9e7e3
update readme
Chenglong-MS Oct 9, 2024
5bb9c9e
cleaning up text
Chenglong-MS Oct 9, 2024
bdfda8a
merge diff
Chenglong-MS Oct 10, 2024
7d142f6
Fix code scanning alert no. 3: DOM text reinterpreted as HTML
Chenglong-MS Oct 10, 2024
5c0e286
Fix code scanning alert no. 6: DOM text reinterpreted as HTML
Chenglong-MS Oct 10, 2024
1ad4bcd
update to readme
Chenglong-MS Oct 10, 2024
a1dcfca
README change
Chenglong-MS Oct 10, 2024
d4d8d2f
add workflow
Chenglong-MS Oct 10, 2024
9e89425
update build
Chenglong-MS Oct 10, 2024
bf83d56
update build script
Chenglong-MS Oct 10, 2024
f85df75
update build script
Chenglong-MS Oct 10, 2024
0e052d8
update build script
Chenglong-MS Oct 10, 2024
c2fd657
fix typo in workflow
Chenglong-MS Oct 10, 2024
f45b5dd
try new install order
Chenglong-MS Oct 10, 2024
f8ce5a0
check
Chenglong-MS Oct 10, 2024
f9bc4a4
try include package information
Chenglong-MS Oct 10, 2024
52329d4
fix
Chenglong-MS Oct 10, 2024
cd55ce0
test again..
Chenglong-MS Oct 10, 2024
1c25fd9
try luck
Chenglong-MS Oct 10, 2024
b16a78a
try luck
Chenglong-MS Oct 10, 2024
363adcb
try luck
Chenglong-MS Oct 10, 2024
0a0d6b8
update pyproject
Chenglong-MS Oct 10, 2024
c729658
update manifest
Chenglong-MS Oct 10, 2024
d75c8dd
update build flow
Chenglong-MS Oct 10, 2024
7b27661
update upload script
Chenglong-MS Oct 11, 2024
fbb965a
try fix build
Chenglong-MS Oct 11, 2024
7fff58b
update publish scripts
Chenglong-MS Oct 11, 2024
2330fa5
prep
Chenglong-MS Oct 11, 2024
e623830
update readme
Chenglong-MS Oct 11, 2024
2 changes: 1 addition & 1 deletion .devcontainer/devcontainer.json
@@ -16,7 +16,7 @@
// "forwardPorts": [],

// Use 'postCreateCommand' to run commands after the container is created.
"postCreateCommand": "python3 -m venv /workspaces/data-formulator/venv && . /workspaces/data-formulator/venv/bin/activate && pip install -r /workspaces/data-formulator/requirements.txt --verbose && yarn install && yarn build"
"postCreateCommand": "python3 -m venv /workspaces/data-formulator/venv && . /workspaces/data-formulator/venv/bin/activate && pip install https://github.com/user-attachments/files/17319752/data_formulator-0.1.0.tar.gz --verbose && data_formulator"

// Configure tool-specific properties.
// "customizations": {},
34 changes: 0 additions & 34 deletions .github/workflows/build.yml

This file was deleted.

62 changes: 62 additions & 0 deletions .github/workflows/python-build.yml
@@ -0,0 +1,62 @@
# This workflow will install Python dependencies, run tests and lint with a variety of Python versions
# For more information see: https://docs.github.com/en/actions/automating-builds-and-tests/building-and-testing-python

name: build

on:
  push:
    branches: [ "main" ]
  pull_request:
    branches: [ "main" ]

jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
    steps:
    - uses: actions/checkout@v4
    - name: Set Node.js 20
      uses: actions/setup-node@v4
      with:
        node-version: 20
        cache: 'yarn'
    - name: Set up Python 3.12
      uses: actions/setup-python@v5
      with:
        python-version: 3.12
    - name: Install node dependencies
      run: yarn install
    - name: Install python dependencies
      run: |
        python -m pip install --upgrade pip
        if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
        python -m pip install build
    - name: Build frontend
      run: yarn build
    - name: Build python artifact
      run: python -m build
    - name: Archive production artifacts
      uses: actions/upload-artifact@v4
      with:
        name: release-dist
        path: dist

  pypi-publish:
    runs-on: ubuntu-latest
    needs:
      - build
    if: github.event_name == 'push' && startsWith(github.ref, 'refs/tags') # only publish when push with tag
    environment:
      name: pypi
      url: https://pypi.org/p/data-formulator
    permissions:
      id-token: write
    steps:
    - name: Retrieve release distributions
      uses: actions/download-artifact@v4
      with:
        name: release-dist
        path: dist/
    - name: Publish package distributions to PyPI
      uses: pypa/gh-action-pypi-publish@release/v1
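The `if:` gate on `pypi-publish` is meant to restrict uploads to tag pushes. Its logic can be modeled in Python to see which events pass; `should_publish` below is a hypothetical helper that mirrors the expression, not something the workflow calls. GitHub documents string `==` and `startsWith()` as case-insensitive, hence the lower-casing. One caveat worth checking: GitHub's workflow-syntax docs state that when a `push` trigger defines only `branches`, tag pushes do not start the workflow, so as written the publish job may never run unless a `tags:` filter is also added under `on.push`.

```python
def should_publish(event_name: str, ref: str) -> bool:
    """Mirror of: github.event_name == 'push' && startsWith(github.ref, 'refs/tags')."""
    # GitHub expression string comparisons ignore case, so lower-case both inputs.
    return event_name.lower() == "push" and ref.lower().startswith("refs/tags")


if __name__ == "__main__":
    # Illustrative (event, ref) pairs, not taken from this repository's history.
    for event, ref in [
        ("push", "refs/tags/v0.1.0"),
        ("push", "refs/heads/main"),
        ("pull_request", "refs/pull/1/merge"),
    ]:
        print(f"{event:<13} {ref:<18} publish={should_publish(event, ref)}")
```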
3 changes: 3 additions & 0 deletions .gitignore
@@ -1,3 +1,6 @@


*openai-keys.env
**/*.ipynb_checkpoints/

.DS_Store
3 changes: 1 addition & 2 deletions CODESPACES.md
@@ -15,12 +15,11 @@ You will need a GitHub account and to be logged in to use Codespaces.
### Step 2: Run the app
The codespace is a VS Code development environment in the cloud. Once the Codespace is created, start Data Formulator with the following steps:

* Press **F5** to run. Or if you prefer, click the **Run and Debug** tab on the left, and the **Start Debugging** button.
* A toast about port forwarding will appear; click the **Open in Browser** button.
* You will see the Data Formulator app!

<kbd>
<img width="528" alt="image" src="https://github.com/user-attachments/assets/e62bebda-8daf-4587-94d4-fede48de382b">
<img width="528" alt="image" src="https://github.com/user-attachments/assets/cb9e2123-4a42-4926-8b59-5bafb9be25fa">
</kbd>


54 changes: 42 additions & 12 deletions DEVELOPMENT.md
@@ -9,15 +9,15 @@ How to set up your local machine.
## Backend (Python)

- **Create a Virtual Environment**
```bash
python -m venv venv
.\venv\Scripts\activate
```
```bash
python -m venv venv
.\venv\Scripts\activate  # Windows; on macOS/Linux: source venv/bin/activate
```

- **Install Dependencies**
```bash
pip install -r requirements.txt
```
```bash
pip install -r requirements.txt
```

- **Run**
- **Windows**
@@ -33,9 +33,10 @@ pip install -r requirements.txt
## Frontend (TypeScript)

- **Install NPM packages**
```bash
yarn
```

```bash
yarn
```

- **Development mode**

@@ -46,14 +47,43 @@ yarn
Open [http://localhost:3000](http://localhost:3000) to view it in the browser.
The page will reload if you make edits. You will also see any lint errors in the console.

- **Build for Production**
## Build for Production

- **Build the frontend and then the backend**

Compile the TypeScript files and bundle the project:
```bash
yarn build
```
This builds the app for production to the `dist` folder.
This builds the app for production to the `py-src/data_formulator/dist` folder.

Then, build the Python package:

```bash
pip install build
python -m build
```
This creates a Python wheel (plus a source distribution) in the `dist/` folder; the wheel is named `data_formulator-<version>-py3-none-any.whl`.

- **Test the artifact**

You can then install the resulting wheel (testing in a virtual environment is recommended):
```bash
# replace <version> with the actual build version.
pip install dist/data_formulator-<version>-py3-none-any.whl
```

Once installed, you can run Data Formulator with:
```bash
data_formulator
```
or
```bash
python -m data_formulator
```

Open [http://localhost:5000](http://localhost:5000) to view it in the browser.
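Rather than substituting `<version>` by hand, a short script can locate the wheel that `python -m build` wrote to `dist/`. This is a sketch under two assumptions: the wheel keeps the `data_formulator-<version>-py3-none-any.whl` name described above, and the most recently modified wheel is the one to test. `find_wheel` is a hypothetical helper, not part of the repository.

```python
from __future__ import annotations

import re
import sys
from pathlib import Path

# Wheel names are {distribution}-{version}-{python tag}-{abi tag}-{platform tag}.whl;
# this pattern accepts only the pure-Python variant described in DEVELOPMENT.md.
WHEEL_RE = re.compile(r"^data_formulator-(?P<version>[^-]+)-py3-none-any\.whl$")


def find_wheel(dist_dir: str = "dist") -> tuple[Path, str] | None:
    """Return (path, version) of the most recently modified matching wheel, or None."""
    matches = []
    for path in Path(dist_dir).glob("data_formulator-*.whl"):
        m = WHEEL_RE.match(path.name)
        if m:
            matches.append((path.stat().st_mtime, path, m.group("version")))
    if not matches:
        return None
    _, path, version = max(matches, key=lambda item: item[0])
    return path, version


if __name__ == "__main__":
    found = find_wheel()
    if found is None:
        sys.exit("no data_formulator wheel in dist/; run `python -m build` first")
    path, version = found
    print(f"found version {version}; install with: pip install {path}")
```

Picking by modification time avoids parsing version strings; if ordering by version matters, `packaging.version.Version` would be the safer comparison key.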


## Usage
See the [Usage section on the README.md page](README.md#usage).
2 changes: 2 additions & 0 deletions MANIFEST.in
@@ -0,0 +1,2 @@
include py-src/data_formulator/dist/*
include py-src/data_formulator/dist/assets/*
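Two `include` lines are needed because, in MANIFEST.in patterns, `*` does not cross a `/`: `dist/*` picks up files directly in `dist/` but not those in `dist/assets/`. Python's `glob` module applies the same rule to `*`, so it can preview which files the patterns will match. This is an approximation of setuptools' matching, not a call into setuptools itself, and `preview_includes` is a hypothetical helper. If the frontend build ever emits deeper sub-folders, a single `graft py-src/data_formulator/dist` line would include the whole tree instead.

```python
import glob
import os

# The two patterns from MANIFEST.in in this PR.
PATTERNS = [
    "py-src/data_formulator/dist/*",
    "py-src/data_formulator/dist/assets/*",
]


def preview_includes(root: str, patterns: list = PATTERNS) -> list:
    """List files under `root` matched by the patterns, as '/'-separated relative paths."""
    found = set()
    for pattern in patterns:
        for path in glob.glob(os.path.join(root, pattern)):
            # MANIFEST.in `include` adds files, so skip directory matches such as dist/assets.
            if os.path.isfile(path):
                found.add(os.path.relpath(path, root).replace(os.sep, "/"))
    return sorted(found)


if __name__ == "__main__":
    print("\n".join(preview_includes(".")))
```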
70 changes: 48 additions & 22 deletions README.md
@@ -6,75 +6,101 @@

[![arxiv](https://img.shields.io/badge/Paper-arXiv:2408.16119-b31b1b.svg)](https://arxiv.org/abs/2408.16119)&ensp;
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)&ensp;
[![YouTube](https://img.shields.io/badge/YouTube-white?logo=youtube&logoColor=%23FF0000)](https://youtu.be/3ndlwt0Wi3c)&ensp;
[![build](https://github.com/microsoft/data-formulator/actions/workflows/python-build.yml/badge.svg)](https://github.com/microsoft/data-formulator/actions/workflows/python-build.yml)

</div>

Transform data and create rich visualizations iteratively with AI 🪄. Try Data Formulator now in GitHub Codespaces!

[![Open in GitHub Codespaces](https://github.com/codespaces/badge.svg)](https://codespaces.new/microsoft/data-formulator?quickstart=1)


<kbd>
<a target="_blank" rel="noopener noreferrer" href="https://codespaces.new/microsoft/data-formulator?quickstart=1" title="open Data Formulator in GitHub Codespaces"><img src="public/data-formulator-screenshot.png"></a>
</kbd>


## News 🔥🔥🔥

- [10-11-2024] Data Formulator Python package released!
- You can now install Data Formulator with pip and run it locally. [[check it out]](#get-started).
- Our Codespaces configuration has also been updated for a faster startup ⚡️. [[try it now!]](https://codespaces.new/microsoft/data-formulator?quickstart=1)
- New experimental feature: load an image or messy text, and ask AI to parse and clean it for you(!). [[demo]](https://github.com/microsoft/data-formulator/pull/31#issuecomment-2403652717)

- [10-01-2024] Initial release of Data Formulator, check out our [[blog]](https://www.microsoft.com/en-us/research/blog/data-formulator-exploring-how-ai-can-help-analysts-create-rich-data-visualizations/) and [[video]](https://youtu.be/3ndlwt0Wi3c)!



## Overview

**Data Formulator** is an application from Microsoft Research that uses large language models to transform data, expediting the practice of data visualization.

To create rich visualizations, data analysts often need to iterate back and forth among data processing and chart specification to achieve their goals. To achieve this, analysts need proficiency in data transformation and visualization tools, and they also spend effort managing the iteration history. This can be challenging!
Data Formulator is an AI-powered tool for analysts to iteratively create rich visualizations. Unlike most chat-based AI tools where users need to describe everything in natural language, Data Formulator combines *user interface interactions (UI)* and *natural language (NL) inputs* for easier interaction. This blended approach makes it easier for users to describe their chart designs while delegating data transformation to AI.

Data Formulator is an AI-powered tool for analysts to iteratively create rich visualizations. Unlike most chat-based AI tools where users need to describe everything in natural language, Data Formulator combines user interface interactions (UI) with natural language (NL) inputs. This blended approach makes it easier for users to describe their chart designs while delegating data transformation to AI.
## Get Started

Check out these cool Data Formulator features that can help you create impressive visualizations!
* Using the **blended UI and NL inputs** to describe the chart.
* Utilizing **data threads** to navigate the history and reuse previous results to create new ones instead of starting from scratch every time.
Play with Data Formulator with one of the following options:

## Get Started
- **Option 1: Install via Python pip**

Use pip for a quick local setup (recommended: install it in a virtual environment).

```bash
# install data_formulator
pip install data_formulator

Choose one of the following options to set up Data Formulator:
# start data_formulator
data_formulator

# alternatively, you can run Data Formulator with this command
python -m data_formulator
```

- **Option 1: Codespaces**
Data Formulator will automatically open in the browser at [http://localhost:5000](http://localhost:5000).

- **Option 2: Codespaces (5 minutes)**

Use Codespaces for an easy setup experience, as everything is preconfigured to get you up and running quickly. For more details, see [CODESPACES.md](CODESPACES.md).
You can also run Data Formulator in GitHub Codespaces, where everything is preconfigured. For more details, see [CODESPACES.md](CODESPACES.md).

[![Open in GitHub Codespaces](https://github.com/codespaces/badge.svg)](https://codespaces.new/microsoft/data-formulator?quickstart=1)

- **Option 2: Local Installation**
- **Option 3: Working in the developer mode**

Opt for a local installation if you prefer full control over your development environment and the ability to customize the setup to your specific needs. For detailed instructions, refer to [DEVELOPMENT.md](DEVELOPMENT.md).
You can build Data Formulator locally if you prefer full control over your development environment and the ability to customize the setup to your specific needs. For detailed instructions, refer to [DEVELOPMENT.md](DEVELOPMENT.md).


## Using Data Formulator

Once you’ve completed the setup using one of the options above, follow these steps to start using Data Formulator:

### The basics of data visualization
* Provide OpenAI keys and select a model (GPT-4o suggested) and choose a dataset
* Choose a visualization type
* Drag and drop data fields to the encoding shelf to create visualization

* Provide your OpenAI keys, select a model (GPT-4o suggested), and choose a dataset.
* Choose a chart type, and then drag-and-drop data fields to chart properties (x, y, color, ...) to specify visual encodings.

https://github.com/user-attachments/assets/0fbea012-1d2d-46c3-a923-b1fc5eb5e5b8


### Create visualization beyond the initial dataset (powered by 🤖)
* Add new field names in the encoding shelf, describe the chart intent
* Click the **Formulate** button
* Inspect the code behind the concept
* Follow up the chart to create new ones
* You can type names of **fields that do not exist in current data** in the encoding shelf:
- this tells Data Formulator that you want to create visualizations that require computation or transformation from existing data,
- you can optionally provide a natural language prompt to clarify your intent (not necessary when field names are self-explanatory).
* Click the **Formulate** button.
- Data Formulator will transform data and instantiate the visualization based on the encoding and prompt.
* Inspect the data, chart and code.
* To create a new chart based on existing ones, follow up in natural language:
- provide a follow-up prompt (e.g., *"show only top 5!"*),
- you may also update visual encodings for the new chart.

https://github.com/user-attachments/assets/160c69d2-f42d-435c-9ff3-b1229b5bddba

https://github.com/user-attachments/assets/c93b3e84-8ca8-49ae-80ea-f91ceef34acb

Repeat this process as needed to explore and understand your data. Your explorations are trackable in the **Data Threads** panel.

## Developers
## Developers' Guide

Follow the [developers' instructions](DEVELOPMENT.md) to build your new data analysis tools on top of Data Formulator.


## Research Papers
* [Data Formulator 2: Iteratively Creating Rich Visualizations with AI](https://arxiv.org/abs/2408.16119)

2 changes: 1 addition & 1 deletion local_server.bat
@@ -2,7 +2,7 @@
:: Licensed under the MIT License.

@echo off
set FLASK_APP=app.py
set FLASK_APP=py-src/data_formulator/app.py
set FLASK_RUN_PORT=5000
set FLASK_RUN_HOST=0.0.0.0
flask run
2 changes: 1 addition & 1 deletion local_server.sh
@@ -1,4 +1,4 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.

env FLASK_APP=app.py FLASK_RUN_PORT=5000 FLASK_RUN_HOST=0.0.0.0 flask run
env FLASK_APP=py-src/data_formulator/app.py FLASK_RUN_PORT=5000 FLASK_RUN_HOST=0.0.0.0 flask run
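Both scripts now point `FLASK_APP` at the module under `py-src/`. If one launcher is preferred over keeping the `.bat` and `.sh` variants in sync, the same environment can be set from Python. This is a sketch, assuming it runs from the repository root with Flask installed in the current interpreter; it invokes `python -m flask run`, which reads `FLASK_RUN_PORT` and `FLASK_RUN_HOST` just as the `flask run` command in the scripts does. `flask_env` and `launch` are hypothetical names.

```python
from __future__ import annotations

import os
import subprocess
import sys

# Same values as local_server.sh / local_server.bat in this PR.
FLASK_SETTINGS = {
    "FLASK_APP": "py-src/data_formulator/app.py",
    "FLASK_RUN_PORT": "5000",
    "FLASK_RUN_HOST": "0.0.0.0",
}


def flask_env(base: dict[str, str] | None = None) -> dict[str, str]:
    """Copy the current (or given) environment and overlay the Flask settings."""
    env = dict(os.environ if base is None else base)
    env.update(FLASK_SETTINGS)
    return env


def launch() -> int:
    """Run `python -m flask run` with the settings above; return Flask's exit code."""
    if not os.path.isfile(FLASK_SETTINGS["FLASK_APP"]):
        print("run this from the repository root", file=sys.stderr)
        return 1
    return subprocess.run([sys.executable, "-m", "flask", "run"], env=flask_env()).returncode


if __name__ == "__main__":
    sys.exit(launch())
```

Using `sys.executable -m flask` ties the launch to whichever interpreter (for example, the project's virtual environment) runs the script, instead of whatever `flask` happens to be first on `PATH`.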
3 changes: 3 additions & 0 deletions package.json
@@ -14,6 +14,7 @@
"ag-grid-enterprise": "^32.0.2",
"ag-grid-react": "^32.0.2",
"d3": "^7.3.0",
"dompurify": "^3.1.7",
"localforage": "^1.10.0",
"lodash": "^4.17.21",
"markdown-to-jsx": "^7.1.8",
@@ -24,6 +25,7 @@
"react": "^18.2.0",
"react-animate-height": "^3.0.4",
"react-animate-on-change": "^2.2.0",
"react-diff-viewer": "^3.1.1",
"react-dnd": "^16.0.1",
"react-dnd-html5-backend": "^16.0.1",
"react-dom": "^18.2.0",
@@ -37,6 +39,7 @@
"redux": "^4.2.0",
"redux-persist": "^6.0.0",
"typescript": "^4.9.5",
"validator": "^13.12.0",
"vega": "^5.23.0",
"vega-embed": "^6.21.0",
"vega-lite": "^5.5.0",
5 changes: 5 additions & 0 deletions py-src/data_formulator/__init__.py
@@ -0,0 +1,5 @@
from .app import run_app

__all__ = [
    "run_app",
]
4 changes: 4 additions & 0 deletions py-src/data_formulator/__main__.py
@@ -0,0 +1,4 @@
from .app import run_app

if __name__ == "__main__":
    run_app()
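With `__main__.py` in place, `python -m data_formulator` imports the package and executes that file as `__main__`, which calls `run_app()`. The mechanism can be checked without Flask by building a throwaway package with the same layout; the package name `demo_pkg` and the stub `run_app` below are placeholders, not Data Formulator code. The bare `data_formulator` command presumably comes from a console-script entry in `pyproject.toml`, which this diff does not show.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

# Stand-ins for py-src/data_formulator/{__init__,app,__main__}.py.
FILES = {
    "__init__.py": "from .app import run_app\n\n__all__ = ['run_app']\n",
    "app.py": "def run_app():\n    print('run_app called')\n",
    "__main__.py": "from .app import run_app\n\nif __name__ == '__main__':\n    run_app()\n",
}


def run_as_module(package: str = "demo_pkg") -> str:
    """Write the stub package to a temp dir and return the stdout of `python -m <package>`."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg_dir = Path(tmp, package)
        pkg_dir.mkdir()
        for name, source in FILES.items():
            (pkg_dir / name).write_text(source)
        # Put the temp dir on the child interpreter's import path explicitly.
        env = {**os.environ, "PYTHONPATH": tmp}
        result = subprocess.run(
            [sys.executable, "-m", package],
            env=env,
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout


if __name__ == "__main__":
    print(run_as_module(), end="")  # prints "run_app called"
```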