Vizard: A DSL that Compiles High-Level Declarations to Altair & Matplotlib Code

A stateful, declarative language for LLM-driven Python visualization that combines structured keywords with natural language.

Vizard lets you create data visualizations by describing what you want in a mix of CAPITALIZED keywords and natural language, with intelligent defaults and stateful keyword persistence for iterative figure development.


Features

  • 🗣️ Natural + Structured: Mix CAPITALIZED keywords with plain English
  • 💾 Stateful: Keywords persist across calls, enabling iterative refinement
  • 🐻‍❄️ Polars-first: Modern, fast dataframe operations with streaming/chaining
  • 📊 Multi-engine: Altair (default), Matplotlib, and Seaborn support
  • 🔧 Flexible: Supports minimal to highly detailed specifications
  • 🤖 Intelligent: LLM fills gaps with sensible defaults
  • 🔄 Conversational: Refine figures through natural dialogue

Installation

git clone https://github.com/prairie-guy/vizard
cd vizard
./setup.sh

This installs:

  • vizard CLI tool to ~/.local/bin/vizard
  • vizard_magic Python package globally
  • cc_jupyter with patches applied

Verify installation:

vizard version

Note: Ensure ~/.local/bin is in your PATH:

export PATH="$HOME/.local/bin:$PATH"  # Add to ~/.bashrc or ~/.zshrc

Quick Start

Choose your preferred workflow:

Option A: Use Existing Jupyter/IPython (Fastest)

Best for: Quick exploration, working across multiple projects

  1. Start Jupyter/IPython

    jupyter lab
    # or: jupyter notebook, ipython
  2. Load vizard extension

    %load_ext vizard_magic
  3. Start creating visualizations (see Example Session below)

Option B: Use vizard CLI (Per-Project)

Best for: Reproducible research, isolated environments

  1. Navigate to project and start vizard

    cd ~/my-project
    vizard start

    This creates an isolated environment with:

    • Project-local .venv/ with dependencies
    • Vizard templates (pyproject.toml, CLAUDE.md)
    • JupyterLab server with connection URL
  2. Open the URL in your browser

  3. Load vizard extension in notebook

    %load_ext vizard_magic
  4. When done:

    vizard stop

Example Session

%load_ext vizard_magic

Create a simple bar chart:

%%cc
VZ
DATA data/sales.csv
PLOT bar
X product Y revenue
FILENAME images/most_basic_bar_chart_basic.png

Generated code:

df = pl.read_csv('data/sales.csv')

chart = alt.Chart(df).mark_bar(color='steelblue').encode(
    x=alt.X('product:N', title='Product'),
    y=alt.Y('revenue:Q', title='Revenue')
).properties(width=600, height=400)

chart.save('images/most_basic_bar_chart_basic.png', scale_factor=2.0)

chart

[Figure: Simple Bar Chart]

Check current state:

%cc KEYS

Output:

ENGINE: altair
DF: polars
WIDTH: 600
HEIGHT: 400
FUNCTION: false
IMPORT: false
OUTPUT: display
DATA: data/sales.csv
PLOT: bar
X: product
Y: revenue
FILENAME: images/most_basic_bar_chart_basic.png

Refine by adding color:

%%cc
COLOR category
FILENAME images/bar_chart_basic.png

Generated code:

df = pl.read_csv('data/sales.csv')

chart = alt.Chart(df).mark_bar().encode(
    x=alt.X('product:N', title='Product'),
    y=alt.Y('revenue:Q', title='Revenue'),
    color=alt.Color('category:N', title='Category')
).properties(width=600, height=400)

chart.save('images/bar_chart_basic.png', scale_factor=2.0)

chart

[Figure: Bar Chart with Color]


Preprocessing Data with ||

Vizard supports data preprocessing using the || delimiter to separate Polars data manipulation from Altair visualization:

Simple Examples

Filter before plotting:

%%cc
DATA genes.csv
FILTER pvalue < 0.05
|| PLOT scatter X expression Y pvalue TITLE Significant Genes

Select columns and add computed field:

%%cc
DATA raw_data.csv
SELECT gene_name, fold_change, pvalue
ADD log2_fc as log2(fold_change)
|| PLOT bar X gene_name Y log2_fc

Group and aggregate:

%%cc
DATA sales.csv
GROUP by category aggregating sum(revenue) as total
|| PLOT bar X category Y total

Preprocessing Keywords

  • FILTER - Filter rows: FILTER pvalue < 0.05 and expression > 2
  • SELECT - Keep columns: SELECT gene_name, expression, pvalue
  • DROP - Remove columns: DROP columns internal_id, debug_flag
  • SORT - Sort data: SORT by pvalue descending
  • ADD - Computed columns: ADD log2_expr as log2(expression)
  • GROUP - Aggregate: GROUP by condition aggregating mean(expression)
  • SAVE - Save to file: SAVE processed_data.csv
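
To make the role of the || delimiter concrete, here is a minimal sketch of how a cell body could be divided into preprocessing steps and a plotting specification. Vizard hands the whole cell to the LLM, so this is only an illustration of the convention; the helper name `split_spec` is hypothetical and not part of Vizard.

```python
def split_spec(spec: str) -> tuple[list[str], str]:
    """Split a cell body into preprocessing lines and the plotting part."""
    before, sep, after = spec.partition("||")
    if not sep:
        # No delimiter: treat the whole cell as a plotting specification.
        return [], spec.strip()
    steps = [line.strip() for line in before.splitlines() if line.strip()]
    return steps, after.strip()


cell = """DATA sales.csv
GROUP by category aggregating sum(revenue) as total
|| PLOT bar X category Y total"""

steps, plot = split_spec(cell)
print(steps)  # preprocessing lines (everything before ||)
print(plot)   # plotting specification (everything after ||)
```

Everything before || describes the dataframe work; everything after it describes the chart.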

Complex Example

%%cc
DATA diff_expression.csv
SELECT gene_name, log2fc, pvalue
FILTER pvalue < 0.05 and abs(log2fc) > 1.5
ADD neg_log10_pv as -log10(pvalue)
SORT by neg_log10_pv descending
|| PLOT scatter X log2fc Y neg_log10_pv TITLE Volcano Plot

This generates chained Polars preprocessing code followed by Altair visualization code.
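
The semantics of the preprocessing steps in the complex example can be traced in plain Python. This is not the code Vizard generates (that would be a Polars chain); it is a standard-library sketch on made-up rows, showing what FILTER, ADD, and SORT are expected to do to the data. SELECT is a no-op here because the sample rows contain only the three selected columns.

```python
import math

# Hypothetical sample rows standing in for diff_expression.csv
rows = [
    {"gene_name": "geneB", "log2fc": -1.8, "pvalue": 0.01},
    {"gene_name": "geneA", "log2fc": 2.0, "pvalue": 0.001},
    {"gene_name": "geneC", "log2fc": 0.5, "pvalue": 0.0001},
    {"gene_name": "geneD", "log2fc": 3.0, "pvalue": 0.2},
]

# FILTER pvalue < 0.05 and abs(log2fc) > 1.5
kept = [r for r in rows if r["pvalue"] < 0.05 and abs(r["log2fc"]) > 1.5]

# ADD neg_log10_pv as -log10(pvalue)
kept = [{**r, "neg_log10_pv": -math.log10(r["pvalue"])} for r in kept]

# SORT by neg_log10_pv descending
kept.sort(key=lambda r: r["neg_log10_pv"], reverse=True)

for r in kept:
    print(r["gene_name"], r["log2fc"], round(r["neg_log10_pv"], 2))
```

geneC is dropped for its small fold change and geneD for its large p-value; the remaining genes are ordered from most to least significant.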

See CLAUDE.md for complete documentation and examples.


Examples

Scatter Plot

%%cc
DATA genes.csv
PLOT scatter
X expression Y pvalue
COLOR significant
Add tooltips with gene names

Generated Altair code:

df = pl.read_csv('genes.csv')

chart = alt.Chart(df).mark_point(size=60).encode(
    x=alt.X('expression:Q', title='Expression'),
    y=alt.Y('pvalue:Q', title='P-value'),
    color=alt.Color('significant:N', title='Significant'),
    tooltip=['gene_name:N', 'expression:Q', 'pvalue:Q']
).properties(
    title='Gene Expression vs P-value',
    width=600,
    height=400
)

chart

[Figure: Scatter Plot Example]

Line Chart (Time Series)

%%cc
DATA timeseries.csv
PLOT line
X date Y temperature
COLOR location

Generated Altair code:

df = pl.read_csv('timeseries.csv')

chart = alt.Chart(df).mark_line().encode(
    x=alt.X('date:T', title='Date'),
    y=alt.Y('temperature:Q', title='Temperature'),
    color=alt.Color('location:N', title='Location')
).properties(
    title='Temperature Over Time',
    width=600,
    height=400
)

chart

[Figure: Line Chart Example]

Grouped Bar Chart

%%cc
DATA expression.csv
PLOT bar
X gene_name Y expression_level
COLOR condition
BAR_LAYOUT grouped

Generated Altair code:

df = pl.read_csv('expression.csv')

chart = alt.Chart(df).mark_bar().encode(
    x=alt.X('gene_name:N', title='Gene Name'),
    xOffset=alt.XOffset('condition:N'),
    y=alt.Y('expression_level:Q', title='Expression Level'),
    color=alt.Color('condition:N', title='Condition')
).properties(
    title='Gene Expression by Condition',
    width=600,
    height=300
)

chart

[Figure: Grouped Bar Chart Example]

Faceted Plot (Small Multiples)

%%cc
DATA data.csv
PLOT scatter
X value1 Y value2
ROW condition
COLUMN replicate

[Figure: Faceted Plot Example]

Multi-Series Line Chart

%%cc
DATA stocks
PLOT line
X date Y price
SERIES symbol
COLOR symbol
TITLE Stock Prices Over Time

Note: Uses Altair's built-in stocks dataset. SERIES groups lines by symbol.

Bar Chart with Text Labels

%%cc
DATA sales.csv
PLOT bar
X product Y revenue
TEXT revenue
Show revenue values on top of bars

Error Bars (Range Encodings)

%%cc
DATA measurements.csv
PLOT point
X category Y mean_value
Y2 upper_ci
Add error bars showing confidence intervals

Window Transformations

%%cc
DATA timeseries.csv
PLOT line
X date Y value
WINDOW cumsum
Show cumulative sum over time
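
As a reference for what WINDOW cumsum computes, here is a small standard-library sketch of a running total over date-ordered values. The records are invented for illustration, and the code Vizard emits would express this through the plotting or dataframe library rather than itertools.

```python
from itertools import accumulate

# Hypothetical (date, value) records, deliberately out of order
records = [("2024-01-03", 4), ("2024-01-01", 3), ("2024-01-02", 1)]

# A cumulative sum only makes sense on ordered data;
# ISO-formatted dates sort chronologically as plain strings.
ordered = sorted(records)

# Running total of the value column
running = list(accumulate(value for _, value in ordered))

result = [(date, value, total) for (date, value), total in zip(ordered, running)]
for row in result:
    print(row)
```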

Heatmap

%%cc
DATA expression_matrix.csv
PLOT heatmap
X sample Y gene
COLOR expression
Use viridis color scheme
TITLE Gene Expression Heatmap

[Figure: Heatmap Example]

Box Plot

%%cc
DATA measurements.csv
PLOT box
X group Y value
TITLE Measurement Distributions by Group

[Figure: Box Plot Example]

Volcano Plot with Iterative Refinement

%%cc
RESET

%%cc
DATA diff_expression.csv
PLOT volcano
X log2fc Y neg_log10_pvalue
IMPORT

%%cc
Add threshold lines at x=±1.5 and y=1.3

%%cc
Color upregulated red, downregulated blue, non-significant gray

%%cc
TITLE Differential Gene Expression Analysis
WIDTH 800
HEIGHT 800

%%cc
OUTPUT save
FILENAME figure1_volcano.png

[Figure: Volcano Plot Example]


Using Vizard Specifications

Basic Syntax

Mix CAPITALIZED keywords with natural language:

# All keywords
%%cc DATA genes.csv PLOT bar X gene_name Y expression_level

# All natural language
%%cc Create a bar chart from genes.csv showing gene_name vs expression_level

# Mixed (recommended)
%%cc DATA genes.csv - make a bar chart with X gene_name and Y expression_level, sorted by value
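
To show how the CAPITALIZED convention separates structure from free text, here is a deliberately simplified sketch of keyword extraction. In Vizard the LLM interprets the specification, so nothing below is Vizard's actual parser; `extract_keywords` is a hypothetical helper that pairs each all-caps token with the single token that follows it and treats a bare keyword as a flag. Multi-word values such as a TITLE are out of scope for this sketch.

```python
import re

# An all-caps token such as DATA, X, Y2, or BAR_LAYOUT
KEYWORD = re.compile(r"[A-Z][A-Z0-9_]*")


def extract_keywords(spec: str) -> dict:
    """Map each CAPITALIZED token to the token after it (or True for flags)."""
    tokens = spec.split()
    found = {}
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if KEYWORD.fullmatch(token):
            following = tokens[i + 1] if i + 1 < len(tokens) else None
            if following is None or KEYWORD.fullmatch(following):
                found[token] = True  # bare keyword, e.g. IMPORT
                i += 1
            else:
                found[token] = following.rstrip(",")
                i += 2
        else:
            i += 1  # lowercase text is left for natural-language handling
    return found


print(extract_keywords("DATA genes.csv PLOT bar X gene_name Y expression_level"))
print(extract_keywords(
    "DATA genes.csv - make a bar chart with X gene_name and Y expression_level, sorted by value"
))
print(extract_keywords("DATA data.csv PLOT bar FUNCTION IMPORT"))
```

In the mixed form, the keywords still pin down the data source and axes, while "make a bar chart" and "sorted by value" remain as natural-language instructions.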

Essential Keywords

Keywords control behavior and persist in state (.vizard_state.json):

Data & Plot:

  • DATA - Data source (file path, URL, variable name, or Altair dataset)
  • PLOT - Chart type: bar, scatter, line, histogram, volcano, heatmap, box, etc.
  • DF - Dataframe library: polars (default), pandas

Visual Encodings:

  • X, Y - Columns for axes
  • X2, Y2 - Secondary positions for range encodings (error bars, Gantt charts)
  • COLOR - Column to color by
  • SIZE - Column for point/mark size
  • SHAPE - Column for point shape
  • OPACITY - Column for transparency
  • SERIES - Column for grouping without visual encoding (multi-series line charts)
  • TEXT - Column for text labels on marks
  • ROW, COLUMN - Faceting (small multiples)
    • ROW arranges plots horizontally in a row
    • COLUMN arranges plots vertically in a column

Grouping & Transformations:

  • BAR_LAYOUT - Bar chart layout: grouped, stacked, normalized
  • WINDOW - Window transformations: cumsum, rank, row_number, mean, lag, lead
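
The difference between the stacked and normalized layouts is easiest to see numerically. The sketch below assumes that normalized means each bar segment is rescaled to its share of the total for that X value, which is how normalized stacking is usually defined; the sample rows are invented.

```python
from collections import defaultdict

# Hypothetical (gene_name, condition, expression_level) rows
rows = [
    ("geneA", "control", 30.0),
    ("geneA", "treated", 10.0),
    ("geneB", "control", 5.0),
    ("geneB", "treated", 15.0),
]

# Total per X category: this is the full bar height in a stacked layout
totals = defaultdict(float)
for gene, _, level in rows:
    totals[gene] += level

# Normalized layout: each segment becomes a fraction of its bar's total
shares = [(gene, condition, level / totals[gene]) for gene, condition, level in rows]

for row in shares:
    print(row)
```

With stacked bars, geneA (total 40) is twice as tall as geneB (total 20); with normalized bars, both reach 1.0 and only the proportions differ.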

Styling:

  • WIDTH - Chart width in pixels (default: 600)
  • HEIGHT - Chart height in pixels (default: 400)
  • TITLE - Chart title
  • ENGINE - Visualization library: altair (default), matplotlib, seaborn

Code Generation:

  • FUNCTION - Generate reusable function (default: false)
  • IMPORT - Include imports (default: false)

Meta Commands:

  • KEYWORDS or KEYS - Show current state
  • RESET - Clear state and restore defaults
  • HELP - Show help documentation

Iterative Refinement

Keywords persist across cells - refine your figure step by step:

# Start with basic plot
%%cc
DATA mydata.csv
PLOT bar
X category Y value

# Add color (other keywords persist)
%%cc
COLOR group

# Adjust size
%%cc
WIDTH 800
HEIGHT 500

# Add natural language styling
%%cc
Make the bars green and add value labels on top

# Check what's in state
%%cc
KEYWORDS

# Start fresh for new figure
%%cc
RESET

State Management

Vizard maintains state in .vizard_state.json:

%%cc WIDTH 700 HEIGHT 450 DATA mydata.csv

%%cc PLOT bar X category Y value
# ↑ Automatically uses WIDTH: 700, HEIGHT: 450 from previous cell

%%cc KEYWORDS
# Shows all current keyword values

%%cc RESET
# Clears state, restores defaults

Workflow:

  1. Iterate on a figure → State accumulates
  2. Figure complete → Use it
  3. Start new figure → RESET → Fresh state
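
The accumulate-then-reset workflow can be sketched as a merge over a JSON file. The actual layout of .vizard_state.json is not documented here, so the flat key/value structure below is an assumption, and `apply_cell` is a hypothetical helper rather than a Vizard function. The defaults mirror the values listed under Default Values.

```python
import json
import tempfile
from pathlib import Path

DEFAULTS = {
    "ENGINE": "altair",
    "DF": "polars",
    "WIDTH": 600,
    "HEIGHT": 400,
    "FUNCTION": False,
    "IMPORT": False,
    "OUTPUT": "display",
}


def apply_cell(state_file: Path, keywords: dict) -> dict:
    """Merge one cell's keywords into the persisted state (RESET clears it)."""
    if "RESET" in keywords:
        state = dict(DEFAULTS)
    else:
        previous = (
            json.loads(state_file.read_text()) if state_file.exists() else dict(DEFAULTS)
        )
        state = {**previous, **keywords}  # new values override old ones
    state_file.write_text(json.dumps(state, indent=2))
    return state


# Use a temporary directory so the demo does not touch a real project
with tempfile.TemporaryDirectory() as tmp:
    state_file = Path(tmp) / ".vizard_state.json"
    first = apply_cell(state_file, {"WIDTH": 700, "HEIGHT": 450, "DATA": "mydata.csv"})
    second = apply_cell(state_file, {"PLOT": "bar", "X": "category", "Y": "value"})
    after_reset = apply_cell(state_file, {"RESET": True})

print(second)       # WIDTH/HEIGHT/DATA carried over from the first cell
print(after_reset)  # back to defaults only
```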

Code Generation Options

# Default: No imports, script-style code
%%cc
DATA data.csv
PLOT bar
X category Y value

# With imports (for copy-paste to .py files)
%%cc
DATA data.csv
PLOT bar
X category Y value
IMPORT

# Generate reusable function
%%cc
DATA data.csv
PLOT bar
X category Y value
FUNCTION
IMPORT

Natural Language + Keywords

# Natural language for details
%%cc
DATA genes.csv
Create a volcano plot showing log2fc vs pvalue
Color upregulated genes red and downregulated blue
Add threshold lines at x=±1.5 and y=1.3

# Keywords for structure
%%cc
DATA genes.csv
PLOT volcano
X log2fc
Y pvalue
THRESHOLD_FC 1.5
THRESHOLD_P 1.3

# Mix both (recommended)
%%cc
DATA genes.csv PLOT volcano X log2fc Y pvalue
Color significant genes red, add threshold lines at 1.5 and 1.3

Dynamic Keywords

Any CAPITALIZED word becomes a keyword and persists in state:

%%cc
DATA results.csv
PLOT scatter
X log2fc Y pvalue
THRESHOLD 0.05
Highlight points where pvalue < THRESHOLD in red

%%cc
THRESHOLD 0.01
# Now uses new threshold value

%%cc KEYWORDS
# Shows: THRESHOLD: 0.01
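
The effect of a persisted dynamic keyword can be mimicked with a plain dictionary. In this sketch, `highlight` is a hypothetical stand-in for "highlight points where pvalue < THRESHOLD"; it only illustrates that a later cell can change the value, and that a cell which omits the keyword reuses the stored one.

```python
def highlight(state: dict, keywords: dict, pvalues: list) -> list:
    """Update state with this cell's keywords, then flag points below THRESHOLD."""
    state.update(keywords)
    threshold = float(state["THRESHOLD"])
    return [p < threshold for p in pvalues]


state = {}
pvalues = [0.2, 0.03, 0.004]  # made-up p-values

first = highlight(state, {"THRESHOLD": "0.05"}, pvalues)   # first cell sets it
second = highlight(state, {"THRESHOLD": "0.01"}, pvalues)  # second cell overrides it
third = highlight(state, {}, pvalues)                      # later cell reuses 0.01

print(first, second, third)
```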

Vizard CLI Commands

The vizard command manages project environments and JupyterLab:

vizard start [options]     # Start JupyterLab server
  -p, --port PORT          # Custom port (default: 9999)
  -t, --token TOKEN        # Custom token (default: auto-generated)
  --host HOST              # Custom hostname (default: system hostname)
  -f, --foreground         # Run in foreground

vizard stop [options]      # Stop JupyterLab server
  -p, --port PORT          # Stop server on specific port

vizard status              # Show server status

vizard clean [options]     # Remove runtime files
  --purge                  # Remove all vizard files including .venv

vizard update              # Update CLAUDE.md and vizard executable

vizard version             # Show version
vizard help                # Show help

Examples:

# Start with custom port for remote server
vizard start --port 8888 --host myserver.example.com

# Check status
vizard status

# Clean up runtime files (keeps notebooks and .venv)
vizard clean

# Full cleanup (removes everything except notebooks)
vizard clean --purge

# Stop server
vizard stop

Advanced Features

Polars Data Manipulation

Vizard generates Polars streaming/chaining code when data prep is needed:

%%cc
DATA results.csv
Filter to rows where pvalue < 0.05
Create a volcano plot showing log2fc vs pvalue
Color significant genes red

# Generated code uses Polars chaining:
# df = (pl.read_csv('results.csv')
#     .filter(pl.col('pvalue') < 0.05)
#     .with_columns([...]))

Spelling Tolerance

Common misspellings and spelling variants are recognized:

%%cc
DATA data.csv
PLOT bar
X cat Y val
COLOUR blue
TITEL My Chart
HIGHT 450
# Works! Recognizes COLOUR→COLOR, TITEL→TITLE, HIGHT→HEIGHT
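
Vizard relies on the LLM to recognize misspelled keywords, but the idea can be approximated with fuzzy string matching. The sketch below uses Python's standard `difflib` module; the keyword list, the 0.75 cutoff, and the helper name `normalize_keyword` are assumptions chosen for illustration.

```python
import difflib

# Keywords taken from the Essential Keywords section
KNOWN = [
    "DATA", "PLOT", "DF", "X", "Y", "X2", "Y2", "COLOR", "SIZE", "SHAPE",
    "OPACITY", "SERIES", "TEXT", "ROW", "COLUMN", "BAR_LAYOUT", "WINDOW",
    "WIDTH", "HEIGHT", "TITLE", "ENGINE", "FUNCTION", "IMPORT", "OUTPUT",
    "FILENAME",
]


def normalize_keyword(word: str, cutoff: float = 0.75) -> str:
    """Return the closest known keyword, or the word itself if none is close."""
    if word in KNOWN:
        return word
    matches = difflib.get_close_matches(word, KNOWN, n=1, cutoff=cutoff)
    # Unmatched words are kept as-is so they can act as dynamic keywords
    return matches[0] if matches else word


for typed in ["COLOUR", "TITEL", "HIGHT", "THRESHOLD"]:
    print(typed, "->", normalize_keyword(typed))
```

A cutoff that is too low would start rewriting legitimate dynamic keywords, which is why unmatched words such as THRESHOLD pass through unchanged here.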

Multi-Engine Support

# Altair (default) - declarative, interactive
%%cc
ENGINE altair
DATA data.csv
PLOT scatter
X value1 Y value2

# Matplotlib - publication-quality
%%cc
ENGINE matplotlib
DATA data.csv
PLOT bar
X category Y value

# Seaborn - statistical plots
%%cc
ENGINE seaborn
DATA data.csv
PLOT box
X group Y measurement

Default Values

ENGINE: altair
DF: polars
WIDTH: 600
HEIGHT: 400
FUNCTION: false
IMPORT: false
OUTPUT: display

Other keywords (X, Y, COLOR, etc.) have no defaults; they appear in state only when specified.


Supported Plot Types

  • Bar charts - Simple, stacked, grouped, normalized, with text labels
  • Scatter plots - With size, color, shape, opacity encodings
  • Line charts - Time series, multi-series (SERIES keyword)
  • Histograms - Configurable bins
  • Volcano plots - Bioinformatics differential expression
  • Heatmaps - Matrix visualizations
  • Box plots - Distribution comparisons
  • Faceted plots - Small multiples (ROW/COLUMN)
  • Error bars - Range encodings with X2/Y2
  • Window functions - Cumulative sums, rankings, rolling calculations

Coming soon: Violin plots, ridgeline plots, chord diagrams


Workflow Tips

  1. Start simple: Begin with minimal specification, iterate
  2. Use KEYWORDS often: Check state to understand what's persisted
  3. RESET between figures: Clear state when starting new visualization
  4. Mix styles: Keywords for structure, natural language for styling
  5. Leverage state: Set common parameters (WIDTH, HEIGHT) once, use many times
  6. Generate functions: Use FUNCTION for reusable plotting code

Troubleshooting

Q: %load_ext vizard_magic gives "No module named 'vizard_magic'"

  • Run ./setup.sh again to ensure vizard_magic is installed
  • Check: python3 -c "import vizard_magic" should work

Q: My plot isn't using the right dimensions

  • Check state with %%cc KEYWORDS - are WIDTH/HEIGHT set?
  • Use %%cc RESET to clear old dimensions

Q: Code has imports but I don't want them

  • IMPORT defaults to false - don't include IMPORT keyword
  • Check if IMPORT is in state: %%cc KEYWORDS

Q: Keywords not persisting

  • Ensure keywords are CAPITALIZED
  • Check .vizard_state.json exists in directory

Q: Permission denied: '/root/code'

  • This is fixed by patches applied during setup.sh or vizard start
  • If you see this error, re-run setup.sh or check patch output

Q: Different Python versions causing issues

  • Global mode (setup.sh) installs for the Python version python3 points to
  • Per-project mode (vizard start) uses the .venv's Python version
  • If using multiple Python versions, use Per-Project mode for each project
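
Several of the installation questions above come down to which Python is running and whether it can see Vizard. The following diagnostic sketch can be pasted into a notebook cell or run as a script; it uses only the standard library, and the function name `vizard_diagnostics` is hypothetical.

```python
import importlib.util
import shutil
import sys


def vizard_diagnostics() -> dict:
    """Collect basic facts about the current interpreter and Vizard install."""
    return {
        # Which interpreter is running this kernel or script
        "python_version": sys.version.split()[0],
        "python_executable": sys.executable,
        # True if `import vizard_magic` would find the package
        "vizard_magic_importable": importlib.util.find_spec("vizard_magic") is not None,
        # True if the `vizard` CLI is reachable through PATH
        "vizard_cli_on_path": shutil.which("vizard") is not None,
    }


report = vizard_diagnostics()
for key, value in report.items():
    print(f"{key}: {value}")
```

If `vizard_magic_importable` is False in a notebook but True in a terminal, the notebook kernel is probably using a different Python than the one setup.sh installed into.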

Technical

Installation Modes

Vizard supports two installation modes:

Mode 1: Global Installation (Quick & Convenient)

setup.sh installs cc_jupyter and vizard_magic to ~/.local/lib/python*/site-packages/ and applies patches globally. Works in any Jupyter/IPython environment.

  • ✅ Quick data exploration across projects
  • ✅ Existing Jupyter workflows
  • ⚠️ Single cc_jupyter version across all projects

Mode 2: Per-Project Isolated (Reproducible & Safe)

vizard start creates project-local .venv/ with cc_jupyter installed and patched in the project's virtual environment. Each project has its own dependency versions.

  • ✅ Published research / production
  • ✅ Reproducible environments with version-pinned dependencies
  • ⚠️ Requires vizard start per project

Note: Both modes can coexist. Use Global for daily work, Per-Project for important analyses.

Version Management

setup.sh installs a pinned version of cc_jupyter (0.0.1) tested with Vizard's patches.

If you see a version mismatch warning during setup:

⚠ Version mismatch detected
Patches are tested with version 0.0.1

Force reinstall the vendored version:

pip install --user --force-reinstall ~/.local/share/vizard/lib/vendor/claude_code_jupyter_staging-0.0.1-py3-none-any.whl
~/.local/share/vizard/lib/patch_global_cc_jupyter.sh

Patching Mechanism

Vizard modifies cc_jupyter to enable the %%cc magic command for Vizard specifications:

Global Patching:

  • lib/patch_global_cc_jupyter.sh - Patches globally installed cc_jupyter
  • Applied once during setup.sh
  • Affects all projects using global installation

Per-Project Patching:

  • lib/patch_jupyter_magic.sh - Patches project-local cc_jupyter
  • Applied during vizard start for each project
  • Isolated to project's .venv/

What gets patched:

  • Registers the vizard_magic IPython extension
  • Enables %%cc cell magic in Jupyter notebooks
  • Loads the Vizard specification (CLAUDE.md) into Claude Code's context

Repository Structure

vizard/
├── vizard                         # Main executable (bash script)
├── setup.sh                       # Installation script
├── uninstall.sh                   # Uninstallation script
├── README.md                      # This file
├── lib/
│   ├── vizard_magic/
│   │   └── __init__.py            # Jupyter IPython extension
│   ├── patch_jupyter_magic.sh    # Per-project cc_jupyter patcher
│   ├── patch_global_cc_jupyter.sh # Global cc_jupyter patcher
│   └── vendor/
│       └── claude_code_jupyter_staging-0.0.1-py3-none-any.whl
├── templates/
│   ├── CLAUDE.md                  # Vizard specification (~37KB)
│   ├── pyproject.toml             # Project dependencies template
│   ├── vizard_template.ipynb      # Notebook template
│   └── purge_manifest.txt         # Cleanup manifest
└── test/
    ├── data/                      # Synthetic test datasets
    ├── generate_sample_data.py    # Test data generator
    ├── generate_images.ipynb      # Image generation notebook
    └── vizards_test.ipynb         # Comprehensive test suite (60+ tests)

Per-Project Files (created by vizard start):

<your-project>/
├── .env.jupyter              # Jupyter configuration
├── .jupyter.pid              # Process ID
├── .jupyter.log              # Server logs
├── pyproject.toml            # Python dependencies
├── uv.lock                   # Dependency lock file
├── .venv/                    # Virtual environment
├── CLAUDE.md                 # Vizard specification
├── .vizard_state.json        # Keyword state
├── .vizard_template.ipynb    # Notebook template
└── .claude/
    └── settings.json         # Claude Code permissions

Design Philosophy

Vizard is NOT:

  • ❌ A rigid DSL with one-to-one code mapping
  • ❌ A replacement for learning Altair/Matplotlib/Seaborn
  • ❌ Guaranteed to produce identical code each time

Vizard IS:

  • ✅ Structured guidance via keywords
  • ✅ LLM reasoning for intelligent defaults
  • ✅ Balance of consistency and flexibility
  • ✅ Iterative figure development workflow
  • ✅ Natural language + structure hybrid

Testing

Run the comprehensive test suite:

cd vizard/test
jupyter lab vizards_test.ipynb

Test coverage (60+ tests across 12 sections):

  • Setup & Configuration
  • Dataset Loading (CARS, STOCKS, MOVIES from Altair)
  • Core Visual Encodings (X, Y, COLOR, SIZE, SHAPE, OPACITY, ROW, COLUMN)
  • Range Encodings (X2, Y2)
  • Advanced Encodings (SERIES, TEXT)
  • Layout & Grouping (BAR_LAYOUT: stacked, grouped, normalized)
  • Data Transformations (WINDOW: cumsum, rank, row_number)
  • Meta Commands (KEYWORDS, KEYS, RESET, HELP)
  • Code Generation (FUNCTION, IMPORT)
  • State Management & Persistence
  • Syntax Flexibility (keywords, natural language, mixed)
  • Edge Cases (nulls, many encodings, ENGINE switching, spelling tolerance)

Future Roadmap

Phase 2: Refinement (based on testing feedback)

  • Gallery fetching for uncommon plot types
  • Additional plot type examples
  • Enhanced dynamic keywords

Phase 3: Expansion

  • Additional plot types for all engines
  • Multi-panel layout improvements
  • Interactive features (brush, zoom, tooltips)

Phase 4: Publication Mode

  • DPI control
  • Panel labels (A, B, C)
  • Journal-specific formats
  • Fine-grained typography

Contributing

This is an early prototype. Feedback welcome on:

  • Keyword design
  • Default values
  • Natural language parsing
  • Code quality
  • Missing features

License

To be determined.


Acknowledgments

Built with:


Quick Reference Card:

# Essential commands
%load_ext vizard_magic    # Load extension (once per kernel)
%%cc KEYWORDS              # Show current state
%%cc RESET                 # Clear state and start fresh
%%cc HELP                  # Show help

# Basic syntax
%%cc DATA file.csv PLOT bar X col1 Y col2

# Natural + keywords
%%cc DATA file.csv - create a scatter plot with X val1 and Y val2 colored by group

# Iterate on a figure
%%cc WIDTH 800 COLOR category TITLE My Chart

# Generate code with imports
%%cc IMPORT

Ready to create beautiful visualizations? Start with Option A above - it takes 30 seconds!
