linkml-embeddings-explorer

Interactive 2D visualization of embedding spaces. Generate static HTML explorers for embedding data with support for UMAP and t-SNE dimensionality reduction, interactive filtering, and configurable display options.

Features

  • Interactive Plotly.js scatter plots - Pan, zoom, hover for details
  • Multiple embedding spaces - Compare different embeddings side by side
  • UMAP and t-SNE - Toggle between dimensionality reduction methods
  • Dynamic coloring - Color by any metadata field
  • Filter by groups - Checkbox filters for categories
  • Search - Filter points by name
  • Toggle labels - Show/hide point labels
  • Click navigation - Configure links to detail pages
  • Static output - Works on GitHub Pages (no server required)

Installation

pip install linkml-embeddings-explorer

Or with uv:

uv add linkml-embeddings-explorer

Quick Start

1. Prepare your embeddings file

Create a JSON file with your embeddings and metadata:

{
  "spaces": {
    "main": {
      "embeddings": [[0.1, 0.2, ...], [0.3, 0.4, ...], ...],
      "metadata": [
        {"name": "Item 1", "category": "A", "description": "..."},
        {"name": "Item 2", "category": "B", "description": "..."}
      ]
    }
  }
}
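As a minimal sketch, a file with this layout can be written from Python using only the standard library. The helper name `build_embeddings_file`, the random vectors, and the placeholder descriptions are illustrative assumptions; only the `spaces` / `embeddings` / `metadata` layout comes from the example above.

```python
import json
import random
from pathlib import Path


def build_embeddings_file(path, n_items=4, dim=8, seed=0):
    """Write a tiny single-space embeddings file and return the payload.

    Hypothetical helper: the vectors are random placeholders, not real
    embeddings. Row i of "embeddings" is paired with entry i of "metadata".
    """
    rng = random.Random(seed)
    payload = {
        "spaces": {
            "main": {
                "embeddings": [
                    [rng.gauss(0.0, 1.0) for _ in range(dim)]
                    for _ in range(n_items)
                ],
                "metadata": [
                    {
                        "name": f"Item {i + 1}",
                        "category": "A" if i % 2 == 0 else "B",
                        "description": f"Placeholder item {i + 1}",
                    }
                    for i in range(n_items)
                ],
            }
        }
    }
    Path(path).write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return payload
```

Calling `build_embeddings_file("embeddings.json")` should produce a file that can be passed to the `deploy` command.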

If you have pre-computed 2D coordinates, include them instead of raw embeddings:

{
  "spaces": {
    "main": {
      "umap": [[1.2, 3.4], [2.3, 4.5], ...],
      "tsne": [[0.1, 0.2], [0.3, 0.4], ...],
      "metadata": [...]
    }
  }
}
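Before generating an explorer it can help to sanity-check each space. The function below is a hedged sketch, not part of the package: `validate_space` is a hypothetical name, and it assumes (as the examples above suggest, but do not state) that every coordinate array has one row per metadata record and that pre-computed `umap` / `tsne` rows are 2D.

```python
def validate_space(name, space):
    """Return a list of problems found in one entry of "spaces".

    Assumption: row i of each array describes metadata record i, so all
    arrays must have the same length as "metadata".
    """
    metadata = space.get("metadata")
    if not isinstance(metadata, list):
        return [f"{name}: 'metadata' must be a list"]

    problems = []
    keys = [k for k in ("embeddings", "umap", "tsne") if k in space]
    if not keys:
        problems.append(f"{name}: needs 'embeddings' or pre-computed 'umap'/'tsne'")

    for key in keys:
        rows = space[key]
        if len(rows) != len(metadata):
            problems.append(
                f"{name}: '{key}' has {len(rows)} rows but metadata has {len(metadata)}"
            )
        # Pre-computed coordinates are expected to be [x, y] pairs.
        if key in ("umap", "tsne") and any(len(row) != 2 for row in rows):
            problems.append(f"{name}: every '{key}' row must have exactly 2 values")
    return problems
```

An empty list means the space passed these checks; it does not guarantee that the explorer will accept the file.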

2. Generate the explorer

linkml-embeddings-explorer deploy embeddings.json output/

3. Open in browser

open output/index.html

CLI Commands

deploy

Generate an embedding explorer from an embeddings file.

linkml-embeddings-explorer deploy embeddings.json output/

# With custom title
linkml-embeddings-explorer deploy embeddings.json output/ --title "My Data Explorer"

# With config file
linkml-embeddings-explorer deploy embeddings.json output/ --config config.json

init-config

Generate a config template from your embeddings file.

linkml-embeddings-explorer init-config embeddings.json -o config.json
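To illustrate the kind of template such a command might produce, here is a rough sketch that derives a starter config from the metadata of the first space. This is not the package's actual logic: `starter_config` is a hypothetical helper, and the heuristic (offer a field for coloring only if some value repeats) is an assumption. The config keys themselves are the ones documented in the Configuration section.

```python
def starter_config(data):
    """Derive a starter config dict from an embeddings payload (illustrative only)."""
    spaces = data["spaces"]
    first_space = next(iter(spaces))
    records = spaces[first_space]["metadata"]

    # Collect metadata field names in first-seen order.
    fields = []
    for record in records:
        for key in record:
            if key not in fields:
                fields.append(key)

    # Heuristic: a field is useful for coloring if at least one value repeats.
    color_fields = [
        f for f in fields
        if len({str(r.get(f)) for r in records}) < len(records)
    ]
    label_field = "name" if "name" in fields else (fields[0] if fields else None)

    return {
        "title": "Embedding Explorer",
        "colorFields": color_fields,
        "defaultColorField": color_fields[0] if color_fields else None,
        "hoverFields": fields,
        "labelField": label_field,
        "defaultSpace": first_space,
        "defaultMethod": "umap",
    }
```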

info

Show information about an embeddings file.

linkml-embeddings-explorer info embeddings.json
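If you want a similar overview from Python, the sketch below summarizes an already-loaded embeddings file. It is an assumption-laden stand-in, not the output of the `info` command: `summarize` is a hypothetical name and the reported fields are chosen for illustration.

```python
def summarize(data):
    """Return per-space point counts, dimensions, and metadata fields."""
    summary = {}
    for name, space in data.get("spaces", {}).items():
        metadata = space.get("metadata", [])
        embeddings = space.get("embeddings") or []
        summary[name] = {
            "points": len(metadata),
            # None when the space only carries pre-computed 2D coordinates.
            "dimensions": len(embeddings[0]) if embeddings else None,
            "precomputed": [k for k in ("umap", "tsne") if k in space],
            "fields": sorted({key for record in metadata for key in record}),
        }
    return summary
```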

index (linkml-store)

Index records with linkml-store and cache embeddings using a pipeline config.

linkml-embeddings-explorer index pipeline.yaml
linkml-embeddings-explorer index pipeline.yaml --space gocam --recreate

app-data (linkml-store)

Export embeddings and metadata to embeddings.json or data.js.

linkml-embeddings-explorer app-data pipeline.yaml -o embeddings.json
linkml-embeddings-explorer app-data pipeline.yaml -o data.js --format js --no-tsne

Configuration

Create a config.json to customize the explorer:

{
  "title": "My Embedding Explorer",
  "description": "Explore items in embedding space",
  "colorFields": ["category", "type", "_group"],
  "defaultColorField": "category",
  "hoverFields": ["name", "description", "category"],
  "labelField": "name",
  "linkTemplate": "../items/{name}.html",
  "defaultSpace": "main",
  "defaultMethod": "umap",
  "backLink": "../index.html",
  "backText": "Back to Home"
}

Config Options

| Option | Description |
| --- | --- |
| title | Page title |
| description | Description shown in header |
| colorFields | Fields available in "Color By" dropdown |
| defaultColorField | Initial color field |
| hoverFields | Fields shown on hover |
| labelField | Field used for point labels |
| linkTemplate | URL template for click navigation (use {field} placeholders) |
| defaultSpace | Initial embedding space (for multi-space explorers) |
| defaultMethod | Initial reduction method (umap or tsne) |
| backLink | URL for back link in header |
| backText | Text for back link |
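To make the {field} placeholders in linkTemplate concrete, here is a small sketch of how a template could be expanded for one metadata record. The explorer performs this step in the browser and its exact rules are not documented here, so treat `expand_link`, the URL-encoding of values, and the handling of unknown fields as assumptions.

```python
import re
from urllib.parse import quote


def expand_link(template, record):
    """Replace each {field} in template with the URL-encoded value from record.

    Placeholders that name a field missing from the record are left untouched.
    """
    def substitute(match):
        field = match.group(1)
        if field not in record:
            return match.group(0)
        return quote(str(record[field]), safe="")

    return re.sub(r"\{(\w+)\}", substitute, template)


# "Item 1" contains a space, which is percent-encoded in the result.
print(expand_link("../items/{name}.html", {"name": "Item 1"}))
# prints ../items/Item%201.html
```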

Python API

from pathlib import Path

import numpy as np
from linkml_embeddings_explorer import EmbeddingExplorerGenerator

# Create from embeddings and metadata
embeddings = np.random.randn(100, 384)
metadata = [{"name": f"Item {i}", "category": "A" if i < 50 else "B"} for i in range(100)]

generator = EmbeddingExplorerGenerator(embeddings, metadata)
generator.generate(Path("output/"), title="My Explorer")

# Or add multiple spaces
# (emb1/meta1 and emb2/meta2 stand for your own embedding arrays and metadata lists)
generator = EmbeddingExplorerGenerator()
generator.add_space("pathophysiology", embeddings=emb1, metadata=meta1)
generator.add_space("phenotypes", embeddings=emb2, metadata=meta2)
generator.generate(Path("output/"), title="Multi-Space Explorer")

Integration with linkml-store

If you're using linkml-store with LLMIndexer, you can export embeddings for visualization:

# Export embeddings from linkml-store collection
from pathlib import Path

import duckdb
import numpy as np
from linkml_embeddings_explorer.core import EmbeddingExplorerGenerator

# Read embeddings from linkml-store cache
conn = duckdb.connect("cache/embeddings.db", read_only=True)
rows = conn.execute("SELECT text, embedding FROM all_embeddings").fetchall()
conn.close()

# Parse names from text and build metadata
# (assumes the first line of each indexed text looks like "Name: <value>")
embeddings = []
metadata = []
for text, embedding in rows:
    name = text.split("\n")[0].replace("Name: ", "").strip()
    embeddings.append(list(embedding))
    metadata.append({"name": name, "category": "..."})

# Generate explorer
generator = EmbeddingExplorerGenerator(np.array(embeddings), metadata)
generator.generate(Path("explorer/"))
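The name-parsing step above can be pulled out into a function that is easy to test without a database. This is a sketch under the same assumption as the snippet: the indexed text begins with a line of the form "Name: <value>". The helper names `name_from_text` and `rows_to_inputs` are hypothetical.

```python
def name_from_text(text, prefix="Name: "):
    """Return the first line of text with a leading prefix removed."""
    first_line = text.split("\n", 1)[0]
    if first_line.startswith(prefix):
        first_line = first_line[len(prefix):]
    return first_line.strip()


def rows_to_inputs(rows):
    """Split (text, embedding) rows into parallel embeddings and metadata lists."""
    embeddings = []
    metadata = []
    for text, embedding in rows:
        embeddings.append(list(embedding))
        metadata.append({"name": name_from_text(text)})
    return embeddings, metadata
```

Unlike the `replace` call in the original loop, this version strips the prefix only at the start of the line, so a name that itself contains "Name: " is left intact.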

LinkML-store pipeline (config-driven)

This repo includes a config-driven pipeline that mirrors the pattern used in dismech:

  1. Index records with linkml-store + LLMIndexer (embeddings cached in DuckDB).
  2. Export app data (UMAP/t-SNE + metadata) to embeddings.json or data.js.
  3. Generate the static explorer via deploy.

Install the extra dependencies for indexing:

just install-dev
uv sync --group embeddings

Example config (examples/gocams/pipeline.yaml):

source:
  path: /Users/cjm/repos/go-cam-browser/public/data.json
  format: json

store:
  database: cache/gocam_embeddings.duckdb
  alias: gocam

spaces:
  gocam:
    collection: gocams
    template: templates/gocam.j2
    index_name: gocam_index
    cache_db: cache/gocam_cache.db
    embedding_model_name: text-embedding-3-small
    text_template_syntax: jinja2
    metadata_fields:
      - id
      - title
      - taxon_label
      - enabled_by_gene_labels
      - part_of_term_labels
      - occurs_in_term_labels
    add_name_from: title
    group_by: taxon_label
    groups: []

app_data:
  output: embeddings.json
  format: json
  include_tsne: true

Template note: a starter GO-CAM template is provided at templates/gocam.j2.

Run the pipeline:

linkml-embeddings-explorer index examples/gocams/pipeline.yaml
linkml-embeddings-explorer app-data examples/gocams/pipeline.yaml -o examples/gocams/embeddings.json
linkml-embeddings-explorer deploy examples/gocams/embeddings.json examples/gocams/app/ --title "GO-CAM Explorer"
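If you prefer to drive these three steps from a script, one option is to build the argument lists in Python and hand them to `subprocess`. The sketch below only constructs the commands; `pipeline_commands` is a hypothetical helper, and writing `embeddings.json` next to the pipeline config is an assumption taken from the example above. It uses only the subcommands and flags shown in this README.

```python
from pathlib import Path


def pipeline_commands(config, app_dir, title, cli="linkml-embeddings-explorer"):
    """Return the index, app-data, and deploy commands as argv lists."""
    config_path = Path(config)
    # Assumption: the exported data file lives beside the pipeline config.
    data_file = (config_path.parent / "embeddings.json").as_posix()
    return [
        [cli, "index", config_path.as_posix()],
        [cli, "app-data", config_path.as_posix(), "-o", data_file],
        [cli, "deploy", data_file, str(app_dir), "--title", title],
    ]
```

Each list can then be run in order with `subprocess.run(cmd, check=True)`, which stops at the first failing step.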

Notes:

  • If you set app_data.output to a .js file (or pass --format js), app-data writes the data as an assignment to window.EMBEDDING_DATA instead of plain JSON.
  • Use --no-tsne on app-data to skip t-SNE for large datasets.
  • The YAML maps 1:1 onto a Pydantic model (PipelineConfig).
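For readers who already have an embeddings.json and want the JavaScript form, the conversion can be sketched in a few lines. Only the global name `window.EMBEDDING_DATA` comes from the notes above; the function name `to_data_js` and the exact whitespace and trailing semicolon are assumptions, and the file written by `app-data` may be formatted differently.

```python
import json


def to_data_js(data, variable="window.EMBEDDING_DATA"):
    """Serialize data as a JavaScript assignment to a global variable."""
    return f"{variable} = {json.dumps(data)};\n"


example = {"spaces": {"main": {"umap": [[1.2, 3.4]], "metadata": [{"name": "Item 1"}]}}}
script = to_data_js(example)
```

Loading such a file with a script tag makes the data available without a fetch request, which is convenient when opening the page from a local file path.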

Other examples:

  • examples/fake/ is a self-contained tiny dataset for quick testing.

Justfile snippet (drop into your project):

pipeline-index config:
    uv run linkml-embeddings-explorer index {{config}}

pipeline-app-data config output="embeddings.json":
    uv run linkml-embeddings-explorer app-data {{config}} -o {{output}}

Development

# Clone repository
git clone https://github.com/linkml/linkml-embeddings-explorer
cd linkml-embeddings-explorer

# Install with dev dependencies
just install-dev

# Run tests
just test

# Run all QC checks
just qc

# Generate example
just example

Examples

See examples/README.md for the list of demos and how to run them. The docs landing page lives at docs/index.html with a small gallery.

License

MIT License
