introduce an array mark utilizing the heatmap transform for array data #9389

Open

Description

This feature request proposes the addition of a new array mark to Vega-Lite.

This mark aims to improve support for visualizing various types of 2D data, including heatmaps, image data, and other matrix-based representations, with built-in support for color scales, axis labels, and faceting. I see this as an initial step towards #6043: it focuses on just a single transform in Vega, but many of the points discussed in that issue apply here as well.

The following variants explore how the heatmap transform within Vega behaves and how data can be prepared for ingestion within the specification. This is an initial attempt that can hopefully serve as a starting point for exploring this field a bit more, in the hope that someone is brave enough to turn it into a PR.

variants explored so far

  1. heatmap transform single array only
  2. heatmap transform with color scale
  3. heatmap transform with color scale and axis
  4. heatmap transform double array faceted with color scale and axis
  5. heatmap transform single array with non-zero x and y scale
  6. heatmap transform double array with non-zero x and y scale

Note: in the specs below, I've reduced the length of the grid values. In the accompanying Vega-Editor links all values of the grids are included.

heatmap transform values only

A basic implementation using numpy to generate a heatmap from a single array, displaying it with Vega. The image is rendered with opacity levels only.

import numpy as np
import matplotlib.pyplot as plt
from skimage import data
from skimage.transform import rescale
import pyperclip

array = data.camera()

array_small = rescale(array, 0.245, anti_aliasing=False)
array_round = (array_small * 255).astype(np.uint8)

plt.imshow(array_round, cmap='gray')
print('shape', array_round.shape)

array_as_flatlist = array_round.flatten(order='C').tolist()  # row-major

print('head', array_as_flatlist[0:5])
pyperclip.copy(str(array_as_flatlist))

image
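
Because the array is flattened in row-major (C) order, the value at (row, col) ends up at index row * width + col of the flat list. A minimal pure-Python sketch of that layout; the helper names `flatten_row_major` and `value_at` are only illustrative and not part of NumPy or Vega:

```python
def flatten_row_major(rows):
    """Flatten a list of equal-length rows into one list, row after row."""
    return [value for row in rows for value in row]


def value_at(flat, width, row, col):
    """Look up the cell at (row, col) in a row-major flat list."""
    return flat[row * width + col]


rows = [
    [10, 11, 12],
    [20, 21, 22],
]
flat = flatten_row_major(rows)
width = len(rows[0])

print(flat)                          # [10, 11, 12, 20, 21, 22]
print(value_at(flat, width, 1, 2))   # 22
```

This is the same ordering that `array.flatten(order='C')` produces, so the grid's `width` is what lets a consumer recover rows from the flat `values` list.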

We can make it work using the heatmap transform in Vega, using the following specification (Vega-Editor):

{
  "$schema": "https://vega.github.io/schema/vega/v5.json",  
  "data": [
    {
      "name": "GRID_ARRAY",
      "values": [
        {
          "width": 125,
          "height": 125,
          "values": [199, 200, 200, 198, 198, 118, 135, 161, 161, 140]
        }
      ]
    },
    {
      "name": "GRID_IMAGE",
      "source": "GRID_ARRAY",
      "transform": [{"type": "heatmap"}]
    }
  ],
  "marks": [
    {
      "type": "image",
      "from": {"data": "GRID_IMAGE"},
      "encode": {
        "update": {
          "x": {"value": 0},
          "y": {"value": 0},
          "image": {"field": "image"},
          "width": {"signal": "datum.width"},
          "height": {"signal": "datum.height"}
        }
      }
    }
  ]
}

The result looks like this:

image

It seems this is the image drawn with opacity levels only.
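
My reading is that, when neither a color nor an opacity is given, every pixel gets the same color and an opacity of value / max. A rough plain-Python sketch of that behaviour, assuming a fixed mid-gray color (the exact default color is an assumption on my side, and `default_pixels` is a hypothetical helper):

```python
def default_pixels(values, gray=136):
    """Emulate 'opacity only' rendering: constant gray, alpha = value / max.

    gray=136 (0x88) is an assumed default color, not confirmed here.
    """
    peak = max(values)
    pixels = []
    for v in values:
        alpha = (v / peak) if peak else 0.0
        pixels.append((gray, gray, gray, round(alpha * 255)))
    return pixels


pixels = default_pixels([0, 100, 200])
print(pixels)
# [(136, 136, 136, 0), (136, 136, 136, 128), (136, 136, 136, 255)]
```

Under this assumption, low values fade into the background instead of becoming dark, which would explain the washed-out look compared to plt.imshow().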

heatmap transform with color scale

Adding a color scale to the heatmap to enhance visual differentiation of values. This example replicates a grayscale image using Vega's color scale functionality.

Let's add a color scale (Vega-Editor):

{
  "$schema": "https://vega.github.io/schema/vega/v5.json",  
  "data": [
    {
      "name": "GRID_ARRAY",
      "values": [
        {
          "width": 125,
          "height": 125,
          "values": [199, 200, 200, 198, 198, 130, 118, 135, 161, 161, 140]
        }
      ]
    },
    {
      "name": "GRID_IMAGE",
      "source": "GRID_ARRAY",
      "transform": [
        {
          "type": "heatmap",
          "color": {"expr": "scale('COLOR_SCALE', datum.$value / datum.$max)"},
          "opacity": 1
        }
      ]
    }
  ],
  "scales": [
    {
      "name": "COLOR_SCALE",
      "type": "linear",
      "zero": true,
      "domain": [0, 1],
      "range": {"scheme": "viridis"}
    }
  ],
  "marks": [
    {
      "type": "image",
      "from": {"data": "GRID_IMAGE"},
      "encode": {
        "update": {
          "x": {"value": 0},
          "y": {"value": 0},
          "image": {"field": "image"},
          "width": {"signal": "datum.width"},
          "height": {"signal": "datum.height"}
        }
      }
    }
  ]
}

The result will look like this:

image

Using this approach, I can also reproduce the grayscale image that plt.imshow() gives in Python, by modifying the color scale as follows (Vega-Editor):

{
  "name": "COLOR_SCALE",
  "type": "linear",
  "zero": true,
  "domain": [0, 1],
  "range": {"scheme": "greys"},
  "reverse": true
}

image

heatmap transform with color scale and axis

Enhancing the previous example by including axis labels, providing context to the grid values. This facilitates interpretation of the data.

The next step is to add axes to the image.
The Vega specification now looks like this (Vega-Editor):

{
  "$schema": "https://vega.github.io/schema/vega/v5.json",
  "width": 250,
  "height": 250,
  "data": [
    {
      "name": "GRID_ARRAY",
      "values": [
        {
          "width": 125,
          "height": 125,
          "values": [199, 200, 200, 198, 198, 118, 135, 161, 161, 140]
        }
      ]
    },
    {
      "name": "GRID_IMAGE",
      "source": "GRID_ARRAY",
      "transform": [
        {
          "type": "heatmap",
          "color": {"expr": "scale('COLOR_SCALE', datum.$value / datum.$max)"},
          "opacity": 1
        }
      ]
    }
  ],
  "scales": [
    {
      "name": "COLOR_SCALE",
      "type": "linear",
      "zero": true,
      "domain": [0, 1],
      "range": {"scheme": "viridis"}
    },
    {
      "name": "X_SCALE",
      "type": "linear",
      "zero": true,
      "domain": [0, 125],
      "range": "width"
    },
    {
      "name": "Y_SCALE",
      "type": "linear",
      "zero": true,
      "domain": [0, 125],
      "range": "height"
    }
  ],
  "axes": [
    {
      "scale": "X_SCALE",
      "domain": false,
      "orient": "bottom",
      "tickCount": 5,
      "labelFlush": true
    },
    {
      "scale": "Y_SCALE",
      "domain": false,
      "orient": "left",
      "titlePadding": 5,
      "offset": 2
    }
  ],
  "marks": [
    {
      "type": "image",
      "from": {"data": "GRID_IMAGE"},
      "encode": {
        "update": {
          "x": {"value": 0},
          "y": {"value": 0},
          "image": {"field": "image"},
          "width": {"signal": "width"},
          "height": {"signal": "height"}
        }
      }
    }
  ]
}

image

So far so good.
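
The axes rely on two linear scales that map grid indices to pixels. The sketch below shows that mapping in plain Python. It assumes, as I understand Vega's range defaults, that "range": "height" on a continuous scale runs from bottom to top, i.e. [height, 0]; `linear_scale` is an illustrative helper, not a Vega API:

```python
def linear_scale(domain, range_):
    """Return a function mapping a domain value linearly onto the range."""
    d0, d1 = domain
    r0, r1 = range_

    def scale(value):
        return r0 + (value - d0) / (d1 - d0) * (r1 - r0)

    return scale


x_scale = linear_scale((0, 125), (0, 250))   # "range": "width"
y_scale = linear_scale((0, 125), (250, 0))   # "range": "height", assumed bottom-to-top

print(x_scale(25))   # 50.0
print(y_scale(25))   # 200.0
print(y_scale(0))    # 250.0
```

If that assumption holds, the tick for 0 sits at the bottom of the y-axis while image row 0 is drawn at the top, so the y labels count in the opposite direction of the row index.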

heatmap transform double array faceted with color scale and axis

Faceting multiple grids within a single visualization. This example demonstrates handling of two separate arrays with independent color scales and axis labels.

Can we facet grids if we have, for example, two grids as input?

I've adapted my Python code to prepare the data arrays:

import numpy as np
from skimage import data
from skimage import color
from skimage.transform import rescale
import pyperclip
import json

def array2vega(array):
    grid = {
        'height': array.shape[0],
        'width': array.shape[1],
        'values': array.flatten(order='C').tolist()  # row-major
    }
    return grid

array = data.camera()
array_small = rescale(array, 0.245, anti_aliasing=False)
array_round = np.round(array_small, 2)

grid0 = array2vega(array_round)
grid1 = array2vega(1 - array_round)
arrays = [{'grid':grid0, 'variant': 'A'}, {'grid':grid1, 'variant': 'B'}]

pyperclip.copy(json.dumps(arrays))

I also modified the Vega specification, which now looks like this (Vega-Editor):

{
  "$schema": "https://vega.github.io/schema/vega/v5.json",
  "width": 250,
  "height": 250,
  "data": [
    {
      "name": "GRID_ARRAY",
      "values": [{"grid": {"width": 125, "height": 125, "values": [0.78, 0.78, 0.78, 0.78, 0.78, 0.46, 0.53, 0.63, 0.63, 0.55]}, "variant": "A"}, {"grid": {"width": 125, "height": 125, "values": [0.21999999999999997, 0.21999999999999997, 0.21999999999999997, 0.21999999999999997, 0.21999999999999997, 0.54, 0.47, 0.37, 0.37, 0.44999999999999996]}, "variant": "B"}]
    },
    {
      "name": "GRID_IMAGE",
      "source": "GRID_ARRAY",
      "transform": [
        {
          "type": "heatmap",
          "field": "grid",
          "color": {"expr": "scale('COLOR_SCALE', datum.$value / datum.$max)"},
          "opacity": 1
        }
      ]
    }
  ],
  "scales": [
    {
      "name": "COLOR_SCALE",
      "type": "linear",
      "zero": true,
      "domain": [0, 1],
      "range": {"scheme": "viridis"}
    },
    {
      "name": "X_SCALE",
      "type": "linear",
      "zero": true,
      "domain": [0, 125],
      "range": "width"
    },
    {
      "name": "Y_SCALE",
      "type": "linear",
      "zero": true,
      "domain": [0, 125],
      "range": "height"
    }
  ],
  "axes": [
    {
      "scale": "Y_SCALE",
      "domain": false,
      "orient": "left",
      "offset": 2
    }
  ],
  "layout": {
    "columns": 2
  },
  "marks": [
    {
      "type": "group",
      "from": {
        "facet": {
          "name": "facet",
          "data": "GRID_IMAGE",
          "groupby": "variant"
        }
      },
      "title": {
        "text": {"signal": "parent.variant"}
      },
      "encode": {
        "update": {
          "width": {"signal": "width"},
          "height": {"signal": "height"}
        }
      },
      "axes": [
        {
          "scale": "X_SCALE",
          "domain": false,          
          "orient": "bottom"
        }
      ],
      "marks": [
        {
          "type": "image",
          "from": {"data": "facet"},
          "encode": {
            "update": {
              "x": {"value": 0},
              "y": {"value": 0},
              "image": {"field": "image"},
              "width": {"signal": "width"},
              "height": {"signal": "height"}
            }
          }
        }
      ]
    }
  ]
}

image

Not bad!

heatmap transform single array with non-zero x and y scale

Handling grids with custom scales, such as geographical data. This example showcases the challenges of aligning non-zero axes with grid dimensions and values.

This variant is still a bit difficult. The array is in units of degrees and runs from -180 to 180 longitude on the x-axis and from -81 to 87 latitude on the y-axis. The step size is 1 degree in both directions.

See Vega-Editor:

{
  "$schema": "https://vega.github.io/schema/vega/v5.json",
  "width": 360,
  "height": 168,
  "data": [
    {
      "name": "GRID_ARRAY",
      "values": [{
        "year":2016,
        "grid":{
          "x1_":-180,
          "x2_":180,
          "y1_":-81,
          "y2_":87,
          "height":168,
          "width":360,
          "values":[392,392,392,392,393,166,163,165,168,169]
        }
      }]
    },
    {
      "name": "GRID_IMAGE",
      "source": "GRID_ARRAY",
      "transform": [
        {
          "type": "heatmap",
          "field": "grid",
          "color": {"expr": "scale('COLOR_SCALE', datum.$value / datum.$max)"},
          "opacity": 1
        }
      ]
    }
  ],
  "scales": [
    {
      "name": "COLOR_SCALE",
      "type": "linear",
      "zero": true,
      "domain": [0, 1],
      "range": {"scheme": "viridis"}
    },
    {
      "name": "X_SCALE",
      "type": "linear",
      "zero": false,
      "domain": [-180, 180],
      "range": "width"
    },
    {
      "name": "Y_SCALE",
      "type": "linear",
      "zero": false,
      "domain": [-81, 87],
      "range": "height"
    }
  ],
  "axes": [
    {
      "scale": "X_SCALE",
      "domain": false,
      "orient": "bottom"
    },
    {
      "scale": "Y_SCALE",
      "domain": false,
      "orient": "left",
      "titlePadding": 5,
      "offset": 2
    }
  ],
  "marks": [
    {
      "type": "image",
      "from": {"data": "GRID_IMAGE"},
      "encode": {
        "update": {
          "x": {"value": 0},
          "y": {"value": 0},
          "image": {"field": "image"},
          "width": {"signal": "datum.grid.width"},
          "height": {"signal": "datum.grid.height"}
        }
      }
    }
  ]
}

This results in:

image

Basically, the grid's height and width are used only to allocate the canvas size, and the 1D array is iterated over to colorize each pixel.
For the X_SCALE and Y_SCALE domains we use the information in x1/x2 and y1/y2 (still manually). The image mark encoding uses "datum.grid.width" and "datum.grid.height" as signals. Since the scales also need a width and height, the global width/height are currently set to the same width and height as the grid.
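
The manual step of copying x1/x2 and y1/y2 into the scale domains could be automated during data preparation. A small sketch, assuming the grid dictionary uses the `x1_`, `x2_`, `y1_`, `y2_` keys from the spec above; `scales_from_grid` is a hypothetical helper:

```python
import json


def scales_from_grid(grid):
    """Build the X_SCALE / Y_SCALE definitions from the grid's bounds."""
    return [
        {
            "name": "X_SCALE",
            "type": "linear",
            "zero": False,
            "domain": [grid["x1_"], grid["x2_"]],
            "range": "width",
        },
        {
            "name": "Y_SCALE",
            "type": "linear",
            "zero": False,
            "domain": [grid["y1_"], grid["y2_"]],
            "range": "height",
        },
    ]


grid = {"x1_": -180, "x2_": 180, "y1_": -81, "y2_": 87, "height": 168, "width": 360}
scales = scales_from_grid(grid)
print(json.dumps(scales, indent=2))
```

The output can be pasted into the "scales" array next to COLOR_SCALE, so the domains always follow the data instead of being typed in by hand.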

But if I change the grid input object to:

"x1":-180,
"x2":180,
"y1":-81,
"y2":87,
"height":168,
"width":360,

(removing the trailing _ from x1/x2/y1/y2), the result is this:

image

I have the feeling that all negative values of our scales malfunction in the iterator within heatmap.js (here). It also seems that the drawn y-axis is reversed relative to the canvas iterator. If I add "reverse": true to the Y_SCALE scale, it becomes clearer that only positive values are colorized in the canvas:

image

But then the latitude values on the y-axis do not match the input array.
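
One possible explanation, based on my reading of the iterator and to be treated as an assumption: x1/x2/y1/y2 are taken as index bounds into the grid rather than as coordinates, so the loop computes flat indices i + j * width that fall outside the array for negative i or j. The sketch below emulates such a loop in Python and counts how many cells would actually find a value; `count_valid_cells` is an illustrative name:

```python
def count_valid_cells(width, height, x1, x2, y1, y2):
    """Count cells whose flat index i + j * width falls inside the grid,
    assuming x1/x2/y1/y2 act as index bounds (an assumption)."""
    size = width * height
    valid = 0
    for j in range(y1, y2):
        for i in range(x1, x2):
            index = i + j * width
            if 0 <= index < size:
                valid += 1
    return valid


total = 360 * 168
as_indices = count_valid_cells(360, 168, 0, 360, 0, 168)
as_degrees = count_valid_cells(360, 168, -180, 180, -81, 87)

print(as_indices, total)   # 60480 60480
print(as_degrees, total)   # 31140 60480
```

Under that assumption roughly half of the canvas stays empty and each row is read with a 180-cell offset, which would be consistent with the pictures above. Keys that the transform does not recognize, such as x1_, presumably leave it on its default bounds of 0..width and 0..height.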

heatmap transform double array with non-zero x and y scale

A more complex scenario with faceted charts using custom scales. This variant highlights the issues with global versus array-specific dimensions and independent color scales.

Let's make it a bit more complex: a faceted chart with non-zero x and y scales. Let's start with data preparation in Python:

import json
import urllib.request

import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import pyperclip

# define data
source = 'https://raw.githubusercontent.com/vega/vega-datasets/main/data/annual-precip.json'
with urllib.request.urlopen(source) as url:
    data = json.load(url)
values = data['values']
width = data['width']  # 360
height = data['height']  # 168
extent = [-180, 180, -81, 87]  # xmin, xmax, ymin, ymax

# prepare array and plot
array = np.array(values).reshape(height, width)
plt.imshow(array, extent=extent)

image

def array2vega(array, extent):
    grid = {
        'extent': extent,
        'height': array.shape[0],
        'width': array.shape[1],
        'values': array.flatten(order='C').tolist()  # row-major
    }
    return grid

grid0 = array2vega(array, extent)
grid1 = array2vega(1 - array, extent)
arrays = [{'grid': grid0, 'variant': 'A'}, {'grid': grid1, 'variant': 'B'}]
df = pd.DataFrame(arrays)  # one row per grid record

# copy and display
pyperclip.copy(df.to_json(orient='records'))
df

image

Preparing a Vega chart for this data gives the following specification (see Vega-Editor):

{
  "$schema": "https://vega.github.io/schema/vega/v5.json",
  "width": 250,
  "height": 250,
  "data": [
    {
      "name": "GRID_ARRAY",
      "values": [
        {
          "grid": {
            "extent": [-180, 180, -81, 87],
            "height": 168,
            "width": 360,
            "values": [392, 392, 392, 169, 187, 196]
          },
          "variant": "A"
        },
        {
          "grid": {
            "extent": [-180, 180, -81, 87],
            "height": 168,
            "width": 360,
            "values": [-391, -391, -391, -164, -167, -168]
          },
          "variant": "B"
        }
      ]
    },
    {
      "name": "GRID_IMAGE",
      "source": "GRID_ARRAY",
      "transform": [
        {
          "type": "heatmap",
          "field": "grid",
          "color": {"expr": "scale('COLOR_SCALE', datum.$value / datum.$max)"},
          "opacity": 1
        }
      ]
    }
  ],
  "scales": [
    {
      "name": "COLOR_SCALE",
      "type": "linear",
      "zero": true,
      "domain": [0, 1],
      "range": {"scheme": "viridis"}
    },
    {
      "name": "X_SCALE",
      "type": "linear",
      "zero": true,
      "domain": [-180, 180],
      "range": "width"
    },
    {
      "name": "Y_SCALE",
      "type": "linear",
      "zero": true,
      "domain": [-81, 87],
      "range": "height"
    }
  ],
  "axes": [
    {"scale": "Y_SCALE", "domain": false, "orient": "left", "offset": 2}
  ],
  "layout": {"columns": 2},
  "marks": [
    {
      "type": "group",
      "from": {
        "facet": {"name": "facet", "data": "GRID_IMAGE", "groupby": "variant"}
      },
      "title": {"text": {"signal": "parent.variant"}},
      "encode": {
        "update": {"width": {"signal": "width"}, "height": {"signal": "height"}}
      },
      "axes": [{"scale": "X_SCALE", "domain": false, "orient": "bottom"}],
      "marks": [
        {
          "type": "image",
          "from": {"data": "facet"},
          "encode": {
            "update": {
              "x": {"value": 0},
              "y": {"value": 0},
              "image": {"field": "image"},
              "width": {"signal": "datum.grid.width"},
              "height": {"signal": "datum.grid.height"}
            }
          }
        }
      ]
    }
  ]
}

image

Two issues become clear from this:

  • The globally defined width and height interfere with the array-defined grid.width and grid.height.
  • The color scale is currently not applied independently to each array.
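
One possible workaround for the second issue, until independent color scales are supported, is to rescale every array to [0, 1] with its own minimum and maximum before exporting it. A minimal sketch under that assumption; `normalize_grid` is a hypothetical helper and the short value lists are taken from the reduced spec above:

```python
def normalize_grid(grid):
    """Return a copy of the grid with values rescaled to [0, 1]
    using the grid's own minimum and maximum."""
    values = grid["values"]
    low, high = min(values), max(values)
    span = high - low
    scaled = [(v - low) / span if span else 0.0 for v in values]
    return {**grid, "values": scaled}


grid_a = {"width": 3, "height": 1, "values": [392, 169, 196]}
grid_b = {"width": 3, "height": 1, "values": [-391, -164, -168]}

for grid in (grid_a, grid_b):
    print(normalize_grid(grid)["values"])
```

After this step both arrays span the full [0, 1] domain of COLOR_SCALE, including variant B, whose raw values are all negative.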

Proposed Specification

This is discussed in more detail in #6043, but something like the following should be sufficient for many use cases (note that there is no need for an x and y encoding channel, as the 2D array data comes prepared).

{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "data": {
    "values": {
      "grid": {
        "extent": [-180, 180, -81, 87],
        "height": 168,
        "width": 360,
        "values": [392, 392, 392, 169, 187, 196]
      },
      "variant": "A"
    }
  },
  "mark": "array",
  "encoding": {
    "color": {"scale": {"scheme": "viridis"}},
    "row": {},
    "column": {}
  }
}

With a new array mark, the hope is to simplify the syntax for specifying array data while still supporting color schemes and customization, including integration with Vega-Lite's axis and scale system for both zero-based and non-zero-based scales.
Moreover, this exploration shows that faceting multiple arrays is a real possibility, even though maintaining independent scales and axes needs to be explored more deeply.
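
To make the proposal a bit more concrete, here is a rough sketch of how such a specification could be lowered to the Vega pattern used in the variants above. `compile_array_mark` is hypothetical: neither the array mark nor this function exists in Vega-Lite, and the output simply mirrors the hand-written specs.

```python
def compile_array_mark(spec, width=250, height=250):
    """Lower the proposed (hypothetical) 'array' mark to a Vega spec
    that uses the heatmap transform."""
    if spec.get("mark") != "array":
        raise ValueError("expected the proposed 'array' mark")
    record = spec["data"]["values"]
    xmin, xmax, ymin, ymax = record["grid"]["extent"]
    scheme = spec["encoding"]["color"]["scale"]["scheme"]
    heatmap = {
        "type": "heatmap",
        "field": "grid",
        "color": {"expr": "scale('COLOR_SCALE', datum.$value / datum.$max)"},
        "opacity": 1,
    }
    image_mark = {
        "type": "image",
        "from": {"data": "GRID_IMAGE"},
        "encode": {"update": {
            "x": {"value": 0}, "y": {"value": 0},
            "image": {"field": "image"},
            "width": {"signal": "width"}, "height": {"signal": "height"},
        }},
    }
    return {
        "$schema": "https://vega.github.io/schema/vega/v5.json",
        "width": width,
        "height": height,
        "data": [
            {"name": "GRID_ARRAY", "values": [record]},
            {"name": "GRID_IMAGE", "source": "GRID_ARRAY", "transform": [heatmap]},
        ],
        "scales": [
            {"name": "COLOR_SCALE", "type": "linear", "domain": [0, 1],
             "range": {"scheme": scheme}},
            {"name": "X_SCALE", "type": "linear", "zero": False,
             "domain": [xmin, xmax], "range": "width"},
            {"name": "Y_SCALE", "type": "linear", "zero": False,
             "domain": [ymin, ymax], "range": "height"},
        ],
        "axes": [
            {"scale": "X_SCALE", "orient": "bottom"},
            {"scale": "Y_SCALE", "orient": "left"},
        ],
        "marks": [image_mark],
    }


proposed = {
    "data": {"values": {
        "grid": {"extent": [-180, 180, -81, 87], "height": 168, "width": 360,
                 "values": [392, 392, 392, 169, 187, 196]},
        "variant": "A",
    }},
    "mark": "array",
    "encoding": {"color": {"scale": {"scheme": "viridis"}}},
}
vega = compile_array_mark(proposed)
print(vega["scales"][1]["domain"])   # [-180, 180]
```

The sketch ignores the row and column channels; faceting would add a group mark with a facet definition, as in the double-array variants.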

Performance optimization has not been part of this exploration. It would be great, though, if the result of a heatmap transform (a canvas image) could be included directly in the JSON specification, so that the heatmap transform could be applied server-side. It is currently unclear to me whether this is possible within the JSON standard.
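
On the JSON side, one option that should work in principle is to render the image server-side and embed it as a base64 data URI, since that is just a string. The sketch below writes a tiny grayscale PNG with the Python standard library only and round-trips it through JSON. Whether Vega's image mark would accept such a string (for example through its url property) is something I have not verified, and `gray_png_data_uri` is a hypothetical helper:

```python
import base64
import json
import struct
import zlib


def _chunk(tag, payload):
    """Assemble one PNG chunk: length, tag, payload, CRC."""
    body = tag + payload
    crc = zlib.crc32(body) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + body + struct.pack(">I", crc)


def gray_png_data_uri(rows):
    """Encode rows of 0-255 gray levels as a PNG data URI."""
    height, width = len(rows), len(rows[0])
    # 8-bit depth, color type 0 (grayscale), default compression/filter/interlace
    header = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    # Each scanline starts with filter byte 0 (no filtering)
    raw = b"".join(b"\x00" + bytes(row) for row in rows)
    png = (
        b"\x89PNG\r\n\x1a\n"
        + _chunk(b"IHDR", header)
        + _chunk(b"IDAT", zlib.compress(raw))
        + _chunk(b"IEND", b"")
    )
    return "data:image/png;base64," + base64.b64encode(png).decode("ascii")


uri = gray_png_data_uri([[0, 128, 255], [255, 128, 0]])
record = {"width": 3, "height": 2, "image": uri}
restored = json.loads(json.dumps(record))
print(restored == record)   # True
```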

This issue is one of the results of a spontaneous attempt to bring vega/altair#891 further. Thanks for brainstorming on this topic @kanitw, @timtreis, @melonora and @joelostblom!
