Description
Opened on Jul 12, 2024
This feature request proposes the addition of a new array mark to Vega-Lite.
This mark aims to improve support for visualizing various types of 2D data, including heatmaps, image data, and other matrix-based representations, with built-in support for color scales, axis labels, and faceting. I see this as an initial step towards #6043, as it focuses on just a single transform in Vega, but many of the issues discussed there also apply here.
The following variants explore how the heatmap transform in Vega behaves and how data can be prepared for ingestion within the specification. This is an initial attempt that can hopefully serve as a starting point for exploring this field further, in the hope that someone is brave enough to turn it into a PR.
variants explored so far
- heatmap transform single array only
- heatmap transform with color scale
- heatmap transform with color scale and axis
- heatmap transform double array faceted with color scale and axis
- heatmap transform single array with non-zero x and y scale
- heatmap transform double array with non-zero x and y scale
Note: in the specs below, I've reduced the length of the grid values. In the accompanying Vega-Editor links all values of the grids are included.
heatmap transform values only
A basic implementation using numpy to generate a heatmap from a single array, displaying it with Vega. The image is rendered with opacity levels only.
import numpy as np
import matplotlib.pyplot as plt
from skimage import data
from skimage.transform import rescale
import pyperclip
array = data.camera()
array_small = rescale(array, 0.245, anti_aliasing=False)
array_round = (array_small * 255).astype(np.uint8)
plt.imshow(array_round, cmap='gray')
print('shape', array_round.shape)
array_as_flatlist = array_round.flatten(order='C').tolist() # row-major
print('head', array_as_flatlist[0:5])
pyperclip.copy(str(array_as_flatlist))
We can make it work using the heatmap transform in Vega, using the following specification (Vega-Editor):
{
"$schema": "https://vega.github.io/schema/vega/v5.json",
"data": [
{
"name": "GRID_ARRAY",
"values": [
{
"width": 125,
"height": 125,
"values": [199, 200, 200, 198, 198, 118, 135, 161, 161, 140]
}
]
},
{
"name": "GRID_IMAGE",
"source": "GRID_ARRAY",
"transform": [{"type": "heatmap"}]
}
],
"marks": [
{
"type": "image",
"from": {"data": "GRID_IMAGE"},
"encode": {
"update": {
"x": {"value": 0},
"y": {"value": 0},
"image": {"field": "image"},
"width": {"signal": "datum.width"},
"height": {"signal": "datum.height"}
}
}
}
]
}
The result looks like this:
It seems this is the image drawn with opacity levels only.
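To make the opacity-only rendering concrete, here is a minimal NumPy sketch of what such a rasterization could look like. It assumes (this is not confirmed by the text above) that, when no color or opacity is given, every pixel gets one fixed color and an alpha of value / max. The function name opacity_only_rgba and the default gray are hypothetical.

```python
import numpy as np

def opacity_only_rgba(grid, rgb=(136, 136, 136)):
    """Hypothetical helper: fixed color, alpha proportional to value / max."""
    grid = np.asarray(grid, dtype=float)
    vmax = grid.max()
    # Assumption: opacity is value / max; guard against an all-zero grid.
    alpha = grid / vmax if vmax > 0 else np.zeros_like(grid)
    rgba = np.empty(grid.shape + (4,), dtype=np.uint8)
    rgba[..., :3] = rgb                                    # same color everywhere
    rgba[..., 3] = np.round(alpha * 255).astype(np.uint8)  # only alpha varies
    return rgba

demo = opacity_only_rgba([[0, 100], [200, 50]])
print(demo[..., 3])  # alpha channel of the 2x2 demo grid
```

If this assumption holds, the largest value in the grid is fully opaque and a value of zero is fully transparent, which would match the washed-out look of the result.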
heatmap transform with color scale
Adding a color scale to the heatmap to enhance visual differentiation of values. This example replicates a grayscale image using Vega's color scale functionality.
Let's add a color scale (Vega-Editor):
{
"$schema": "https://vega.github.io/schema/vega/v5.json",
"data": [
{
"name": "GRID_ARRAY",
"values": [
{
"width": 125,
"height": 125,
"values": [199, 200, 200, 198, 198, 130, 118, 135, 161, 161, 140]
}
]
},
{
"name": "GRID_IMAGE",
"source": "GRID_ARRAY",
"transform": [
{
"type": "heatmap",
"color": {"expr": "scale('COLOR_SCALE', datum.$value / datum.$max)"},
"opacity": 1
}
]
}
],
"scales": [
{
"name": "COLOR_SCALE",
"type": "linear",
"zero": true,
"domain": [0, 1],
"range": {"scheme": "viridis"}
}
],
"marks": [
{
"type": "image",
"from": {"data": "GRID_IMAGE"},
"encode": {
"update": {
"x": {"value": 0},
"y": {"value": 0},
"image": {"field": "image"},
"width": {"signal": "datum.width"},
"height": {"signal": "datum.height"}
}
}
}
]
}
The result will look like this:
Using this approach, I can also reproduce the grayscale image as in Python using plt.imshow(), by modifying the color scale as follows (Vega-Editor):
{
"name": "COLOR_SCALE",
"type": "linear",
"zero": true,
"domain": [0, 1],
"range": {"scheme": "greys"},
"reverse": true
}
heatmap transform with color scale and axis
Enhancing the previous example by including axis labels, providing context to the grid values. This facilitates interpretation of the data.
The next step is to add axes to the image.
The Vega specification now looks as follows (Vega-Editor):
{
"$schema": "https://vega.github.io/schema/vega/v5.json",
"width": 250,
"height": 250,
"data": [
{
"name": "GRID_ARRAY",
"values": [
{
"width": 125,
"height": 125,
"values": [199, 200, 200, 198, 198, 118, 135, 161, 161, 140]
}
]
},
{
"name": "GRID_IMAGE",
"source": "GRID_ARRAY",
"transform": [
{
"type": "heatmap",
"color": {"expr": "scale('COLOR_SCALE', datum.$value / datum.$max)"},
"opacity": 1
}
]
}
],
"scales": [
{
"name": "COLOR_SCALE",
"type": "linear",
"zero": true,
"domain": [0, 1],
"range": {"scheme": "viridis"}
},
{
"name": "X_SCALE",
"type": "linear",
"zero": true,
"domain": [0, 125],
"range": "width"
},
{
"name": "Y_SCALE",
"type": "linear",
"zero": true,
"domain": [0, 125],
"range": "height"
}
],
"axes": [
{
"scale": "X_SCALE",
"domain": false,
"orient": "bottom",
"tickCount": 5,
"labelFlush": true
},
{
"scale": "Y_SCALE",
"domain": false,
"orient": "left",
"titlePadding": 5,
"offset": 2
}
],
"marks": [
{
"type": "image",
"from": {"data": "GRID_IMAGE"},
"encode": {
"update": {
"x": {"value": 0},
"y": {"value": 0},
"image": {"field": "image"},
"width": {"signal": "width"},
"height": {"signal": "height"}
}
}
}
]
}
So far so good.
heatmap transform double array faceted with color scale and axis
Faceting multiple grids within a single visualization. This example demonstrates handling of two separate arrays with independent color scales and axis labels.
Are we able to facet grids, if we have for example two grids as input?
I've adapted my python code to prepare the data arrays:
import numpy as np
from skimage import data
from skimage import color
from skimage.transform import rescale
import pyperclip
import json
def array2vega(array):
    grid = {
        'height': array.shape[0],
        'width': array.shape[1],
        'values': array.flatten(order='C').tolist() # row-major
    }
    return grid
array = data.camera()
array_small = rescale(array, 0.245, anti_aliasing=False)
array_round = np.round(array_small, 2)
grid0 = array2vega(array_round)
grid1 = array2vega(1 - array_round)
arrays = [{'grid':grid0, 'variant': 'A'}, {'grid':grid1, 'variant': 'B'}]
pyperclip.copy(json.dumps(arrays))
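As a quick sanity check on the row-major layout produced by array2vega, the following sketch round-trips a tiny array. The helpers vega2array and value_at are hypothetical names introduced only for this illustration.

```python
import numpy as np

def array2vega(array):
    """Same layout as in the snippet above: height, width, row-major values."""
    return {
        'height': array.shape[0],
        'width': array.shape[1],
        'values': array.flatten(order='C').tolist(),
    }

def vega2array(grid):
    """Hypothetical inverse: rebuild the 2D array from a grid object."""
    return np.array(grid['values']).reshape(grid['height'], grid['width'])

def value_at(grid, row, col):
    """Row-major lookup: cell (row, col) sits at index row * width + col."""
    return grid['values'][row * grid['width'] + col]

sample = np.arange(6).reshape(2, 3)   # 2 rows, 3 columns
grid = array2vega(sample)
restored = vega2array(grid)
print(grid['values'], value_at(grid, 1, 2))
```

The lookup formula row * width + col is what "row-major" means here: all values of the first row come first, followed by the second row, and so on.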
I also modified the Vega specification, which now looks as follows (Vega-Editor):
{
"$schema": "https://vega.github.io/schema/vega/v5.json",
"width": 250,
"height": 250,
"data": [
{
"name": "GRID_ARRAY",
"values": [{"grid": {"width": 125, "height": 125, "values": [0.78, 0.78, 0.78, 0.78, 0.78, 0.46, 0.53, 0.63, 0.63, 0.55]}, "variant": "A"}, {"grid": {"width": 125, "height": 125, "values": [0.21999999999999997, 0.21999999999999997, 0.21999999999999997, 0.21999999999999997, 0.21999999999999997, 0.54, 0.47, 0.37, 0.37, 0.44999999999999996]}, "variant": "B"}]
},
{
"name": "GRID_IMAGE",
"source": "GRID_ARRAY",
"transform": [
{
"type": "heatmap",
"field": "grid",
"color": {"expr": "scale('COLOR_SCALE', datum.$value / datum.$max)"},
"opacity": 1
}
]
}
],
"scales": [
{
"name": "COLOR_SCALE",
"type": "linear",
"zero": true,
"domain": [0, 1],
"range": {"scheme": "viridis"}
},
{
"name": "X_SCALE",
"type": "linear",
"zero": true,
"domain": [0, 125],
"range": "width"
},
{
"name": "Y_SCALE",
"type": "linear",
"zero": true,
"domain": [0, 125],
"range": "height"
}
],
"axes": [
{
"scale": "Y_SCALE",
"domain": false,
"orient": "left",
"offset": 2
}
],
"layout": {
"columns": 2
},
"marks": [
{
"type": "group",
"from": {
"facet": {
"name": "facet",
"data": "GRID_IMAGE",
"groupby": "variant"
}
},
"title": {
"text": {"signal": "parent.variant"}
},
"encode": {
"update": {
"width": {"signal": "width"},
"height": {"signal": "height"}
}
},
"axes": [
{
"scale": "X_SCALE",
"domain": false,
"orient": "bottom"
}
],
"marks": [
{
"type": "image",
"from": {"data": "facet"},
"encode": {
"update": {
"x": {"value": 0},
"y": {"value": 0},
"image": {"field": "image"},
"width": {"signal": "width"},
"height": {"signal": "height"}
}
}
}
]
}
]
}
Not bad!
heatmap transform single array with non-zero x and y scale
Handling grids with custom scales, such as geographical data. This example showcases the challenges of aligning non-zero axes with grid dimensions and values.
This variant is still a bit difficult. The array is in units of degrees and runs from -180 to 180 longitude on the x-axis and from -81 to 87 latitude on the y-axis. The step size is 1 degree in both directions.
See Vega-Editor:
{
"$schema": "https://vega.github.io/schema/vega/v5.json",
"width": 360,
"height": 168,
"data": [
{
"name": "GRID_ARRAY",
"values": [{
"year":2016,
"grid":{
"x1_":-180,
"x2_":180,
"y1_":-81,
"y2_":87,
"height":168,
"width":360,
"values":[392,392,392,392,393,166,163,165,168,169]
}
}]
},
{
"name": "GRID_IMAGE",
"source": "GRID_ARRAY",
"transform": [
{
"type": "heatmap",
"field": "grid",
"color": {"expr": "scale('COLOR_SCALE', datum.$value / datum.$max)"},
"opacity": 1
}
]
}
],
"scales": [
{
"name": "COLOR_SCALE",
"type": "linear",
"zero": true,
"domain": [0, 1],
"range": {"scheme": "viridis"}
},
{
"name": "X_SCALE",
"type": "linear",
"zero": false,
"domain": [-180, 180],
"range": "width"
},
{
"name": "Y_SCALE",
"type": "linear",
"zero": false,
"domain": [-81, 87],
"range": "height"
}
],
"axes": [
{
"scale": "X_SCALE",
"domain": false,
"orient": "bottom"
},
{
"scale": "Y_SCALE",
"domain": false,
"orient": "left",
"titlePadding": 5,
"offset": 2
}
],
"marks": [
{
"type": "image",
"from": {"data": "GRID_IMAGE"},
"encode": {
"update": {
"x": {"value": 0},
"y": {"value": 0},
"image": {"field": "image"},
"width": {"signal": "datum.grid.width"},
"height": {"signal": "datum.grid.height"}
}
}
}
]
}
This results in:
Basically, for the grid we only use the height and width to allocate the canvas size and iterate over the 1D array to colorize each pixel.
For X_SCALE and Y_SCALE we use the information of x1/x2 and y1/y2 (still manually). We use "datum.grid.width" and "datum.grid.height" as signals within the image mark encoding. Since the scales also need a width and height, the global width/height are currently still set to the same width and height as the grid.
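The manual step of copying the bounds into the scale domains could be automated while preparing the specification. A minimal Python sketch, assuming the bounds are available as a list [xmin, xmax, ymin, ymax]; the helper name scales_from_extent is hypothetical, while the scale properties are the ones used in the specification above.

```python
import json

def scales_from_extent(extent):
    """Hypothetical helper: build the X/Y scale definitions from grid bounds."""
    xmin, xmax, ymin, ymax = extent
    return [
        {"name": "X_SCALE", "type": "linear", "zero": False,
         "domain": [xmin, xmax], "range": "width"},
        {"name": "Y_SCALE", "type": "linear", "zero": False,
         "domain": [ymin, ymax], "range": "height"},
    ]

scales = scales_from_extent([-180, 180, -81, 87])
print(json.dumps(scales, indent=2))  # paste into the "scales" array of the spec
```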
But if I change the grid input object to:
"x1":-180,
"x2":180,
"y1":-81,
"y2":87,
"height":168,
"width":360,
(removing the appended _ from x1/x2/y1/y2)
The result is this:
I have the feeling that all negative values of our scales malfunction in the iterator within heatmap.js (here). But then it seems the drawn y-axis is reversed for the canvas iterator. If I add "reverse": true to the scale Y_SCALE, it becomes clearer that only positive values are colorized in the canvas:
But then the latitude values on the y-axis do not match the input array.
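This suspicion can be illustrated with a small Python simulation. It is not the code of heatmap.js; it is a sketch under the assumption that x1/x2/y1/y2 are read as cell-index bounds for the loop over the flat values array, rather than as data coordinates. The function count_painted and its parameters are hypothetical.

```python
def count_painted(width, height, values, x1=0, y1=0, x2=None, y2=None):
    """Count cells that would receive a value if the bounds are cell indices."""
    x2 = width if x2 is None else x2
    y2 = height if y2 is None else y2
    painted = 0
    for j in range(y1, y2):            # rows, starting at y1
        for i in range(x1, x2):        # columns, starting at x1
            index = i + j * width      # position in the flat, row-major array
            if 0 <= index < len(values):
                painted += 1           # a value exists for this cell
    return painted

values = list(range(12))               # a 4 x 3 grid
full = count_painted(4, 3, values)
shifted = count_painted(4, 3, values, x1=-2, y1=-1, x2=2, y2=2)
print(full, shifted)
```

Under this assumption, bounds that start at negative numbers produce lookups before the start of the array, so only part of the canvas receives values. That would be consistent with the partially colorized image, but it remains a guess until checked against the actual iterator.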
heatmap transform double array with non-zero x and y scale
A more complex scenario with faceted charts using custom scales. This variant highlights the issues with global versus array-specific dimensions and independent color scales.
Let's make it a bit more complex: a faceted chart with non-zero x and y scales. Let's start with data preparation in Python:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import pyperclip
import urllib.request
import json
# define data
source = 'https://raw.githubusercontent.com/vega/vega-datasets/main/data/annual-precip.json'
with urllib.request.urlopen(source) as url:
    data = json.load(url)
values = data['values']
width = data['width'] # 360
height = data['height'] # 168
extent = [-180, 180, -81, 87] # xmin, xmax, ymin, ymax
# prepare array and plot
array = np.array(values).reshape(height, width)
plt.imshow(array, extent=extent)
def array2vega(array, extent):
    grid = {
        'extent': extent,
        'height': array.shape[0],
        'width': array.shape[1],
        'values': array.flatten(order='C').tolist() # row-major
    }
    return grid
grid0 = array2vega(array, extent)
grid1 = array2vega(1 - array, extent)
arrays = [{'grid': grid0, 'variant': 'A'}, {'grid': grid1, 'variant': 'B'}]
df = pd.DataFrame.from_dict(arrays)
# copy and display
pyperclip.copy(df.to_json(orient='records'))
df
Preparing a Vega chart for this looks as follows (see Vega-Editor):
{
"$schema": "https://vega.github.io/schema/vega/v5.json",
"width": 250,
"height": 250,
"data": [
{
"name": "GRID_ARRAY",
"values": [
{
"grid": {
"extent": [-180, 180, -81, 87],
"height": 168,
"width": 360,
"values": [392, 392, 392, 169, 187, 196]
},
"variant": "A"
},
{
"grid": {
"extent": [-180, 180, -81, 87],
"height": 168,
"width": 360,
"values": [-391, -391, -391, -164, -167, -168]
},
"variant": "B"
}
]
},
{
"name": "GRID_IMAGE",
"source": "GRID_ARRAY",
"transform": [
{
"type": "heatmap",
"field": "grid",
"color": {"expr": "scale('COLOR_SCALE', datum.$value / datum.$max)"},
"opacity": 1
}
]
}
],
"scales": [
{
"name": "COLOR_SCALE",
"type": "linear",
"zero": true,
"domain": [0, 1],
"range": {"scheme": "viridis"}
},
{
"name": "X_SCALE",
"type": "linear",
"zero": true,
"domain": [-180, 180],
"range": "width"
},
{
"name": "Y_SCALE",
"type": "linear",
"zero": true,
"domain": [-81, 87],
"range": "height"
}
],
"axes": [
{"scale": "Y_SCALE", "domain": false, "orient": "left", "offset": 2}
],
"layout": {"columns": 2},
"marks": [
{
"type": "group",
"from": {
"facet": {"name": "facet", "data": "GRID_IMAGE", "groupby": "variant"}
},
"title": {"text": {"signal": "parent.variant"}},
"encode": {
"update": {"width": {"signal": "width"}, "height": {"signal": "height"}}
},
"axes": [{"scale": "X_SCALE", "domain": false, "orient": "bottom"}],
"marks": [
{
"type": "image",
"from": {"data": "facet"},
"encode": {
"update": {
"x": {"value": 0},
"y": {"value": 0},
"image": {"field": "image"},
"width": {"signal": "datum.grid.width"},
"height": {"signal": "datum.grid.height"}
}
}
}
]
}
]
}
Two issues become clear from this:
- We see the interference of a globally defined width and height and the array-defined grid.width and grid.height.
- Another issue that becomes apparent is that the color scale is currently not applied independently.
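The color scale issue can be reproduced outside Vega. The sketch below uses plain NumPy with hypothetical helper names. It contrasts dividing by the maximum (as the color expression datum.$value / datum.$max does) with a per-grid min-max normalization. The second array mimics variant B, where 1 - array is negative for all the sample values.

```python
import numpy as np

def by_max(grid):
    """Normalization used in the color expression: value / max."""
    grid = np.asarray(grid, dtype=float)
    return grid / grid.max()

def by_min_max(grid):
    """Hypothetical per-grid alternative: rescale each grid to [0, 1]."""
    grid = np.asarray(grid, dtype=float)
    lo, hi = grid.min(), grid.max()
    if hi == lo:
        return np.zeros_like(grid)     # constant grid: avoid division by zero
    return (grid - lo) / (hi - lo)

variant_a = np.array([392, 392, 392, 169, 187, 196])
variant_b = 1 - variant_a              # all negative, as in variant B

print(by_max(variant_a))      # stays within [0, 1]
print(by_max(variant_b))      # max is negative, so ratios are >= 1
print(by_min_max(variant_b))  # back within [0, 1], independent of variant A
```

With value / max, a grid whose maximum is negative yields ratios outside the [0, 1] domain of COLOR_SCALE, so the two facets cannot share that expression meaningfully. A per-grid normalization is one possible way to get independent color mappings.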
Proposed Specification
This is already discussed in more detail in #6043, but something like the following should be sufficient for many use cases (note that there is no need for an x and y encoding channel, as the 2D array data comes prepared).
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"data": {
"values": {
"grid": {
"extent": [-180, 180, -81, 87],
"height": 168,
"width": 360,
"values": [392, 392, 392, 169, 187, 196]
},
"variant": "A"
}
},
"mark": "array",
"encoding": {
"color": {"scale": {"scheme": "viridis"}},
"row": {},
"column": {}
}
}
With a new array mark, the hope is to simplify the syntax for specifying array data while still supporting color schemes, with customization options that include integration with Vega-Lite's axis and scale system and support for both zero and non-zero scales.
Moreover, it is shown that faceting multiple arrays is a real possibility, even though maintaining independent scales and axes needs to be explored more deeply.
Performance optimization has not been part of this exploration, but it would be great if the result of a heatmap transform, a canvas image, could be included within the JSON specification, so that the heatmap transform can be applied server-side. It is currently unclear whether this is accepted within the JSON standard.
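On the server-side idea, a rendered image can at least be carried as a string: a base64 data URL is an ordinary JSON string value. The sketch below uses only the Python standard library to encode a small grayscale grid as a PNG data URL. The function name png_data_url is hypothetical, and whether a Vega image mark would accept such a string (for example through its url property) is an assumption that was not tested here.

```python
import base64
import json
import struct
import zlib

def png_data_url(rows):
    """Encode rows of 0-255 integers as an 8-bit grayscale PNG data URL."""
    height, width = len(rows), len(rows[0])

    def chunk(tag, payload):
        # PNG chunk: length, tag + payload, CRC over tag + payload.
        body = tag + payload
        return (struct.pack(">I", len(payload)) + body
                + struct.pack(">I", zlib.crc32(body) & 0xFFFFFFFF))

    # IHDR: width, height, bit depth 8, color type 0 (grayscale), defaults.
    header = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    # Each scanline is prefixed with filter type 0 (no filtering).
    raw = b"".join(b"\x00" + bytes(row) for row in rows)
    png = (b"\x89PNG\r\n\x1a\n" + chunk(b"IHDR", header)
           + chunk(b"IDAT", zlib.compress(raw)) + chunk(b"IEND", b""))
    return "data:image/png;base64," + base64.b64encode(png).decode("ascii")

url = png_data_url([[0, 128], [255, 64]])
record = json.dumps({"image_url": url})   # survives a JSON round trip
print(record[:60] + "...")
```

This only covers transport of an already rendered image. Applying color scales server-side would still need to replicate whatever the heatmap transform does in the client.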
This issue is one of the results of a spontaneous attempt to bring vega/altair#891 further. Thanks for brainstorming on this topic @kanitw, @timtreis, @melonora and @joelostblom!