Support 2D density visualizations

Hi all,

I would really like to expand Vega-Lite with support for 2D density representations, leveraging Vega's `kde2d`, `isocontour`, and `heatmap` transforms. Adding these would make Vega-Lite largely "feature complete" for my current visualization teaching needs, and make it more comparable to other popular plotting libraries. However, it is not clear how best to do this.

I see two general approaches:

1. Use existing mark types only (`geoshape` for contours, `image` for heatmaps) and leverage new transforms (`density2d`, `contour`, and `heatmap`) to generate the appropriate input data to the mark. This is the same approach we've followed so far when adding transforms such as `density` and `regression`. However, Vega's `kde2d` and `heatmap` transforms both interleave data-space and encoding-space concerns in ways that make this difficult.
  For example, the `kde2d` transform takes `x` and `y` field accessors that must return **pixel-space** values. Typically this is done using an expression that maps an underlying data field through a defined scale transform. So, for this to work in Vega-Lite we need a way to generate / access appropriate scale transforms, introducing a cross-cutting concern. Moreover, we'd like Vega-Lite to also use those scales to add appropriate axes, so we'd also have error-prone redundancy in the specification if we need to provide x/y fields in both the transform and encoding. (This is further complicated by the fact that `geoshape` doesn't take `x` and `y` encodings anyway...)
  The `heatmap` transform has a separate issue, which is that it accepts expressions for determining pixel color and opacity. While these can be stand-alone, most times we actually want to use a defined color scale (and corresponding legend), again mixing transform and encoding concerns.

2. An alternative is to create new mark types, such as `contour` and `heatmap`. Ideally these marks could accept either pre-calculated raster grid data (from which contours / heatmaps can be directly generated) or point data (to which the `kde2d` transform would be applied). The Vega-Lite compiler would need to generate appropriate transforms and encodings. I imagine transform parameters could be passed as mark properties. 
  There are still some limitations to this approach: in Vega we can separately generate heatmap images and then use them as input to image marks, such that we could in theory do things like create an entire scatter plot where each point is a small density heatmap. However, I don't think this extra expressiveness is critical for Vega-Lite.
  The biggest hurdle to this approach is that I have no idea how to implement it in the current VL compiler, so I can't estimate the feasibility or difficulty. That said, I would be happy to collaborate with someone more knowledgeable.
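
To make option 2 more concrete, here is a minimal TypeScript sketch of what a compile step might emit. It is an illustration under stated assumptions, not a proposal for the actual compiler design: the `DensityMarkDef` interface, its property names, and the `compileDensityMark` function are all hypothetical, and the sketch assumes scales named `x`, `y`, and `color` have already been generated elsewhere. The emitted objects use the parameter names of Vega's `kde2d`, `isocontour`, and `heatmap` transforms as I understand them from the Vega documentation.

```typescript
// Hypothetical mark definition: none of these property names exist in Vega-Lite.
interface DensityMarkDef {
  type: "contour" | "heatmap";
  xField: string;
  yField: string;
  bandwidth?: number; // forwarded to kde2d when set
  levels?: number; // contour only: forwarded to isocontour when set
}

type VegaTransform = Record<string, unknown>;

// Assumption: the compiler has already created scales named "x", "y", and "color".
function compileDensityMark(mark: DensityMarkDef): VegaTransform[] {
  const kde: VegaTransform = {
    type: "kde2d",
    size: [{ signal: "width" }, { signal: "height" }],
    // kde2d expects pixel-space values, so each data field is mapped through its scale.
    x: { expr: `scale('x', datum[${JSON.stringify(mark.xField)}])` },
    y: { expr: `scale('y', datum[${JSON.stringify(mark.yField)}])` },
  };
  if (mark.bandwidth !== undefined) {
    kde.bandwidth = [mark.bandwidth, mark.bandwidth];
  }

  // kde2d writes its raster grid to the "grid" field by default.
  if (mark.type === "contour") {
    const iso: VegaTransform = { type: "isocontour", field: "grid" };
    if (mark.levels !== undefined) {
      iso.levels = mark.levels;
    }
    return [kde, iso];
  }

  // The heatmap color expression refers to a color scale, which is the
  // transform/encoding coupling described under option 1.
  const heat: VegaTransform = {
    type: "heatmap",
    field: "grid",
    color: { expr: "scale('color', datum.$value / datum.$max)" },
    opacity: 1,
  };
  return [kde, heat];
}

// Usage: the x/y fields are named once, on the mark, and reused for the scale lookups.
const transforms = compileDensityMark({
  type: "contour",
  xField: "Horsepower",
  yField: "Miles_per_Gallon",
  bandwidth: 20,
  levels: 3,
});
```

The point of the sketch is that the user names the x and y fields once, and the compiler owns the scale references, so the same scales can also drive axes and legends without redundancy in the specification.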

While neither solution is completely satisfactory, I'm leaning towards the approach of adding new mark types. I think the interleaving of data-space and encoding-space operations in VL transforms breaks too much, in terms of both output and user mental models.

Any thoughts or feedback, particularly relative to the feasibility of option 2?


jheer opened on Mar 9, 2020
